
DigitalKMPS

Introducing EWS v1.0 - Our distributed, AI-powered early warning system for critical infrastructure monitoring


EWS - Early Warning System v1.0

High-performance distributed platform processing 100,000 requests/minute with sub-millisecond response


Cloud-Native Architecture

Deploy with Docker, on Kubernetes, or through Argo CD GitOps, with auto-scaling and self-healing

We build concurrent, scalable distributed systems — and run them in the real world.

We're a two-person engineering company with a simple focus: systems that stay correct and stay up under load. Between us we've spent more than 35 years building software — distributed backends, edge devices, cloud infrastructure, and the messy integration work in between.

Most platforms break in the same places: they can't scale past a few thousand concurrent clients, they lose data when a node dies, and alerts arrive too late to act on. We build for the opposite — horizontal scale, zero-data-loss persistence, and sub-minute response, designed in from the start rather than bolted on later.

EWS is where all of that comes together.

EWS — Early Warning System

AI-powered distributed monitoring for real-time wildfire and flood detection.

EWS pairs solar-powered edge stations with a horizontally scalable cloud backend. Each station carries triple-spectrum imaging (visible, near-infrared night vision, and FLIR thermal), an on-device AI inference chip, an 80 GHz radar water-level sensor, and a full environmental sensor suite — running autonomously in the field.

The cloud platform ingests minute-resolution telemetry, runs imagery through detection models, and pushes alerts in under 60 seconds, with the architecture to handle thousands of concurrent stations and 100,000+ requests per minute.

<60s

Alert latency

1,000+

Concurrent stations

100K

Requests / minute

99.999%

Data durability

99.9%

Availability
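To put the headline figures in per-second and per-day terms, here is a small back-of-the-envelope calculation. It assumes exactly 1,000 stations, each sending one telemetry reading per minute; both values are illustrative readings of "1,000+ concurrent stations" and "minute-resolution telemetry", not measured numbers.

```python
# Illustrative assumptions: 1,000 stations, one reading per station per minute.
STATIONS = 1_000
READINGS_PER_STATION_PER_MINUTE = 1
MINUTES_PER_DAY = 24 * 60
API_TARGET_PER_MINUTE = 100_000  # design target quoted above

ingest_per_minute = STATIONS * READINGS_PER_STATION_PER_MINUTE
readings_per_day = ingest_per_minute * MINUTES_PER_DAY
api_target_per_second = API_TARGET_PER_MINUTE / 60

print(f"telemetry ingest: {ingest_per_minute:,} readings/min")
print(f"readings per day: {readings_per_day:,}")
print(f"API target:       {api_target_per_second:,.0f} req/s")
```

Under these assumptions the fleet produces about 1.44 million readings a day, and the 100,000 requests/minute target works out to roughly 1,667 requests per second.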

EWS is being deployed as a pilot in Bulgaria, targeting the Plovdiv, Sofia, and Burgas regions, and is built around EU climate-adaptation and civil-protection priorities.

130,000 requests per minute — on two modest on-prem servers, no cloud.

The EWS platform sustained over 130,000 read requests per minute (peaking near 167,000) in a deliberately cache-defeating, worst-case benchmark, so real-world throughput with warm caches should run higher.

It ran on a compact two-node Kubernetes cluster: a Dell PowerEdge T420 (16 cores / 32 threads, 80 GB RAM) and a second worker node (12 vCPU, 55 GB RAM) working in tandem — modest, energy-efficient on-prem hardware, not the cloud. Throughout the test the system held a sub-100 ms median response time, zero backend errors, and 100% end-to-end data correctness — proof that a well-architected, event-sourced design delivers serious scale on affordable equipment, comfortably past the 100,000 req/min design target.

130K+

req/min sustained · ~167K peak

0

errors · 100% verified (40/40)

<100ms

median latency, full load

Cache-defeating

worst case; production runs higher

Node 1 — Dell PowerEdge T420

2× Xeon E5-2450L · 16 cores / 32 threads · 80 GB RAM

Node 2 — worker node

12 vCPU · 55 GB RAM

Combined: 28 cores/vCPUs · 135 GB RAM · no cloud, ~2012-era equipment.
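The benchmark figures are easier to compare with other systems once converted to requests per second. The short calculation below uses only the numbers quoted above; treating Node 2's 12 vCPUs as equivalent to physical cores is a simplifying assumption.

```python
# Figures quoted in the benchmark description above.
SUSTAINED_PER_MIN = 130_000
PEAK_PER_MIN = 167_000
DESIGN_TARGET_PER_MIN = 100_000
TOTAL_CORES = 16 + 12  # Node 1 physical cores + Node 2 vCPUs (assumed comparable)


def per_second(requests_per_minute: int) -> float:
    """Convert a per-minute request rate to requests per second."""
    return requests_per_minute / 60


sustained_rps = per_second(SUSTAINED_PER_MIN)
peak_rps = per_second(PEAK_PER_MIN)
headroom = SUSTAINED_PER_MIN / DESIGN_TARGET_PER_MIN - 1
rps_per_core = sustained_rps / TOTAL_CORES

print(f"sustained: {sustained_rps:.0f} req/s")
print(f"peak:      {peak_rps:.0f} req/s")
print(f"headroom over design target: {headroom:.0%}")
print(f"sustained per core: {rps_per_core:.0f} req/s")
```

That is roughly 2,167 requests per second sustained and 2,783 at peak, about 30% above the design target.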

EWS v1.0 Release

Four core modules powering the next generation of environmental monitoring

EWS Cluster

Distributed weather station management using Apache Pekko Cluster Sharding with event sourcing.

Handles 1,000+ weather stations with automatic failover and data recovery, backed by a 3-node Cassandra cluster for complete audit trails.

EWS HTTP API

Stateless REST API delivering sub-millisecond response times at 100,000 requests/minute.

Multi-tier caching with Redis cluster, ClickHouse analytics, circuit breaker protection, and 10-20 auto-scaling instances.

EWS Image Service

High-throughput image upload service processing 12,000 images/hour from weather stations.

S3-compatible storage on a 4-node erasure-coded MinIO cluster, with an automatic 7-day retention policy.

EWS Alert Agent

AI-powered notification system for fire and flood alerts, with access to 100+ language models.

Generates human-readable messages via OpenRouter (GPT, Claude, Llama), with email delivery and distributed deduplication.

How it works

Edge

Self-contained stations combine triple-spectrum cameras with on-device AI (YOLO inference on an NXP i.MX 8M Plus NPU), so a possible fire or flood is flagged locally before any data leaves the site. Solar power and dual batteries give 11+ days of autonomy.

Cloud

A distributed actor backend (Apache Pekko cluster sharding + event sourcing) processes every station as an independent, recoverable entity. Cassandra holds the event journal, ClickHouse powers real-time analytics, Redis caches hot data — all on Kubernetes.
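The idea behind "independent, recoverable entity" can be shown with a minimal event-sourcing sketch. This is not the EWS implementation and does not use Pekko: the class names, the event shape, and the in-memory dictionary standing in for the Cassandra journal are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReadingRecorded:
    """An immutable fact: one telemetry reading was accepted (hypothetical event)."""
    station_id: str
    minute: int
    temperature_c: float


@dataclass(frozen=True)
class StationState:
    readings: int = 0
    last_temperature_c: Optional[float] = None


def apply_event(state: StationState, event: ReadingRecorded) -> StationState:
    """Pure state transition, used for live updates and for recovery alike."""
    return StationState(state.readings + 1, event.temperature_c)


class StationEntity:
    """One station; its state is derived entirely from its journaled events."""

    def __init__(self, station_id: str, journal: Dict[str, List[ReadingRecorded]]):
        self.station_id = station_id
        # In-memory stand-in for a durable event journal.
        self._events = journal.setdefault(station_id, [])
        self.state = StationState()
        for event in self._events:  # recovery: replay every stored event
            self.state = apply_event(self.state, event)

    def record(self, minute: int, temperature_c: float) -> None:
        event = ReadingRecorded(self.station_id, minute, temperature_c)
        self._events.append(event)  # persist the event first
        self.state = apply_event(self.state, event)  # then update state


journal: Dict[str, List[ReadingRecorded]] = {}
station = StationEntity("station-001", journal)
station.record(0, 21.5)
station.record(1, 21.7)

# Simulate a crash and restart: a fresh entity rebuilds state from the journal.
recovered = StationEntity("station-001", journal)
print(recovered.state)
```

Because state is only ever changed by applying events, a restarted entity that replays the same journal arrives at the same state, and the journal doubles as the audit trail.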

Alert

Detections become natural-language alerts for first responders, delivered by email and push, with operator-initiated live video on demand for visual confirmation.

EWS Cluster - Distributed Processing

Distributed Weather Station Management:

Cluster Sharding: Apache Pekko distributes station entities across multiple nodes, handling millions of sensor readings daily with automatic load balancing, so the cluster scales by adding nodes.

Event Sourcing: Complete audit trail with Apache Cassandra (RF=3) ensures data reliability, fault tolerance, and historical analysis capabilities. Actor state is reliably stored and recovered, even after system crashes.

High Availability: A 3-node Cassandra cluster with automatic failover provides zero data loss and continuous operation when a node fails.

Scalability: Handles 1,000+ weather stations as sharded entities, spreading processing across cluster nodes.

CQRS Pattern: Separate read/write paths optimize performance, with Pekko Projections for event streaming and pre-aggregated materialized views.
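As a rough illustration of how sharding spreads stations across a cluster, the sketch below hashes each station ID to a shard and maps shards to nodes. The shard count, node names, and modulo placement are assumptions made for the example; Pekko's real allocation strategy also rebalances shards when nodes join or leave, which this sketch does not model.

```python
import hashlib
from collections import Counter
from typing import List

NUM_SHARDS = 100  # hypothetical shard count, chosen for illustration


def shard_for(entity_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an entity ID to a shard using a stable hash (same result on every node)."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def node_for(shard: int, nodes: List[str]) -> str:
    """Simplified placement: shards are assigned to nodes round-robin."""
    return nodes[shard % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
station_ids = [f"station-{i:04d}" for i in range(1000)]

placement = Counter(node_for(shard_for(s), nodes) for s in station_ids)
print(placement)
```

Every message for a given station lands on the same node, so that station's state has a single owner, while the stations as a whole spread roughly evenly across the cluster.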

EWS HTTP API - Real-Time Analytics

High-Performance REST API with Real-Time Analytics:

Performance: Stateless REST API delivering 100,000 requests/minute throughput with sub-millisecond response times for real-time analytics queries.

Multi-Tier Caching: L0 (in-memory local) + L1 (12-node Redis cluster) cache layers serve hot data without a database round trip, reducing database load.

Analytics Engine: ClickHouse 3-node replicated cluster powers real-time analytics with pre-aggregated materialized views for hourly, daily, and station-level statistics.

Auto-Scaling: 10-20 API instances scale dynamically based on load, with circuit breaker protection and rate limiting for graceful degradation under traffic spikes.

Fire/Flood Detection: Real-time alert detection with event streaming via Pekko Projections, enabling immediate response to environmental hazards.
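The two cache tiers described above follow a read-through pattern, sketched here under simplifying assumptions: a plain dictionary stands in for the Redis cluster, a function stands in for the database, and the class name and 5-second local TTL are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TwoTierCache:
    """Read-through cache: L0 (per-process) -> L1 (shared) -> backing store."""

    def __init__(
        self,
        l1: Dict[str, str],                  # stand-in for a shared Redis cluster
        loader: Callable[[str], str],        # stand-in for a database query
        l0_ttl_s: float = 5.0,               # hypothetical local TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._l0: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self._l1 = l1
        self._loader = loader
        self._l0_ttl_s = l0_ttl_s
        self._clock = clock

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._l0.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                    # L0 hit: no network call at all
        value = self._l1.get(key)
        if value is None:
            value = self._loader(key)        # miss in both tiers: query the store
            self._l1[key] = value            # populate the shared tier
        self._l0[key] = (now + self._l0_ttl_s, value)  # populate the local tier
        return value


def load_from_database(key: str) -> str:
    return f"stats-for-{key}"


redis_stand_in: Dict[str, str] = {}
cache = TwoTierCache(redis_stand_in, load_from_database)
print(cache.get("station-001"))
```

A short local TTL keeps each API instance from serving stale data for long, while the shared tier means a value loaded by one instance is available to all the others.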

EWS Image Service - Object Storage

High-Throughput Image Processing and Storage:

Throughput: Processes 12,000 images/hour (about 200 per minute) from weather stations through an upload service designed for continuous IoT data ingestion.

S3-Compatible Storage: A 4-node erasure-coded MinIO cluster provides S3-compatible object storage with enterprise-grade reliability and data durability.

Automatic Retention: A 7-day retention policy expires older images automatically, bounding storage use while keeping recent imagery available.

Fault Tolerance: Distributed storage architecture with erasure coding survives multiple node failures without data loss, ensuring continuous operation.

IoT Integration: Seamless upload integration from weather station cameras, supporting fire and flood detection through visual monitoring and computer vision analysis.
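The 7-day retention rule comes down to a cutoff comparison, sketched below. In a MinIO deployment this would normally be configured as a bucket lifecycle rule rather than written as application code; the function name, object keys, and timestamps here are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

RETENTION = timedelta(days=7)


def expired_keys(
    uploaded_at: Dict[str, datetime],
    now: datetime,
    retention: timedelta = RETENTION,
) -> List[str]:
    """Return the object keys whose upload time is older than the retention window."""
    cutoff = now - retention
    return sorted(key for key, ts in uploaded_at.items() if ts < cutoff)


now = datetime(2025, 1, 15, 12, 0, tzinfo=timezone.utc)
images = {
    "station-001/a.jpg": now - timedelta(days=8),   # older than 7 days: expired
    "station-001/b.jpg": now - timedelta(days=7),   # exactly 7 days old: kept
    "station-002/c.jpg": now - timedelta(hours=1),  # recent: kept
}
print(expired_keys(images, now))
```

Only the 8-day-old image is selected for removal; anything uploaded within the last seven days stays available.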

EWS Alert Agent - AI Integration

AI-Powered Alert Generation and Notification:

AI Models: OpenRouter integration provides access to 100+ language models including GPT, Claude, and Llama for intelligent alert message generation.

Human-Readable Alerts: AI generates clear, actionable fire and flood warning messages tailored for both emergency responders and public notification.

Email Delivery: SendGrid integration ensures reliable delivery of alert notifications to stakeholders and emergency management personnel.

Distributed Deduplication: Cluster-wide deduplication prevents alert fatigue by ensuring each unique event generates only one notification.

Monitoring: Prometheus metrics collection and Grafana dashboards provide comprehensive visibility into alert generation, delivery rates, and system health.
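Deduplication of the kind described above can be pictured as "notify at most once per event within a time window". The sketch below keeps that record in process memory. A cluster-wide version would need the record in storage shared by all agent instances. The key of station plus hazard type and the 10-minute window are assumptions for illustration only.

```python
from typing import Dict, Tuple

EventKey = Tuple[str, str]  # (station_id, hazard), a hypothetical dedup key


class AlertDeduplicator:
    """Suppress repeat notifications for the same event inside a time window."""

    def __init__(self, window_s: float = 600.0) -> None:
        self._window_s = window_s
        self._last_sent: Dict[EventKey, float] = {}

    def should_notify(self, station_id: str, hazard: str, now_s: float) -> bool:
        key = (station_id, hazard)
        last = self._last_sent.get(key)
        if last is not None and now_s - last < self._window_s:
            return False                 # already notified recently: suppress
        self._last_sent[key] = now_s     # remember when this alert went out
        return True


dedup = AlertDeduplicator(window_s=600.0)
print(dedup.should_notify("station-001", "fire", now_s=0.0))     # first detection
print(dedup.should_notify("station-001", "fire", now_s=120.0))   # repeat, same window
print(dedup.should_notify("station-001", "flood", now_s=120.0))  # different hazard
print(dedup.should_notify("station-001", "fire", now_s=900.0))   # window has passed
```

The repeated fire detection two minutes later is suppressed, while a different hazard at the same station, or the same hazard after the window expires, still produces a notification.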

Engineering

What we're good at, and what every line of EWS reflects.

Concurrency by design

Actor-model architecture where each entity owns its own state, so adding stations adds capacity rather than contention.

Horizontal scalability

Cluster sharding and stateless services that scale out across nodes without redesign.

Fault tolerance

Survives multiple node failures with zero data loss, through replication, circuit breakers, and graceful degradation.

Real-time at scale

Sub-minute detection and high-throughput analytics under continuous load.

Edge-to-cloud

From embedded Linux and sensor integration up to Kubernetes, CI/CD, and SIEM-grade security.

Built on a mature open-source foundation (Apache Pekko, Cassandra, ClickHouse, Redis, Kubernetes), with security hardened in depth — OAuth2/OIDC, JWT, role-based access, HA firewall, and SIEM.

Contact

Interested in a pilot, a partnership, or the technology behind EWS? Get in touch.

startup@digitalkmps.eu · Plovdiv, Bulgaria
