The True Cost of Real-Time
| Cost Category | Batch | Streaming | Multiplier |
|---|---|---|---|
| Infrastructure | Scheduled compute (pay when running) | Always-on compute (24/7) | 3-5x |
| Development | SQL/Python transforms, well-understood patterns | Stateful processing, exactly-once, schema evolution | 2-3x |
| Debugging | Rerun the job, inspect the data | Distributed logs, state inspection, replay events | 3-5x |
| Operations | Monitor job success/failure | Monitor lag, throughput, error rate, connector health | 2-3x |
| Talent | Data engineers (widely available) | Streaming engineers (scarce, premium) | 1.5-2x salary |
Total cost multiplier: streaming costs 3-5x more than batch for equivalent data volume. This cost is justified when the business value of fresher data exceeds the additional cost — and only when the business actually acts on the fresher data. If the fraud team reviews alerts every 4 hours regardless of when they arrive, real-time detection with 4-hour response has the same business impact as hourly batch detection with 4-hour response.
Decision Framework: 5 Questions
1. What decision does this data inform? (Fraud detection → real-time. Monthly financial close → batch.)
2. What's the cost of delayed detection? (Fraud: $50K per undetected transaction → real-time justified. Sales report: no cost until end of day → batch sufficient.)
3. How fast does the organization act on the data? (If the alert fires at 2 AM and nobody looks until 9 AM, batch at 6 AM achieves the same outcome at lower cost.)
4. Does the team have streaming expertise? (If not, the 2-3x development cost becomes 5-8x. Train first or hire streaming engineers.)
5. Can the organization operate a streaming system? (24/7 monitoring, an on-call rotation, and incident response capability are all required.)
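The framework above can be sketched as a screening function. This is an illustrative sketch only: the function name, parameters, and the 1-hour response threshold are assumptions, not a formal scoring model.

```python
def recommend_pipeline(decision_is_time_critical: bool,
                       delay_cost_per_hour: float,
                       org_response_time_hours: float,
                       team_has_streaming_skills: bool,
                       can_operate_24x7: bool) -> str:
    """Screen a use case with the 5-question framework (illustrative thresholds)."""
    # Questions 1-2: is real money lost while the data is stale?
    value_in_freshness = decision_is_time_critical and delay_cost_per_hour > 0
    # Question 3: fresh data is wasted if nobody acts on it promptly.
    org_can_act_fast = org_response_time_hours <= 1
    # Questions 4-5: capability gaps inflate the cost multiplier.
    org_is_ready = team_has_streaming_skills and can_operate_24x7
    if value_in_freshness and org_can_act_fast:
        return "streaming" if org_is_ready else "streaming (after training/hiring)"
    return "batch"

# Fraud detection: time-critical, costly delays, 24/7 response team in place.
print(recommend_pipeline(True, 50_000, 0.1, True, True))   # streaming
# Daily sales dashboard: reviewed once each morning regardless of freshness.
print(recommend_pipeline(False, 0, 24, True, True))        # batch
```

The point of encoding it is that questions 1-3 gate the decision (is streaming worth anything here?) while questions 4-5 only affect sequencing (build now vs. train first).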
When Batch Wins
Batch is the right choice for:

- Analytics and reporting: daily sales dashboards, weekly business reviews, monthly financial close. The audience consumes these on a schedule, not continuously.
- ML model training: models retrain daily, weekly, or monthly; training on streaming data adds complexity without improving model quality.
- Data warehousing: the warehouse is designed for analytical queries on historical data, so batch loading on a defined schedule is the natural pattern.
- Regulatory reporting: most regulatory reports are periodic, so batch extraction aligned to the reporting schedule is simpler and more auditable.
- Small-to-medium data volumes: if the nightly batch job runs in 20 minutes and the data is available by 6 AM, the business value of reducing this to 5-minute freshness is near zero for most use cases.

The batch architecture: scheduled pipelines → transform in Fabric/Spark → load to warehouse → refresh Power BI. Simple, reliable, well-understood.
When Streaming Wins
Streaming is the right choice for:

- Fraud and anomaly detection: every minute of delay means more fraudulent transactions completed.
- IoT and operational monitoring: sensor data from manufacturing equipment needs real-time detection, not next-day batch.
- Customer experience: real-time recommendations, live inventory updates, personalized pricing; freshness directly impacts conversion.
- Event-driven architectures: microservices communicating via events need real-time processing.
- High-volume CDC: tables with 1B+ rows where nightly full extraction is prohibitively expensive. CDC streaming captures only changes, reducing data transfer by 99%+.
Hybrid Architecture: Lambda and Kappa
Lambda architecture combines three layers: a batch layer (processes all historical data nightly, produces accurate, complete results), a speed layer (processes the real-time stream, produces approximate recent results), and a serving layer (merges batch and speed results). Advantage: accuracy from batch plus freshness from streaming. Disadvantage: two separate codepaths to maintain.

Kappa architecture uses a single streaming pipeline for all data, both historical replay and real-time. The stream is the source of truth; batch results are derived by replaying the stream. Advantage: a single codebase. Disadvantage: reprocessing historical data through streaming is slower than batch.

Practical recommendation: most organizations implement a pragmatic hybrid: batch for analytics and reporting (90% of use cases), streaming for the 2-3 use cases that genuinely require real-time (fraud, IoT, customer experience). This avoids Lambda/Kappa complexity while delivering real-time where it matters.
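The Lambda serving layer's merge step can be illustrated with a toy sketch in plain Python: the fresher speed-layer results override the accurate-but-stale batch view for recent keys. A real serving layer would sit in front of a warehouse and a stream store; the names and data here are purely illustrative.

```python
def serve(batch_view: dict, speed_view: dict) -> dict:
    """Lambda serving layer (toy): start from the accurate batch view,
    then overlay fresher (possibly approximate) speed-layer results."""
    merged = dict(batch_view)   # complete and accurate, but stale
    merged.update(speed_view)   # fresh results win for recent keys
    return merged

# Nightly batch totals per customer vs. totals from today's event stream.
batch_view = {"cust_a": 120.0, "cust_b": 300.0}
speed_view = {"cust_b": 310.0, "cust_c": 15.0}   # cust_b updated, cust_c is new
print(serve(batch_view, speed_view))
# {'cust_a': 120.0, 'cust_b': 310.0, 'cust_c': 15.0}
```

The "two codepaths" disadvantage is visible even in the toy: something must compute `batch_view` and something else must compute `speed_view`, and both must agree on keys and semantics.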
Migrating from Batch to Streaming
Step 1: Identify the 2-3 use cases that justify real-time
Use the 5-question framework above. Don't migrate everything; migrate only where fresher data changes business outcomes.
Step 2: Run streaming alongside batch
Deploy streaming for the selected use cases while keeping batch running. Compare: are the results consistent? Is the business actually consuming the real-time data?
Step 3: Decommission batch for validated use cases
After 4-8 weeks of parallel operation with validated accuracy and confirmed business adoption, decommission batch for those use cases. Keep batch for everything else.
Pre-Implementation Checklist
Before building a streaming pipeline, confirm:

- The business decision changes with fresher data (documented, not assumed).
- The organization can act on real-time alerts (24/7 response capability exists or will be built).
- The team has streaming expertise (or the timeline includes a training/hiring phase).
- The on-call rotation includes streaming monitoring.
- The budget includes always-on infrastructure, streaming engineers, and operational tooling.
- Most importantly: the batch pipeline has been evaluated. If batch delivers the same business outcome at 3-5x lower cost, batch is the correct engineering decision.
Team and Skills Requirements
| Capability | Batch Team | Streaming Team (Additional) |
|---|---|---|
| Core skills | SQL, Python/Spark, ETL patterns | +Kafka/Event Hubs, Flink/Spark Streaming, event-driven design |
| Operations | Job scheduling, failure handling | +Consumer lag monitoring, partition management, on-call rotation |
| Architecture | Batch pipeline design, data modeling | +Event schema design, exactly-once patterns, back-pressure handling |
| Debugging | Rerun job, inspect intermediate data | +Distributed tracing, event replay, state inspection |
The streaming team needs everything the batch team knows PLUS streaming-specific skills. This is why the 2-3x development cost exists — the skill set is a superset, not a different set. Organizations that try to "learn streaming by building production streaming" pay the education cost in production incidents. Train on non-critical use cases first, or bring in experienced streaming engineers for the initial build.
Real-Time vs Near-Real-Time vs Batch: The Latency Spectrum
| Latency | Pattern | Infrastructure | Cost | Use Cases |
|---|---|---|---|---|
| Sub-second | True streaming | Always-on Flink/Kafka Streams | $$$$ | Fraud, safety, trading |
| 1-15 seconds | Micro-batch | Spark Structured Streaming | $$$ | IoT, inventory, pricing |
| 1-15 minutes | Frequent batch | Scheduled Spark jobs | $$ | Dashboards, operational reports |
| 1-24 hours | Standard batch | Nightly/hourly ETL | $ | Analytics, warehousing, ML training |
The spectrum shows: each step from batch toward real-time doubles the infrastructure cost and operational complexity. The decision isn't binary (batch OR streaming) — it's: what latency does this specific use case require, and what's the minimum infrastructure that achieves it? A dashboard that refreshes every 15 minutes doesn't need sub-second streaming infrastructure — a scheduled Spark job running every 15 minutes achieves the same user experience at 1/10th the cost. Match the latency to the use case. Don't over-engineer the freshness.
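The latency table above amounts to a lookup: pick the cheapest pattern that still meets the required latency. A minimal sketch, assuming the four tiers from the table (the function and constant names are illustrative):

```python
# (minimum acceptable latency in seconds, pattern, example infrastructure),
# ordered cheapest-first; tiers mirror the latency table above.
LATENCY_TIERS = [
    (3600, "standard batch", "nightly/hourly ETL"),             # 1-24 hours
    (60,   "frequent batch", "scheduled Spark jobs"),           # 1-15 minutes
    (1,    "micro-batch",    "Spark Structured Streaming"),     # 1-15 seconds
    (0,    "true streaming", "always-on Flink/Kafka Streams"),  # sub-second
]

def cheapest_pattern(required_latency_seconds: float) -> str:
    """Pick the least expensive pattern that still meets the latency need."""
    for min_latency, pattern, infra in LATENCY_TIERS:
        if required_latency_seconds >= min_latency:
            return f"{pattern} ({infra})"

print(cheapest_pattern(15 * 60))  # 15-minute dashboard -> frequent batch
print(cheapest_pattern(0.2))      # fraud scoring -> true streaming
```

Walking the tiers cheapest-first encodes the article's rule directly: never pay for a lower-latency tier than the use case demands.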
Total Cost of Ownership: Streaming vs Batch Example
A concrete cost comparison for a data pipeline processing 10M events/day:

Batch (nightly):
- Infrastructure: Spark cluster running 2 hours/night ($300/month)
- Development: 2 weeks (well-understood patterns)
- Operations: 2 hours/month (monitor job success, handle occasional failures)
- Annual cost: $3,600 infrastructure + $5K operations = $8,600/year

Streaming:
- Infrastructure: Kafka cluster + Spark Streaming running 24/7 ($3,000/month)
- Development: 6 weeks (stateful processing, exactly-once, monitoring)
- Operations: 8 hours/month (monitor lag, handle connector issues, on-call rotation)
- Annual cost: $36,000 infrastructure + $20K operations = $56,000/year

Cost multiplier: 6.5x. The streaming pipeline delivers 5-minute freshness instead of 24-hour freshness. Is that improvement worth $47,400/year? For fraud detection: absolutely (preventing a single $50K fraud event justifies the annual cost). For a sales dashboard: probably not (the sales team reviews it once per morning regardless of refresh frequency).
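The arithmetic above can be reproduced in a few lines. The dollar figures are the example's own; the helper function is just an illustrative convenience.

```python
def annual_tco(monthly_infra: float, annual_ops: float) -> float:
    """Annual total cost of ownership: infrastructure plus operations."""
    return monthly_infra * 12 + annual_ops

batch_tco = annual_tco(monthly_infra=300, annual_ops=5_000)         # $8,600
streaming_tco = annual_tco(monthly_infra=3_000, annual_ops=20_000)  # $56,000

print(f"batch:      ${batch_tco:,.0f}/year")
print(f"streaming:  ${streaming_tco:,.0f}/year")
print(f"multiplier: {streaming_tco / batch_tco:.1f}x")        # 6.5x
print(f"premium:    ${streaming_tco - batch_tco:,.0f}/year")  # $47,400
```

Note that the realized multiplier (6.5x) lands above the headline 3-5x range because the example's operations cost grows faster than its infrastructure cost; running the numbers for a specific pipeline beats relying on the rule of thumb.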
Case Studies: Streaming Justified vs Not Justified
Justified (fraud detection): A financial services company processed credit card transactions in nightly batch. Fraud was detected next-day, by which time fraudulent transactions worth $200K-500K/month had completed. Real-time streaming investment: $300K (Kafka + Flink + operations). Result: fraud detected in sub-second and blocked before completion; fraud losses reduced 85%, saving $170K-425K/month. ROI: payback in roughly 2 months.

Not justified (sales dashboard): A retail company wanted a "real-time sales dashboard" because a competitor had one. Investigation showed the sales team reviews the dashboard once per day at 8 AM, and the current nightly batch refresh delivers data by 6 AM. Streaming investment: $200K plus $56K/year operations. Business value of real-time over 6 AM data: zero; the team doesn't look at the dashboard between 8 AM reviews. Decision: keep batch and invest the $200K in better analytics features.

Partially justified (inventory): An e-commerce company had overselling issues (selling products that were already out of stock) due to 1-hour inventory updates. Full real-time streaming was considered ($150K), but a 5-minute micro-batch was implemented instead ($30K), reducing overselling 95% at 1/5th the cost of full real-time. The 5-minute delay didn't impact the business outcome; the customer checkout process takes about 3 minutes anyway.
Building the Business Case for Streaming
The streaming business case template:

- Current state cost: what does the current batch latency cost the business? Quantify it: fraud losses, missed SLA penalties, overselling costs, customer churn from stale recommendations, or regulatory fines for delayed reporting.
- Streaming investment: infrastructure cost + development cost + ongoing operations cost. Use the multiplier framework: 3-5x batch cost.
- Streaming benefit: current state cost × improvement percentage from reduced latency. Be conservative: 50-70% improvement, not 100%.
- ROI calculation: benefit minus cost over 3 years, including ongoing operations. Target: ROI > 3x for the streaming use case to justify the complexity.

If the ROI doesn't reach 3x, the business value doesn't justify the engineering investment. Improve the batch pipeline instead (optimize refresh frequency, improve data quality, add better alerting) for 80% of the benefit at 20% of the cost.
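The template can be turned into a small calculator. The inputs below are placeholders for the numbers a team would gather (the fraud-like figures are illustrative, not from the case study), and the 3x threshold is the one stated above.

```python
def streaming_roi(current_annual_cost_of_latency: float,
                  improvement_pct: float,      # be conservative: 0.5-0.7
                  upfront_investment: float,
                  annual_operations: float,
                  years: int = 3) -> float:
    """ROI multiple over the horizon: total benefit divided by total cost."""
    benefit = current_annual_cost_of_latency * improvement_pct * years
    cost = upfront_investment + annual_operations * years
    return benefit / cost

# Illustrative fraud-style case: ~$3.6M/year in latency cost, conservative
# 60% reduction, $300K build, $56K/year to operate.
roi = streaming_roi(3_600_000, 0.6, 300_000, 56_000)
print(f"3-year ROI: {roi:.1f}x -> {'build streaming' if roi > 3 else 'improve batch'}")
```

The same call with dashboard-style numbers (latency cost near zero) drives the ROI toward zero, which is exactly the template's "improve batch instead" outcome.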
The Xylity Approach
We help organizations make the streaming vs batch decision with the 5-question framework — evaluating business value, detection cost, response time, team capability, and operational readiness. Our streaming engineers and data engineers build real-time where justified and optimize batch where batch is the right answer — because the best architecture matches the business need at the lowest sustainable cost.
Go Deeper
Continue building your understanding with these related resources from our consulting practice.
Stream Where It Matters — Batch Where It Doesn't
5-question decision framework, hybrid architecture, migration roadmap. The honest assessment of when real-time justifies the 3-5x cost premium.
Start Your Architecture Assessment →