The True Cost of Real-Time

| Cost Category | Batch | Streaming | Multiplier |
|---|---|---|---|
| Infrastructure | Scheduled compute (pay when running) | Always-on compute (24/7) | 3-5x |
| Development | SQL/Python transforms, well-understood patterns | Stateful processing, exactly-once semantics, schema evolution | 2-3x |
| Debugging | Rerun the job, inspect the data | Distributed logs, state inspection, replay events | 3-5x |
| Operations | Monitor job success/failure | Monitor lag, throughput, error rate, connector health | 2-3x |
| Talent | Data engineers (widely available) | Streaming engineers (scarce, premium) | 1.5-2x salary |

Total cost multiplier: streaming costs 3-5x more than batch for equivalent data volume. This cost is justified when the business value of fresher data exceeds the additional cost — and only when the business actually acts on the fresher data. If the fraud team reviews alerts every 4 hours regardless of when they arrive, real-time detection with 4-hour response has the same business impact as hourly batch detection with 4-hour response.

The question isn't "can we build it real-time?" — the answer is always yes. The question is "does the business decision change if the data is 5 minutes old vs 24 hours old?" If both produce the same decision, batch is 3-5x cheaper.

Decision Framework: 5 Questions

Question 1: What decision does this data inform? (Fraud detection → real-time. Monthly financial close → batch.)

Question 2: What's the cost of delayed detection? (Fraud: $50K per undetected transaction → real-time justified. Sales report: no cost until end of day → batch sufficient.)

Question 3: How fast does the organization act on the data? (If the alert fires at 2 AM and nobody looks until 9 AM, a batch run at 6 AM achieves the same outcome at lower cost.)

Question 4: Does the team have streaming expertise? (If not, the 2-3x development cost becomes 5-8x. Train first or hire streaming engineers.)

Question 5: Can the organization operate a streaming system? (24/7 monitoring, an on-call rotation, and incident response capability are required.)
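The five questions can be expressed as a simple gating check: streaming is recommended only when every gate passes. This is an illustrative sketch, not a real decision engine; the class, field, and function names are invented for this example.

```python
# Illustrative sketch of the 5-question framework as code.
# Streaming is recommended only if every gate passes; all names are
# invented for this example.

from dataclasses import dataclass

@dataclass
class UseCase:
    decision_changes_with_freshness: bool   # Q1/Q2: does faster data change the outcome?
    org_acts_in_real_time: bool             # Q3: is anyone responding 24/7?
    team_has_streaming_skills: bool         # Q4: Kafka/Flink experience on the team?
    ops_can_run_streaming: bool             # Q5: on-call and lag monitoring in place?

def recommend(case: UseCase) -> str:
    """Return 'streaming' only when every gate passes; otherwise 'batch'."""
    gates = (
        case.decision_changes_with_freshness,
        case.org_acts_in_real_time,
        case.team_has_streaming_skills,
        case.ops_can_run_streaming,
    )
    return "streaming" if all(gates) else "batch"

fraud = UseCase(True, True, True, True)
sales_dashboard = UseCase(False, False, True, True)
print(recommend(fraud))            # streaming
print(recommend(sales_dashboard))  # batch
```

The "all gates must pass" rule mirrors the text: a single failing answer (no 24/7 response, no streaming skills) is enough to make batch the cheaper, equally effective choice.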

When Batch Wins

Batch is the right choice for:

- Analytics and reporting: daily sales dashboards, weekly business reviews, monthly financial close. The audience consumes these on a schedule, not continuously.
- ML model training: models retrain daily, weekly, or monthly. Training on streaming data adds complexity without improving model quality.
- Data warehousing: the data warehouse is designed for analytical queries on historical data; batch loading on a defined schedule is the natural pattern.
- Regulatory reporting: most regulatory reports are periodic. Batch extraction aligned to the reporting schedule is simpler and more auditable.
- Small-to-medium data volumes: if the nightly batch job runs in 20 minutes and the data is available by 6 AM, the business value of reducing this to 5-minute freshness is near zero for most use cases.

The batch architecture: scheduled pipelines → transform in Fabric/Spark → load to warehouse → refresh Power BI. Simple, reliable, well-understood.

When Streaming Wins

Streaming is the right choice for:

- Fraud and anomaly detection: every minute of delay means more fraudulent transactions completed.
- IoT and operational monitoring: sensor data from manufacturing equipment needs real-time detection, not next-day batch.
- Customer experience: real-time recommendations, live inventory updates, personalized pricing. Freshness directly impacts conversion.
- Event-driven architectures: microservices communicating via events need real-time processing.
- High-volume CDC: for tables with 1B+ rows, nightly full extraction is prohibitively expensive; CDC streaming captures only changes, reducing data transfer by 99%+.

Hybrid Architecture: Lambda and Kappa

Lambda architecture: a batch layer (processes all historical data nightly, producing accurate, complete results), a speed layer (processes the real-time stream, producing approximate recent results), and a serving layer (merges batch and speed results). Advantage: accuracy from batch plus freshness from streaming. Disadvantage: two separate codepaths to maintain.

Kappa architecture: a single streaming pipeline processes all data, both historical replay and real-time. The stream is the source of truth; batch results are derived by replaying the stream. Advantage: a single codebase. Disadvantage: reprocessing historical data through streaming is slower than batch.

Practical recommendation: most organizations implement a pragmatic hybrid: batch for analytics and reporting (90% of use cases), streaming for the 2-3 use cases that genuinely require real-time (fraud, IoT, customer experience). This avoids Lambda/Kappa complexity while delivering real-time where it matters.

Migrating from Batch to Streaming

Step 1: Identify the 2-3 use cases that justify real-time

Use the 5-question framework above. Don't migrate everything; migrate only where fresher data changes business outcomes.

Step 2: Run streaming alongside batch

Deploy streaming for the selected use cases while keeping batch running. Compare: are results consistent? Is the business actually consuming the real-time data?

Step 3: Decommission batch for validated use cases

After 4-8 weeks of parallel operation with validated accuracy and confirmed business adoption, decommission batch for the streaming use cases. Keep batch for everything else.
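The parallel-run validation in the second step can be sketched as a simple reconciliation: compute the same aggregates from both pipelines and flag any divergence beyond a small tolerance (streaming results may be approximate). The function and key names here are illustrative.

```python
# Sketch of a batch-vs-streaming reconciliation check during parallel operation.
# Per-key aggregates (e.g. per-customer daily totals) are compared with a small
# relative tolerance. All names and figures are illustrative.

def reconcile(batch: dict, streaming: dict, tolerance: float = 0.01) -> list:
    """Return keys where streaming diverges from batch by more than `tolerance` (relative)."""
    mismatches = []
    for key, expected in batch.items():
        actual = streaming.get(key)
        if actual is None:
            mismatches.append((key, "missing in streaming"))
        elif abs(actual - expected) > tolerance * max(abs(expected), 1e-9):
            mismatches.append((key, f"batch={expected}, streaming={actual}"))
    return mismatches

batch_totals = {"cust_1": 100.0, "cust_2": 250.0}
stream_totals = {"cust_1": 100.0, "cust_2": 263.0}   # 5.2% off, so it gets flagged
print(reconcile(batch_totals, stream_totals))
```

Running a check like this daily during the 4-8 week parallel window gives the accuracy evidence the decommissioning decision depends on.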

Pre-Implementation Checklist

Before building a streaming pipeline, confirm:

- The business decision changes with fresher data (documented, not assumed).
- The organization can act on real-time alerts (24/7 response capability exists or will be built).
- The team has streaming expertise (or the timeline includes a training/hiring phase).
- The on-call rotation includes streaming monitoring.
- The budget includes always-on infrastructure, streaming engineers, and operational tooling.
- Most importantly: the batch pipeline has been evaluated. If batch delivers the same business outcome at 3-5x lower cost, batch is the correct engineering decision.

Team and Skills Requirements

| Capability | Batch Team | Streaming Team (Additional) |
|---|---|---|
| Core skills | SQL, Python/Spark, ETL patterns | + Kafka/Event Hubs, Flink/Spark Streaming, event-driven design |
| Operations | Job scheduling, failure handling | + Consumer lag monitoring, partition management, on-call rotation |
| Architecture | Batch pipeline design, data modeling | + Event schema design, exactly-once patterns, back-pressure handling |
| Debugging | Rerun job, inspect intermediate data | + Distributed tracing, event replay, state inspection |

The streaming team needs everything the batch team knows PLUS streaming-specific skills. This is why the 2-3x development cost exists — the skill set is a superset, not a different set. Organizations that try to "learn streaming by building production streaming" pay the education cost in production incidents. Train on non-critical use cases first, or bring in experienced streaming engineers for the initial build.

Real-Time vs Near-Real-Time vs Batch: The Latency Spectrum

| Latency | Pattern | Infrastructure | Cost | Use Cases |
|---|---|---|---|---|
| Sub-second | True streaming | Always-on Flink/Kafka Streams | $$$$ | Fraud, safety, trading |
| 1-15 seconds | Micro-batch | Spark Structured Streaming | $$$ | IoT, inventory, pricing |
| 1-15 minutes | Frequent batch | Scheduled Spark jobs | $$ | Dashboards, operational reports |
| 1-24 hours | Standard batch | Nightly/hourly ETL | $ | Analytics, warehousing, ML training |

Each step from batch toward real-time roughly doubles infrastructure cost and operational complexity. The decision isn't binary (batch or streaming); it's: what latency does this specific use case require, and what's the minimum infrastructure that achieves it? A dashboard that refreshes every 15 minutes doesn't need sub-second streaming infrastructure; a scheduled Spark job running every 15 minutes achieves the same user experience at roughly 1/10th the cost. Match the latency to the use case. Don't over-engineer the freshness.
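Picking the cheapest tier that meets a latency requirement can be reduced to a lookup over the table's boundaries. The tier boundaries and cost labels come from the table; the helper function itself is an illustrative sketch.

```python
# Map a required latency (in seconds) to the cheapest tier from the table above.
# Boundaries follow the table; the helper is an illustrative sketch.

def cheapest_tier(required_latency_s: float) -> str:
    if required_latency_s < 1:
        return "true streaming ($$$$)"
    if required_latency_s <= 15:
        return "micro-batch ($$$)"
    if required_latency_s <= 15 * 60:
        return "frequent batch ($$)"
    return "standard batch ($)"

print(cheapest_tier(0.2))      # true streaming ($$$$)
print(cheapest_tier(900))      # frequent batch ($$)
print(cheapest_tier(3600))     # standard batch ($)
```

A 15-minute dashboard requirement lands in the "frequent batch" tier: the same user experience as streaming, at two cost tiers less.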

Total Cost of Ownership: Streaming vs Batch Example

A concrete cost comparison for a data pipeline processing 10M events/day:

Batch (nightly):
- Infrastructure: Spark cluster running 2 hours/night ($300/month)
- Development: 2 weeks (well-understood patterns)
- Operations: 2 hours/month (monitor job success, handle occasional failures)
- Annual cost: $3,600 infra + $5K ops = $8,600/year

Streaming:
- Infrastructure: Kafka cluster + Spark Streaming running 24/7 ($3,000/month)
- Development: 6 weeks (stateful processing, exactly-once semantics, monitoring)
- Operations: 8 hours/month (monitor lag, handle connector issues, on-call rotation)
- Annual cost: $36,000 infra + $20K ops = $56,000/year

Cost multiplier: 6.5x. The streaming pipeline delivers 5-minute freshness instead of 24-hour freshness. Is that improvement worth $47,400/year? For fraud detection, absolutely: preventing a single $50K fraud event justifies the annual cost. For a sales dashboard, probably not: the sales team reviews it once per morning regardless of refresh frequency.
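The arithmetic above can be verified in a few lines. The figures come straight from the example; the function name is made up for illustration.

```python
# Annual TCO for the example above: monthly infrastructure x 12 plus yearly ops.
def annual_tco(infra_per_month: float, ops_per_year: float) -> float:
    return infra_per_month * 12 + ops_per_year

batch = annual_tco(300, 5_000)         # $3,600 infra + $5K ops
streaming = annual_tco(3_000, 20_000)  # $36,000 infra + $20K ops
print(batch, streaming, round(streaming / batch, 1))  # 8600 56000 6.5
```

The 6.5x multiplier sits above the general 3-5x range because this example's operational overhead grows faster than its infrastructure cost.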

Case Studies: Streaming Justified vs Not Justified

Justified (fraud detection): A financial services company processed credit card transactions in nightly batch. Fraud was detected next-day, by which time fraudulent transactions worth $200K-500K/month had completed. Real-time streaming investment: $300K (Kafka + Flink + operations). Result: fraud detected in sub-second and blocked before completion. Fraud losses fell 85%, saving $170K-425K/month. ROI: payback in 2 months.

Not justified (sales dashboard): A retail company wanted a "real-time sales dashboard" because a competitor had one. Investigation showed the sales team reviews the dashboard once per day at 8 AM, and the current nightly batch refresh delivers data by 6 AM. Streaming investment: $200K plus $56K/year operations. Business value of real-time over the 6 AM refresh: zero; the team doesn't look at the dashboard between reviews. Decision: keep batch and invest the $200K in better analytics features.

Partially justified (inventory): An e-commerce company had overselling issues (selling products that were already out of stock) due to 1-hour inventory updates. Full real-time streaming was considered ($150K), but a 5-minute micro-batch was implemented instead ($30K), reducing overselling 95% at 1/5th the cost of full real-time. The 5-minute delay didn't impact the business outcome; the customer checkout process takes about 3 minutes anyway.

Building the Business Case for Streaming

The streaming business case template:

- Current-state cost: what does the current batch latency cost the business? Quantify it: fraud losses, missed SLA penalties, overselling costs, customer churn from stale recommendations, or regulatory fines for delayed reporting.
- Streaming investment: infrastructure cost + development cost + ongoing operations cost. Use the multiplier framework: 3-5x batch cost.
- Streaming benefit: current-state cost × the improvement percentage from reduced latency. Be conservative: assume 50-70% improvement, not 100%.
- ROI calculation: benefit minus cost over 3 years, including ongoing operations. Target an ROI above 3x for the streaming use case to justify the complexity.

If the ROI doesn't reach 3x, the business value doesn't justify the engineering investment. Improve the batch pipeline instead (optimize refresh frequency, improve data quality, add better alerting) for 80% of the benefit at 20% of the cost.
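The ROI step of the template can be sketched numerically. The 3x threshold follows the text, and the sample figures reuse the fraud case study above (conservatively taking the low end of the loss range and a 50% improvement); the function name and parameters are illustrative.

```python
# 3-year ROI sketch following the business case template:
# conservative benefit over total cost. All names are illustrative.

def roi_3yr(annual_loss: float, improvement: float,
            build_cost: float, annual_ops: float) -> float:
    """Ratio of 3-year benefit to 3-year cost. Per the template, target > 3."""
    benefit = annual_loss * improvement * 3
    cost = build_cost + annual_ops * 3
    return benefit / cost

# Fraud example: $2.4M/year losses (low end of $200K/month), a conservative
# 50% reduction, $300K build, $56K/year operations.
print(round(roi_3yr(2_400_000, 0.5, 300_000, 56_000), 1))  # 7.7
```

Even with conservative assumptions the fraud case clears the 3x bar comfortably; a use case that only clears it under optimistic assumptions probably belongs in batch.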

The Xylity Approach

We help organizations make the streaming vs batch decision with the 5-question framework — evaluating business value, detection cost, response time, team capability, and operational readiness. Our streaming engineers and data engineers build real-time where justified and optimize batch where batch is the right answer — because the best architecture matches the business need at the lowest sustainable cost.

