The Testing Gap: Why "It Runs" Isn't "It Works"

A retail company migrates its order management system to Azure. The application starts successfully. The team declares the migration complete. Monday morning: the weekly batch job that calculates loyalty points runs 4x slower than on-premises — it was optimized for local disk I/O, and Azure managed disks have different latency characteristics. The payment gateway integration returns timeout errors intermittently — the on-premises firewall rules allowed the connection, but the Azure NSG rules are subtly different. And the reporting module produces different numbers than the on-premises version — a timezone configuration difference causes transactions near midnight to be attributed to different days.

Each of these was testable before cutover. None was tested because the team validated "application starts successfully" and moved on. Migration testing isn't about confirming the application runs — it's about confirming the application produces correct results, at acceptable performance, with all integrations functioning, in the cloud environment that differs from on-premises in dozens of subtle ways.

Migration testing validates that the cloud deployment is functionally equivalent to the on-premises deployment — not that it starts without errors. "Starts without errors" is a health check. "Produces the same results" is migration validation. — Xylity Cloud Practice

4-Layer Migration Testing Strategy

| Layer | What It Validates | When to Run | Duration |
|---|---|---|---|
| 1. Functional | Application produces correct results | After migration, before cutover | 2-5 days per app |
| 2. Performance | Response times and throughput match or exceed baseline | After functional passes | 1-3 days per app |
| 3. Integration | All connections, APIs, and data flows work | After performance passes | 1-2 days per app |
| 4. Cutover Rehearsal | Traffic switch procedure works end-to-end | 1-2 weeks before production cutover | 4-8 hours |

Layer 1: Functional Validation

Functional validation confirms that the cloud deployment produces the same outputs as the on-premises deployment given the same inputs. This isn't regression testing of the application's features — it's equivalence testing of the migration itself.

Data Validation

For database migrations: row counts match between source and target (per table), checksums validate data integrity (hash comparison on critical columns), referential integrity is preserved (foreign keys, constraints), and stored procedures/triggers produce identical results on test data. For applications with batch processing: run the batch on both environments with identical input and compare outputs — every number, every record, every report must match. Discrepancies indicate: data migration errors (rows lost or duplicated), timezone/locale differences, or calculation precision differences between platforms.
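
A minimal sketch of that row-count and checksum comparison in Python, assuming both databases are reachable over ODBC; the DSNs, table, key, and column names are illustrative placeholders, not a real schema:

```python
# Compare row counts and a client-side checksum of critical columns
# between the source (on-premises) and target (cloud) databases.
import hashlib
import pyodbc

TABLES = {
    # table_name: (primary_key, critical_columns) -- placeholders
    "orders": ("order_id", ["order_id", "total_amount", "created_at"]),
}

def row_count(cur, table):
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    return cur.fetchone()[0]

def checksum(cur, table, key, cols):
    # Deterministic order is essential: hash rows sorted by primary key.
    cur.execute(f"SELECT {', '.join(cols)} FROM {table} ORDER BY {key}")
    h = hashlib.sha256()
    for row in cur:
        h.update("|".join(str(v) for v in row).encode("utf-8"))
    return h.hexdigest()

source = pyodbc.connect("DSN=onprem_db")  # placeholder connection strings
target = pyodbc.connect("DSN=azure_db")

for table, (key, cols) in TABLES.items():
    s_cur, t_cur = source.cursor(), target.cursor()
    counts = (row_count(s_cur, table), row_count(t_cur, table))
    sums = (checksum(s_cur, table, key, cols), checksum(t_cur, table, key, cols))
    status = "OK" if counts[0] == counts[1] and sums[0] == sums[1] else "MISMATCH"
    print(f"{table}: rows {counts[0]} vs {counts[1]}, checksum {status}")
```

For multi-terabyte tables, streaming every row to the client is impractical; push the hashing server-side (for example, SQL Server's HASHBYTES or CHECKSUM_AGG) and compare only the aggregate values.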

Business Process Testing

Execute the top 20 business processes end-to-end on the cloud deployment: create an order, process a payment, generate a report, run month-end closing, execute a batch job. Each process should produce identical results to the on-premises system. Business users — not just IT — should validate the results because they know what "correct" looks like for their domain. A report that's technically accurate but formatted differently confuses users and erodes trust.

Edge Case Testing

Test scenarios that stress the differences between on-premises and cloud: large file processing (cloud storage has different I/O characteristics), concurrent user load (cloud networking behaves differently under contention), timezone-sensitive operations (cloud VMs default to UTC, not local time), and long-running transactions (cloud load balancers may time out connections that worked on-premises). These edge cases are where migration bugs hide — and where production incidents occur if untested.
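
The timezone case from the opening anecdote is easy to demonstrate. In this short Python illustration (the US Eastern zone is just an example), the same transaction lands on different calendar days depending on the server's timezone default:

```python
# The same instant falls on different calendar days depending on the
# server's timezone. A VM migrated to Azure typically defaults to UTC,
# while the on-premises host used local time.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A transaction committed at 23:30 local (Eastern) time on March 3rd.
txn = datetime(2025, 3, 3, 23, 30, tzinfo=ZoneInfo("America/New_York"))

onprem_day = txn.date()                          # attributed to March 3rd
cloud_day = txn.astimezone(timezone.utc).date()  # attributed to March 4th

print(onprem_day, cloud_day)  # 2025-03-03 2025-03-04
```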

Layer 2: Performance Testing

Cloud performance characteristics differ from on-premises: network latency between components (microseconds on-premises, milliseconds in cloud), disk I/O patterns (local SSD vs. managed disk vs. Blob storage), memory management (VM sizing affects available memory differently), and compute scheduling (shared tenancy means occasional "noisy neighbor" effects). Performance testing validates that these differences don't degrade user experience or batch processing timelines.

Baseline Comparison

Before migration, capture on-premises performance baselines: page load times (P50, P95, P99), API response times, batch job durations, database query execution times, and peak concurrent user capacity. After migration, run identical tests on the cloud deployment and compare. The cloud deployment should match or exceed every baseline metric. Degradations greater than 20% require investigation and optimization before cutover.
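
A simple comparison harness makes the 20% rule mechanical rather than a judgment call. This sketch uses hard-coded illustrative numbers; in practice both dictionaries would be loaded from your monitoring exports:

```python
# Flag any cloud metric that degrades more than 20% against the
# on-premises baseline. Lower is better for all metrics shown
# (times in milliseconds, batch duration in minutes).
DEGRADATION_THRESHOLD = 0.20  # 20%, per the rule above

baseline = {"page_load_p95_ms": 850, "api_p99_ms": 320, "batch_minutes": 42}
cloud    = {"page_load_p95_ms": 910, "api_p99_ms": 460, "batch_minutes": 40}

for metric, base in baseline.items():
    delta = (cloud[metric] - base) / base
    verdict = "INVESTIGATE" if delta > DEGRADATION_THRESHOLD else "ok"
    print(f"{metric}: {base} -> {cloud[metric]} ({delta:+.1%}) {verdict}")
```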

Load Testing

Simulate production traffic patterns on the cloud deployment: normal load (average daily traffic), peak load (busiest hour/day), and stress load (2x peak to validate headroom). Tools: Azure Load Testing (managed), JMeter, k6, or Locust. Measure: response time under load, error rate at peak, auto-scaling behavior (does the application scale when traffic increases?), and recovery after load (does performance return to normal after the spike?). Load testing reveals capacity constraints that functional testing can't detect — the application works perfectly for 10 users but degrades at 500.
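
As one concrete option from the tool list, here is a minimal Locust script; the endpoints, payload, and task weights are placeholders to be shaped to your real traffic mix:

```python
# Minimal Locust load-test sketch for the migrated deployment.
# Run with, e.g.:
#   locust -f loadtest.py --host https://app.example.com \
#          --users 500 --spawn-rate 25
from locust import HttpUser, task, between

class OrderUser(HttpUser):
    # Simulated think time between requests per virtual user.
    wait_time = between(1, 5)

    @task(3)  # weight: browsing happens 3x as often as ordering
    def browse_catalog(self):
        self.client.get("/api/products")

    @task(1)
    def place_order(self):
        self.client.post("/api/orders", json={"product_id": 42, "qty": 1})
```

Weighting tasks to mirror production ratios matters: a test that is all reads will miss write-path bottlenecks.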

Performance Optimization Patterns

Common performance fixes post-migration: right-size VMs (over-provisioned VMs waste money; under-provisioned degrade performance), switch to Premium SSD for I/O-intensive workloads (Standard HDD migrated from on-premises is often the bottleneck), enable connection pooling for database connections (cloud network latency makes connection establishment more expensive), and implement caching (Azure Redis Cache for frequently accessed data that was served from local memory on-premises). Each optimization addresses a specific cloud-vs-on-premises difference.
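
For the connection pooling fix, a sketch using SQLAlchemy; the connection URL and pool sizes are illustrative and should be tuned to your workload's steady-state concurrency:

```python
# Keep a pool of warm connections instead of paying cloud network
# latency on every connection handshake.
from sqlalchemy import create_engine, text

engine = create_engine(
    "mssql+pyodbc://user:pass@azure-sql/db?driver=ODBC+Driver+18+for+SQL+Server",
    pool_size=10,        # connections kept open between requests
    max_overflow=5,      # extra connections allowed under burst load
    pool_pre_ping=True,  # validate connections before reuse (drops stale ones)
    pool_recycle=1800,   # recycle before cloud load balancers idle-timeout them
)

with engine.connect() as conn:
    # Checked out from the pool; returned (not closed) on exit.
    print(conn.execute(text("SELECT 1")).scalar())
```

The pool_recycle setting is worth noting: it retires connections before a cloud load balancer's idle timeout can silently kill them, the same long-running-connection difference flagged under edge case testing.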

Layer 3: Integration and Connectivity

Integration testing validates every connection the application uses: database connections (to migrated and on-premises databases), API integrations (to internal services and external third parties), file transfers (SFTP, Azure Files, Blob Storage), message queues (Service Bus, Event Hubs), authentication (Entra ID, on-premises AD, federation), and network connectivity (VPN, ExpressRoute, peering). Each connection must be tested under production-equivalent conditions — not just "can I connect?" but "can I sustain the connection under load without timeouts?"
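
A basic reachability sweep catches the "can I connect?" failures cheaply before the harder sustained-load tests. The endpoint list below is a placeholder for your application's real dependency inventory:

```python
# Verify each dependency is reachable and the TCP handshake completes
# within a strict timeout. Reachability only; sustained behavior under
# load still needs the Layer 2 tests.
import socket

ENDPOINTS = [
    ("azure-sql-server.database.windows.net", 1433),  # migrated database
    ("onprem-erp.corp.example.com", 8443),            # on-premises dependency
    ("payments.thirdparty.example.com", 443),         # external API
]

for host, port in ENDPOINTS:
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"OK          {host}:{port}")
    except OSError as exc:
        print(f"UNREACHABLE {host}:{port} ({exc})")
```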

Hybrid Connectivity Testing

During wave-based migration, some applications are in the cloud while their dependencies remain on-premises. Hybrid connectivity testing validates: VPN/ExpressRoute throughput and latency between cloud and on-premises, DNS resolution for services in both environments, authentication flows that span cloud and on-premises AD, and data replication for databases that need to be accessible from both environments. Hybrid connectivity is the most fragile part of a phased migration — test it extensively before relying on it for production traffic.
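
DNS is a frequent hybrid failure point, since cloud and on-premises resolvers can disagree. A small sketch that records what each critical name resolves to from inside the cloud environment, so split-horizon surprises surface before cutover (hostnames are placeholders):

```python
# Resolve the names the application depends on and print the answers.
import socket

NAMES = [
    "erp.corp.example.com",         # should resolve via on-premises DNS forwarder
    "orders.internal.example.com",  # should now resolve to the Azure deployment
]

for name in NAMES:
    try:
        addrs = sorted({info[4][0] for info in socket.getaddrinfo(name, None)})
        print(f"{name} -> {', '.join(addrs)}")
    except socket.gaierror as exc:
        print(f"{name} -> RESOLUTION FAILED ({exc})")
```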

The Cutover Runbook: Step-by-Step Traffic Switch

The cutover switches production traffic from on-premises to cloud. The runbook specifies every step, every responsible person, every validation checkpoint, and every rollback trigger.

Cutover Runbook Template

Step 1 (T-24h): Final Data Sync

Run final data synchronization from on-premises to cloud databases. Verify row counts and checksums match. Confirm all pending transactions are processed. Lock the on-premises application for writes (read-only mode) to prevent new data during the cutover window.

Step 2 (T-4h): Pre-Cutover Validation

Run the functional test suite on the cloud deployment with the final-synced data. Verify all integration connections are active. Confirm monitoring and alerting are operational for the cloud deployment. Brief the support team on the cutover timeline and escalation procedures.

Step 3 (T-0): DNS/Traffic Switch

Update DNS records to point to the cloud deployment (with a low TTL set 48 hours prior), or switch the load balancer/traffic manager to route to cloud endpoints. Monitor: are requests arriving at the cloud deployment? Are responses correct? Are error rates within the normal range? (A verification sketch follows the runbook.)

Step 4 (T+1h): Post-Cutover Validation

Run the critical business process tests on production (with real traffic flowing). Verify: data is being written correctly, integrations are functioning, performance is within baseline, and users can access the application. If any validation fails → execute rollback.

Step 5 (T+24h): Confirm Stable

After 24 hours of production traffic: review error rates, performance metrics, user-reported issues, and data integrity checks. If stable → declare cutover complete. Begin the hypercare period. Decommission the on-premises deployment after the 30-day retention period.
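
For step 3, a verification sketch that polls DNS until it converges on the cloud endpoint and then exercises a health URL; the hostname, IP, and health path are illustrative placeholders:

```python
# Poll DNS until the record points at the cloud endpoint, then confirm
# the application answers through the new path.
import socket
import time
import urllib.request

HOSTNAME = "app.example.com"
CLOUD_IP = "203.0.113.10"  # placeholder public IP of the Azure front end
HEALTH_URL = f"https://{HOSTNAME}/healthz"

# 1. Wait for DNS to converge on the cloud endpoint.
while True:
    addrs = {info[4][0] for info in socket.getaddrinfo(HOSTNAME, 443)}
    print(f"DNS answers: {sorted(addrs)}")
    if addrs == {CLOUD_IP}:
        break
    time.sleep(30)  # the low TTL set 48 hours prior keeps this loop short

# 2. Confirm the application responds on the new path.
with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
    print(f"health check: HTTP {resp.status}")
```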

Rollback Planning: The Safety Net You Hope You Don't Need

Every cutover needs a tested rollback plan — the procedure to revert to the on-premises deployment if the cloud deployment fails in production. The rollback plan specifies: the rollback trigger (which metrics or events trigger the decision), the rollback procedure (DNS revert, data sync back to on-premises, cache invalidation), the rollback timeline (how long does rollback take? — test this during rehearsal), and the data reconciliation process (transactions processed on the cloud deployment during the failed cutover must be captured and replayed on the on-premises system after rollback).

The rollback rehearsal: Practice the rollback before the actual cutover. Switch traffic to cloud, run for 2 hours with test traffic, then execute the rollback procedure. Measure: how long does rollback take? Does data reconciliation work? Are there any manual steps that could fail under pressure? The rehearsal reveals rollback issues in a controlled environment — not during a production crisis at 2 AM.

Post-Migration Validation and Hypercare

The hypercare period (typically 2-4 weeks post-cutover) provides elevated monitoring and rapid response for the newly migrated application. During hypercare: monitoring thresholds are tightened (alert on 10% degradation instead of 30%), the migration team remains on-call (rapid response to any issue), daily health checks compare cloud performance to on-premises baselines, and user feedback is actively collected (issues that monitoring doesn't detect). After hypercare concludes without significant issues, the application transitions to standard operations and the on-premises deployment is decommissioned.

Database Migration Validation: The Most Critical Test

Database migration carries the highest risk — data loss or corruption is irreversible and business-critical. Database validation includes: row count comparison per table (source vs. target), checksum validation on critical columns (hash comparison ensures data integrity at the byte level), referential integrity verification (all foreign key relationships preserved), stored procedure execution comparison (run procedures on both environments and compare results), and performance baseline comparison (query execution times on the migrated database vs. source). For large databases (1TB+), full validation may take 8-24 hours — schedule this in the migration window before cutover, not after. A database migration that loses 0.01% of rows is unacceptable in financial systems — the validation must confirm zero data loss.

The Xylity Approach

We execute migration testing through the 4-layer validation strategy — functional equivalence, performance comparison, integration verification, and cutover rehearsal. Our Azure engineers and DevOps engineers build the test suites, execute performance benchmarks, author the cutover runbook, and provide hypercare support — ensuring every migrated application works correctly, not just runs successfully.


Test Migration Before You Cut Over

Four layers — functional, performance, integration, cutover rehearsal. Migration testing that turns 'it runs' into 'it works correctly.'

Start Your Migration Testing Program →