D2C eCommerce Platform
A fast-growing fashion and lifestyle D2C brand was losing sales and customer trust every time they ran a major sale event. Their fixed VM infrastructure couldn't handle traffic spikes, deployments were manual and risky, and every peak event was a potential crisis. With a high-stakes annual sale 10 weeks away, we rebuilt their platform to handle 10× normal load — and it delivered flawlessly.
Client name and identifying details withheld at their request. References available during consultation.
The Challenge
This D2C brand had built a loyal following through strong product and marketing — but their technology was not keeping pace. Their platform ran on a fixed set of virtual machines on AWS, provisioned nearly two years earlier with no thought for scalability. During normal traffic, performance was acceptable. During sale events, it was a disaster.
During their biggest annual sale the previous year, the site was completely inaccessible for nearly 90 minutes at peak, right when tens of thousands of customers were trying to buy. The team estimated the outage cost several hundred thousand rupees in direct revenue, plus an untold amount in brand trust and customer churn. They had attempted to "scale up" by manually upgrading to larger EC2 instances before the event, but with no automation and no auto-scaling there was no way to respond to unexpected traffic surges in real time.
Deployments were equally fragile. There was no staging environment: all code changes were pushed directly to production, often late at night to avoid peak hours. New features frequently shipped with bugs that hit customers before engineers could respond. The team had no confidence in their release process and had started avoiding deployments entirely in the weeks before sale events, which left features and fixes bottlenecked for weeks at a time.
With their annual sale 10 weeks away and growing pressure from investors to demonstrate platform reliability, they came to us with a clear mandate: make the platform scale, or the next sale event would be the last one on their current infrastructure.
Before vs After

Tech Stack
What We Did
Auto-Scaling Infrastructure Migration
We migrated the application from fixed EC2 instances to AWS ECS Fargate with Application Auto Scaling policies. The platform now monitors CPU and request-count metrics and launches additional Fargate tasks within 60 seconds of a traffic spike. During the sale event, the service scaled automatically from its baseline of 4 tasks to 28 tasks at peak, with no human intervention.
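For illustration, here is a minimal AWS CDK sketch of the target-tracking scaling setup described above. The service name, container image, thresholds, and capacity ceiling are placeholder assumptions, not the client's actual configuration.

```typescript
// Minimal CDK sketch of target-tracking auto scaling for an ECS Fargate
// service. All names, thresholds, and sizes are illustrative assumptions.
import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'StorefrontStack');

// Load-balanced Fargate service (image and sizing are placeholders).
const svc = new ecsPatterns.ApplicationLoadBalancedFargateService(stack, 'Storefront', {
  cpu: 1024,
  memoryLimitMiB: 2048,
  desiredCount: 4, // baseline from the case study
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('example/storefront:latest'),
  },
});

// Scale between the baseline and a generous ceiling for sale events.
const scaling = svc.service.autoScaleTaskCount({ minCapacity: 4, maxCapacity: 40 });

// Track average CPU: launch tasks quickly on the way up, scale in slowly.
scaling.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 60,
  scaleOutCooldown: cdk.Duration.seconds(60),
  scaleInCooldown: cdk.Duration.minutes(5),
});

// Track request count per task via the ALB target group.
scaling.scaleOnRequestCount('RequestScaling', {
  requestsPerTarget: 1000,
  targetGroup: svc.targetGroup,
});
```

With target tracking, Application Auto Scaling does the arithmetic: it adds tasks whenever the metric runs above the target and removes them, more conservatively, when it falls below.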
CDN & Caching Layer
We implemented Amazon CloudFront with a carefully designed caching strategy. Product listing pages, images, and static assets are cached at the edge, with TTLs tuned to how often the product catalogue changes. Dynamic personalisation and cart operations bypass the cache. We also introduced Redis via ElastiCache for session management and frequently accessed product data, reducing database load by over 60% during the peak event.
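The Redis layer follows the standard cache-aside pattern. Below is a sketch of the read path, assuming the ioredis client; the key scheme, TTL, and the loadProductFromDb helper are illustrative, not the actual codebase.

```typescript
// Cache-aside read path for product data, assuming the ioredis client.
// Key scheme, TTL, and loadProductFromDb() are illustrative assumptions.
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

interface Product { id: string; name: string; priceInr: number; }

// Hypothetical database accessor standing in for the real catalogue query.
declare function loadProductFromDb(id: string): Promise<Product>;

export async function getProduct(id: string): Promise<Product> {
  const key = `product:${id}`;

  // 1. Try the cache first.
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached) as Product;

  // 2. On a miss, fall back to the database...
  const product = await loadProductFromDb(id);

  // 3. ...then populate the cache with a short TTL so price and stock
  //    changes propagate within minutes even without explicit invalidation.
  await redis.set(key, JSON.stringify(product), 'EX', 300);
  return product;
}
```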
Database Scaling for Peak Traffic
The existing single RDS instance was a bottleneck. We upgraded to a Multi-AZ RDS deployment for failover resilience and added a read replica specifically for product catalogue queries — the heaviest read workload during sale events. Connection pooling was implemented at the application layer to prevent connection exhaustion under high concurrency.
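At the application layer, the pooling and read/write split looked roughly like the sketch below, using node-postgres. Hostnames and pool sizes are illustrative assumptions.

```typescript
// Connection pooling with a read/write split, using node-postgres (pg).
// Endpoints and pool sizes are illustrative, not the client's real values.
import { Pool } from 'pg';

// Writer: the Multi-AZ primary handles orders, carts, and inventory.
const writerPool = new Pool({
  host: 'shop-primary.example.ap-south-1.rds.amazonaws.com', // hypothetical
  max: 20,                        // cap concurrent connections per app task
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000, // fail fast instead of queueing forever
});

// Reader: the read replica absorbs heavy catalogue queries during sales.
const readerPool = new Pool({
  host: 'shop-replica.example.ap-south-1.rds.amazonaws.com', // hypothetical
  max: 50,
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,
});

// Catalogue reads go to the replica; anything transactional stays on the writer.
export async function listProducts(category: string) {
  const { rows } = await readerPool.query(
    'SELECT id, name, price FROM products WHERE category = $1 AND in_stock',
    [category],
  );
  return rows;
}
```

Bounding `max` on each pool is what prevents connection exhaustion: when every app task opens connections freely, a 10x traffic spike can overwhelm the database's connection limit long before CPU becomes the problem.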
CI/CD & Staging Environment
We built a complete GitHub Actions pipeline with Docker image builds, ECR pushes, and blue/green deployments to ECS. A fully isolated staging environment was created that mirrors production infrastructure. All changes must pass automated tests and be deployed to staging before production. The team went from deploying once a week (nervously) to deploying multiple times a day with confidence.
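For the blue/green stage itself, here is one way to express the wiring in CDK. Construct names, ports, and the listener setup are placeholder assumptions; the real pipeline drives this from GitHub Actions after the image lands in ECR.

```typescript
// Sketch of ECS blue/green deployment wiring via CodeDeploy, in AWS CDK.
// Names, ports, and sizing are illustrative assumptions.
import * as cdk from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';
import * as codedeploy from 'aws-cdk-lib/aws-codedeploy';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'DeployStack');
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

const taskDef = new ecs.FargateTaskDefinition(stack, 'TaskDef');
taskDef.addContainer('app', {
  image: ecs.ContainerImage.fromRegistry('example/storefront:latest'),
  portMappings: [{ containerPort: 3000 }],
});

// The service must opt in to the CodeDeploy deployment controller.
const service = new ecs.FargateService(stack, 'Service', {
  cluster,
  taskDefinition: taskDef,
  desiredCount: 4,
  deploymentController: { type: ecs.DeploymentControllerType.CODE_DEPLOY },
});

// Two target groups: live traffic (blue) and the candidate release (green).
const alb = new elbv2.ApplicationLoadBalancer(stack, 'Alb', { vpc, internetFacing: true });
const listener = alb.addListener('Http', { port: 80 });
const blue = listener.addTargets('Blue', { port: 3000, targets: [service] });
const green = new elbv2.ApplicationTargetGroup(stack, 'Green', {
  vpc, port: 3000, targetType: elbv2.TargetType.IP,
});

// CodeDeploy shifts the listener from blue to green, and flips straight
// back to blue if the new version misbehaves.
new codedeploy.EcsDeploymentGroup(stack, 'BlueGreen', {
  service,
  blueGreenDeploymentConfig: { blueTargetGroup: blue, greenTargetGroup: green, listener },
  autoRollback: { failedDeployment: true },
});
```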
Load Testing & Game-Day Simulation
Three weeks before the sale, we ran structured load tests using k6, simulating 10× normal traffic in a staging environment identical to production. This revealed two bottlenecks — a slow product search query and a session management race condition — which we fixed before the actual event. On sale day, the team watched dashboards calmly instead of firefighting.
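A trimmed-down version of the kind of k6 scenario used for such a ramp is below; the staging URLs, stage durations, and thresholds are illustrative, not the actual test plan.

```typescript
// Minimal k6 scenario approximating a 10x traffic ramp against staging.
// URLs, stage durations, and thresholds are illustrative assumptions.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 200 },   // warm up to normal traffic
    { duration: '10m', target: 2000 }, // ramp to ~10x normal load
    { duration: '15m', target: 2000 }, // hold at peak
    { duration: '5m', target: 0 },     // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<800'],  // 95% of requests under 800 ms
    http_req_failed: ['rate<0.01'],    // under 1% errors
  },
};

export default function () {
  // Hypothetical staging endpoints standing in for the real storefront.
  const listing = http.get('https://staging.example.com/products?category=sale');
  check(listing, { 'listing 200': (r) => r.status === 200 });

  const search = http.get('https://staging.example.com/search?q=kurta');
  check(search, { 'search 200': (r) => r.status === 200 });

  sleep(1); // think time between page views
}
```

Holding at peak, rather than only ramping through it, is what surfaced the session-management race condition: it only appeared after sustained concurrency, exactly the condition a real sale creates.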
Key Engineering Decisions
Decision: ECS over Kubernetes for speed of delivery
With 10 weeks to the sale event, we couldn't spend 4 weeks setting up and learning EKS. ECS Fargate delivered the auto-scaling capability they needed in a fraction of the time, with significantly less operational complexity for a team their size.
Decision: Blue/green deployments over rolling updates
For an eCommerce platform, zero-downtime deployments are non-negotiable. Blue/green gives instant rollback capability — if a deployment has issues, traffic switches back to the previous version in seconds rather than waiting for a rolling update to complete.
Decision: Load testing before go-live, not after
Most teams find their scaling issues on the day of the event. We invested two weeks in structured load testing in a production-identical staging environment. The two issues we found and fixed would almost certainly have caused partial outages during the actual sale.
Engagement Timeline

Results Delivered
"Our biggest sale of the year went off without a single incident. For the first time ever, we were watching dashboards and celebrating with the team instead of firefighting server issues. The load tests ESSEMVEE ran found two bugs that would have taken us down. Worth every rupee."
Head of Technology
D2C eCommerce Platform · South Asia · Name withheld on request
Facing Similar Challenges?
Book a free 30-minute call — no obligation, no sales pitch.
Schedule Free Consultation