Downtime is lost revenue. At 10,000 daily sessions, even a 2-minute deployment window overlaps roughly 14 of them (2 of the 1,440 minutes in a day). At a million, it's a disaster. The good news: zero-downtime deployments aren't difficult, just misunderstood.
We'll cover blue-green deployments, canary releases, feature flags, Kubernetes rolling updates, and database migration strategies that let you ship fearlessly at any scale.
Blue-Green Deployments
The concept: maintain two identical production environments. At any time, one is live (blue), one is idle (green). Deploy to green, run smoke tests, then flip the load balancer. Rollback is instant—just flip back.
# AWS Application Load Balancer — swap target groups
aws elbv2 modify-listener \
  --listener-arn arn:aws:...listener/... \
  --default-actions '[{
    "Type": "forward",
    "TargetGroupArn": "arn:aws:...:targetgroup/green/..."
  }]'
# Health check passes → green is now live
# Previous blue stays warm for instant rollback
The key constraint: your application and database must support the old and new schema simultaneously during the switch. Design migrations accordingly.
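One way to satisfy that constraint is the expand/contract pattern: add the new schema alongside the old, have the application tolerate both during the switch, and only then drop the old. A minimal sketch, where `UserRow`, `readFullName`, and `writeFullName` are hypothetical names for illustration (here, renaming a `name` column to `full_name`):

```typescript
// Expand/contract migration sketch: rename column "name" -> "full_name".
// Phase 1 (expand): add the new column; writers populate BOTH.
// Phase 2: backfill; readers prefer the new column, fall back to the old.
// Phase 3 (contract): drop the old column once nothing reads it.

type UserRow = { name?: string; full_name?: string };

// During the transition the reader tolerates both schemas:
function readFullName(row: UserRow): string {
  return row.full_name ?? row.name ?? '';
}

// And the writer fills both columns, so blue and green both work:
function writeFullName(row: UserRow, value: string): UserRow {
  return { ...row, name: value, full_name: value };
}

console.log(readFullName({ name: 'Ada' }));        // "Ada"  (old schema)
console.log(readFullName({ full_name: 'Grace' })); // "Grace" (new schema)
```

Because each phase is backward compatible, you can flip the load balancer, or flip it back, at any point without a broken read or write.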
Canary Releases
Instead of a hard cut-over, canary releases send a small percentage of traffic to the new version. If metrics look good, gradually ramp up. This limits blast radius of a bad deploy.
# NGINX weighted upstream
upstream api {
    server api-v1:3000 weight=90;  # 90% production
    server api-v2:3000 weight=10;  # 10% canary
}
# Monitor error rate on v2 before increasing weight
# Automate with Datadog monitors + deployment hooks
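The promotion logic itself is simple enough to sketch. Here `nextCanaryWeight` is a hypothetical helper, and fetching the error rate and pushing the new weight to NGINX are left as stand-ins for your metrics provider (e.g. Datadog) and config tooling:

```typescript
// Automated canary ramp sketch: increase the canary's traffic share in
// steps while the error rate stays under a threshold; drop to 0 (roll
// back) the moment it spikes.
function nextCanaryWeight(
  current: number,    // current canary weight, 0..100
  errorRate: number,  // observed error rate on v2, 0..1
  threshold = 0.01,   // abort if more than 1% of requests fail
  step = 20           // ramp increment per healthy interval
): number {
  if (errorRate > threshold) return 0;   // bad deploy: roll back
  return Math.min(100, current + step);  // healthy: ramp up
}

console.log(nextCanaryWeight(10, 0.002)); // 30 (healthy, ramp up)
console.log(nextCanaryWeight(90, 0.05));  // 0  (error spike, roll back)
```

Run this on a timer against your monitoring API and the ramp becomes hands-off: 10 → 30 → 50 → 70 → 90 → 100, or straight back to 0 at the first bad reading.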
Feature Flags
Feature flags decouple deployment from release. You deploy code dark (off for everyone), then enable it for segments (internal team → beta users → 10% → 100%) without re-deploying.
// Using LaunchDarkly SDK
import { init } from 'launchdarkly-node-server-sdk';

const ldClient = init(process.env.LD_SDK_KEY!);
await ldClient.waitForInitialization();

// In your route handler
const showNewCheckout = await ldClient.variation(
  'new-checkout-flow',
  { key: user.id, email: user.email, plan: user.plan },
  false // default if LD unreachable
);

if (showNewCheckout) {
  return renderNewCheckout();
}
return renderLegacyCheckout();
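If you roll percentage rollouts yourself instead of using a vendor SDK, the important property is determinism: a given user must land in the same bucket on every request, and a user enabled at 10% must stay enabled at 50%. A common sketch (the `inRollout` helper is hypothetical) hashes the user id into a stable bucket:

```typescript
import { createHash } from 'crypto';

// Deterministic percentage rollout: hash flag + user id into a stable
// 0-99 bucket, so the same user always gets the same answer and stays
// enabled as the rollout percentage grows (10% -> 50% -> 100%).
function inRollout(flagKey: string, userId: string, percent: number): boolean {
  const digest = createHash('sha256')
    .update(`${flagKey}:${userId}`)
    .digest();
  const bucket = digest.readUInt32BE(0) % 100;
  return bucket < percent;
}

// Monotonic by construction: enabled at 10% implies enabled at 50%.
const on10 = inRollout('new-checkout-flow', 'user-42', 10);
const on50 = inRollout('new-checkout-flow', 'user-42', 50);
```

Hashing on `flagKey` as well as the user id keeps different flags' rollouts uncorrelated, so the same 10% of users aren't the guinea pigs for every experiment.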
The payoff: 0s deployment downtime · <5s rollback time · 99.99% uptime achieved.
Kubernetes Rolling Updates
Kubernetes rolling updates replace pods incrementally. Configure maxUnavailable: 0 to ensure capacity is never reduced below 100% during the rollout.
# deployment.yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2        # spin up 2 extra pods before killing old ones
      maxUnavailable: 0  # never drop below 6 ready pods
  template:
    spec:
      containers:
        - name: api
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
The readinessProbe is critical: Kubernetes won't route traffic to a new pod until it passes, which prevents requests from hitting a pod that hasn't finished starting.
Instant Rollback
The fastest rollback is feature flags—disable the flag, no redeploy needed. For a full rollback: kubectl rollout undo deployment/api — Kubernetes keeps the previous ReplicaSet and rolls back within seconds.
# See rollout history
kubectl rollout history deployment/api
# Roll back to specific revision
kubectl rollout undo deployment/api --to-revision=3
# Monitor
kubectl rollout status deployment/api
Summary
- Blue-green: maintain two environments, flip the load balancer — instant rollback
- Canary: route a small % to the new version, ramp up as confidence grows
- Feature flags: decouple deployment from release — the ultimate safety net
- Set maxUnavailable: 0 in Kubernetes to avoid capacity drops during rollout
- Always pair deployments with a readiness probe so traffic doesn't land on cold pods