
Continuous Deployment


  1. Continuous Delivery vs Continuous Deployment
  2. Advantages & Disadvantages
  3. Best Practices
  4. CD Tools Overview
  5. Zero-Downtime Deployment
  6. Deployment Strategies
  7. Blue-Green Deployment — Deep Dive
  8. Choosing the Right Strategy

1. Continuous Delivery vs Continuous Deployment


Both are called “CD” — here is the exact difference:

CODE COMMITTED
CI runs (build, test, lint)
Artifact ready for release
├── Continuous DELIVERY
│     Manual approval required
│     A human clicks "deploy to prod"
└── Continuous DEPLOYMENT
      No human in the loop
      Automatically goes to production
      if all tests pass

Simple analogy: Think of ordering a package online.

  • Delivery = the package arrives at your door. You still have to open it and decide if you want to keep it.
  • Deployment = the package arrives, a robot opens it, verifies the contents, and puts everything in place automatically. You just come home to a working product.

Continuous Deployment is the complete automation of the entire path from commit → production. No manual gates after the initial pipeline.
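The two models can be reduced to a single conditional. A toy sketch (function and mode names are invented for illustration) of the decision a pipeline makes once an artifact is built:

```python
# Toy release gate: the only difference between the two "CD"s is one condition.
def release(artifact, tests_passed, mode, human_approved=False):
    """Return what happens to an artifact under each model."""
    if not tests_passed:
        return "blocked by pipeline"           # both models stop here
    if mode == "delivery" and not human_approved:
        return "awaiting manual approval"      # continuous delivery: human gate
    return "deployed to production"            # continuous deployment: no gate

print(release("app-1.2.0", tests_passed=True, mode="deployment"))
# → deployed to production
print(release("app-1.2.0", tests_passed=True, mode="delivery"))
# → awaiting manual approval
```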


2. Advantages & Disadvantages

Advantages:

Speed to market — features go from a developer’s machine to customers’ hands within minutes of a commit passing tests. No waiting for release windows.

Real-time feedback loop — if a customer reports a bug on Monday, a fix can be deployed by Monday afternoon. Teams respond to the market in hours instead of weeks.

Smaller, safer releases — since every commit deploys individually, each release is tiny. A bug from one small change is far easier to isolate and fix than a bug introduced somewhere in a 200-commit quarterly release.

No “release day” stress — release is not an event anymore. It is just a normal pipeline run. Teams stop dreading deployments.

Disadvantages:

High upfront engineering cost — building a proper CD pipeline (with automated tests, monitoring, rollback capability, staging environments) takes significant time and investment before you see returns.

Ongoing maintenance — the pipeline itself needs to be maintained. Flaky tests, infrastructure drift, new services, and dependency updates all require pipeline work.

Requires mature test coverage — if your test suite has gaps, bad code silently reaches production. CD amplifies both good testing (fast shipping) and bad testing (fast bugs in prod).

Not suitable for every team — regulated industries (banking, healthcare) often require human sign-off before production releases. Full CD may be blocked by compliance requirements.


3. Best Practices

Write the test before writing the code. This ensures every new feature has automated coverage before it can enter the CD pipeline. Without this, gaps accumulate — code that “works” but has no test to catch regressions later.

Traditional: write code → maybe write tests → deploy
TDD: write test (fails) → write code (test passes) → deploy

The CD pipeline gate is the test suite. Its quality determines the safety of the entire system.
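In code terms, the TDD ordering looks like this (a minimal illustration; `apply_discount` and its spec are invented for the example):

```python
# Step 1 (written FIRST): the test. At this point apply_discount does not
# exist, so the test fails — that failure is the point of TDD.
def test_ten_percent_off():
    assert apply_discount(100.0, 10) == 90.0

# Step 2 (written SECOND): just enough code to make the test pass.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

test_ten_percent_off()   # now passes — the feature enters the pipeline with coverage
```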

Once a CD pipeline is in place, it is the only way to deploy. No exceptions:

  • No SSH-ing into production to “quickly fix one line”
  • No copying files manually to a server
  • No live-editing config files on prod

Every manual change outside the pipeline breaks the deployment history — the record of what is running in production becomes inaccurate. The next automated deployment may then conflict with the manual change in unpredictable ways.

Package applications in Docker containers as part of the pipeline. This eliminates “works on my machine” problems because the container carries the exact runtime environment with it.

Without containers:
    Works on dev laptop → fails on staging → "but it worked for me"

With containers:
    Built in pipeline → same image runs on staging → same image runs on prod
    Identical behaviour at every stage

4. CD Tools Overview

Any proper CD tool must handle three things:

1. Automated testing → gate bad code before it reaches production
2. Rolling deployments → activate new code in a live environment without downtime
3. Monitoring & alerts → know when something goes wrong and trigger a rollback
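Requirement 3 is usually a threshold check wired to the rollback mechanism. A toy version — the metric names and thresholds here are assumptions, not from any particular tool:

```python
# Toy rollback trigger: compare live metrics against assumed baselines.
def should_roll_back(error_rate, p95_latency_ms,
                     max_error_rate=0.01, max_latency_ms=500):
    """True if the new release looks unhealthy enough to revert."""
    return error_rate > max_error_rate or p95_latency_ms > max_latency_ms

assert not should_roll_back(error_rate=0.002, p95_latency_ms=340)  # healthy deploy
assert should_roll_back(error_rate=0.06, p95_latency_ms=340)       # error spike
```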

DeployBot

Simple, focused deployment tool. Best for teams that just want automated deploys without heavy CI/CD overhead.

Key capabilities:

  • Deploy to multiple servers simultaneously from different branches
  • Run shell scripts before, during, or after a deployment
  • Real-time deployment progress tracking in the UI
  • One-click rollback to a previous release

Best for: small to mid-size teams wanting quick setup.

AWS CodeDeploy

AWS’s managed deployment service. No servers to maintain — AWS handles the deployment infrastructure.

Key capabilities:

  • Deploys to EC2 instances, AWS Fargate, Lambda, and on-premises servers
  • Keeps a full deployment history with timeline tracking
  • Centralized control through AWS Console, CLI, SDK, or API
  • Integrates directly with CodePipeline for full CI/CD within AWS

Best for: teams already on AWS who want deployment tightly integrated with their cloud infrastructure.

Octopus Deploy

Deployment automation server focused on complex multi-environment, multi-platform releases.

Key capabilities:

  • Supports .NET, Java, and other platforms with high-level deployment steps
  • Role-based deployment approvals (restrict who can deploy to production)
  • Schedule deployments for specific windows
  • Manages sensitive variables and secrets across environments
  • Can run on-premises or cloud

Best for: enterprise teams with complex deployment orchestration needs across many environments.

Argo CD

GitOps-based CD tool specifically for Kubernetes. The Git repository is the single source of truth — whatever is in Git is what should be running in the cluster.

Key capabilities:

  • Watches a Git repo and automatically syncs the cluster to match
  • Supports Helm, Kustomize, Ksonnet, Jsonnet, plain YAML
  • Manages deployments across multiple Kubernetes clusters
  • Web UI shows live application state vs desired state
  • Supports blue-green and canary deployments via sync hooks
  • Full audit trail of every deployment and API call
  • One-command rollback to any previous Git commit

Best for: teams running Kubernetes who want GitOps-style deployments.


5. Zero-Downtime Deployment

An hour of downtime for an online service can cost thousands to millions of dollars in lost revenue, user trust, and SLA penalties. Zero-downtime deployment means users never experience an interruption while a new version is being released.

The core requirement: at no point during the deployment should the service be completely unavailable.

This is achieved through deployment strategies that:

  • Keep the old version running while the new one starts up
  • Switch traffic gradually or instantly to the new version
  • Maintain rollback capability throughout

6. Deployment Strategies

Blue-Green Deployment

Two identical environments always exist. One is live (blue), one is idle (green). Deploy the new version to green, test it, then switch all traffic from blue to green. Blue stays as instant rollback.

→ See full deep dive in Section 7


Canary Deployment

Release the new version to a small percentage of users first — 5%, 10%. Watch metrics. Gradually increase traffic to the new version if everything looks good. Roll back only the affected users if something breaks.

Initial:         100% → v1

After deploy:     95% → v1 (old)
                   5% → v2 (canary)

If metrics ok:    80% → v1
                  20% → v2

If still ok:       0% → v1
                 100% → v2
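Mechanically, a canary stage is just weighted routing. A sketch of the idea (a hypothetical splitter, not any particular load balancer's API):

```python
import random

def route(canary_percent, rng=random.random):
    """Send roughly canary_percent of requests to v2, the rest to v1."""
    return "v2" if rng() * 100 < canary_percent else "v1"

# Simulate 10,000 requests at the 5% canary stage.
random.seed(0)
hits = sum(route(5) == "v2" for _ in range(10_000))
# hits lands close to 500 — about 5% of traffic reached the canary
```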

Advantages:

  • Real production traffic tests the new version before full rollout
  • Failures affect only a small percentage of users
  • No need for duplicate infrastructure (unlike blue-green)
  • Gradual confidence building before full commit

Disadvantages:

  • Two versions run simultaneously — both must be compatible with the same database schema
  • Rollback is more complex than blue-green (not a single switch)
  • Monitoring and traffic splitting add infrastructure complexity
  • Slower to fully release than blue-green

Trade-offs: Less infrastructure cost than blue-green, but more operational complexity. The blast radius of a bad deploy is smaller.

When to choose: When you want real user data to validate a release before full rollout. Also good when you cannot afford double infrastructure. Ideal for large-scale consumer products (social media, e-commerce) where even 1% of users is a meaningful test sample.

Real-world example: Netflix uses canary deployments heavily. When releasing a new recommendation algorithm, they first roll it out to 1% of users. Internal dashboards track engagement, stream errors, and watch time. Only if those metrics match or improve on the baseline does the rollout continue to 5%, 25%, 100%.


Rolling Deployment

Replace old instances with new instances one by one (or in small batches). At any point during the deployment, some instances run v1 and some run v2.

Start: [v1] [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1] [v1]
Step 4: [v2] [v2] [v2] [v2] [v1]
Step 5: [v2] [v2] [v2] [v2] [v2]
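The step-by-step replacement above is easy to model. A minimal sketch, where `batch_size` is an assumed knob (real schedulers expose similar settings such as surge and unavailability limits):

```python
def rolling_update(fleet, new_version, batch_size=1):
    """Replace instances batch by batch, returning a snapshot after each step."""
    fleet = list(fleet)
    snapshots = []
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version            # this instance restarts on v2
        snapshots.append(list(fleet))         # the mixed fleet keeps serving traffic
    return snapshots

steps = rolling_update(["v1"] * 5, "v2")
# steps[2] == ['v2', 'v2', 'v2', 'v1', 'v1'] — matches Step 3 in the diagram
```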

Advantages:

  • No duplicate infrastructure needed
  • Faster than blue-green (no environment build-up phase)
  • Gradual rollout gives time to catch issues

Disadvantages:

  • Both v1 and v2 serve traffic simultaneously — database schema changes are risky
  • Rollback is slow (you have to roll all instances back one by one)
  • No environment isolation between old and new versions

Trade-offs: Lower infrastructure cost than blue-green, but rollback is painful compared to blue-green’s instant switch.

When to choose: When you have limited infrastructure budget and the new version is backward-compatible with the current one. Good for stateless applications where running two versions simultaneously is safe.

Real-world example: Kubernetes rolling updates work this way by default. A kubectl set image command replaces pods one by one with the new image while keeping the service online throughout.


Shadow Deployment

The new version (v2) runs alongside the old version (v1), but receives a copy of real production traffic — without actually responding to users. v1 still handles all real responses. v2’s responses are discarded (or compared internally).

User request
├──► v1 (active) → real response sent to user
└──► v2 (shadow) → response discarded / compared internally
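A shadow setup can be sketched as a wrapper that mirrors each request and discards v2's answer. This is illustrative only — real systems usually mirror at the proxy layer rather than in application code:

```python
def handle(request, v1, v2, mismatches):
    """v1 answers the user; v2's answer is only logged for comparison."""
    real = v1(request)
    try:
        shadow = v2(request)                  # mirrored call
        if shadow != real:
            mismatches.append((request, real, shadow))
    except Exception:
        pass                                  # a crashing v2 must never reach users
    return real                               # the user only ever sees v1's response

v1 = lambda r: r * 2
v2 = lambda r: -1 if r == 3 else r * 2        # v2 disagrees on one input
log = []
answers = [handle(r, v1, v2, log) for r in range(5)]
# answers == [0, 2, 4, 6, 8]: users were served by v1 throughout,
# and log recorded the single disagreement at r == 3
```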

Advantages:

  • v2 gets tested under real production load with zero risk to users
  • Great for catching performance bottlenecks before going live
  • No user impact at all if v2 breaks

Disadvantages:

  • Complex to set up — requires traffic mirroring infrastructure
  • Side effects are dangerous — if v2 writes to a database, processes a payment, sends an email, those are real actions, not test actions
  • Requires a mocking layer for any third-party service calls to prevent double actions
  • High infrastructure cost (running two full stacks)

Trade-offs: Maximum safety for testing, maximum complexity and cost to operate. Only justifiable when correctness is critical and you have engineering resources to build the mocking layer.

When to choose: For critical services where you must validate behaviour under real load before any user exposure. Common in payment systems, fraud detection, and machine learning model replacements.

Real-world example: A bank replacing their fraud detection engine. v2 (new ML model) receives a mirror of all real transactions but v1 makes all the actual approve/deny decisions. Engineers compare v1 and v2 outputs for weeks before switching live traffic to v2.


A/B Testing

Split users into two groups — Group A gets v1 (control), Group B gets v2 (experiment). Collect data on user behaviour metrics. Make a business decision based on the data.

Users
 ├── Group A (50%) ──► v1 (control)    ┐
 │                                     ├── Compare: conversion rate,
 └── Group B (50%) ──► v2 (experiment) ┘   engagement, revenue
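Group assignment is typically a deterministic hash of the user id, so each user sees the same variant in every session. A sketch (the experiment name and split are made up):

```python
import hashlib

def bucket(user_id, experiment="homepage_layout", split_percent=50):
    """Deterministically assign a user to group 'A' (control) or 'B' (experiment)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 100 < split_percent else "A"

groups = [bucket(f"user-{i}") for i in range(1_000)]
# each user's group is stable across calls, and the split is roughly 50/50
```

Hashing the experiment name together with the user id means different experiments slice the user base independently.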

Advantages:

  • Data-driven decision making — you don’t guess, you measure
  • Tests hypotheses about user behaviour before full rollout
  • Groups can be defined by any user attribute (geography, device, account age)

Disadvantages:

  • Requires analytics infrastructure to collect and compare metrics meaningfully
  • Statistical significance takes time — you need enough users in both groups
  • Not suitable for bug fixes (you don’t A/B test a crash fix)
  • Managing state and sessions across two versions adds complexity

Trade-offs: A/B is a product decision tool, not purely a deployment tool. It answers “which version performs better with users” rather than “is this version safe to run.”

When to choose: For UI/UX changes, new features, pricing changes, or algorithm updates where user behaviour is the success metric. Not the right choice when correctness and stability are the primary concerns.

Real-world example: Spotify A/B tests their homepage layout. Group A sees the current layout, Group B sees a new layout with a redesigned search bar. After two weeks, Spotify’s data team compares search usage rates, listening session lengths, and playlist creation rates between groups. If Group B metrics are better, the new layout rolls out to everyone.


Recreate Deployment

Shut down the old version entirely, then start the new version. Simple, but users experience downtime in between.

v1 running
v1 shut down ← DOWNTIME STARTS HERE
v2 starts up ← DOWNTIME ENDS HERE
v2 running

Advantages:

  • Extremely simple to implement — no special tooling needed
  • Application state is completely fresh — no stale state from v1

Disadvantages:

  • Guaranteed downtime. Duration = shutdown time + startup time of new version.
  • Users are actively interrupted mid-session

When to choose: Only for non-production environments (dev, staging), batch jobs with no live users, or internal tools where brief downtime is acceptable.


7. Blue-Green Deployment — Deep Dive

Two identical production environments run side by side — blue (current live version) and green (new version being prepared). A load balancer sits in front of both and controls which environment receives user traffic.

At any given moment, only one environment is live. The other is idle — available as an instant rollback target.

Prerequisites before you can use blue-green

  • Two identical infrastructure environments must exist (same server specs, same config, same dependencies).
  • Both environments must be able to connect to the same (or a compatible) database — the new version’s schema changes must be backward-compatible with the old version
  • A load balancer or router that can switch traffic between the two environments with no DNS change required

Phase 1: Setup

Load Balancer
├──► BLUE (v1.1)  ← 100% of traffic, live
└──► GREEN (v1.2) ← idle, being prepared

Deploy v1.2 to the green environment. It is running but receives no user traffic.

Phase 2: Switch

Load Balancer
├──► BLUE (v1.1) ← 0% traffic, kept on standby
└──► GREEN (v1.2) ← 100% traffic, now live

The load balancer redirects all traffic from blue to green. This switch is instantaneous. Most users never notice — the DNS record does not change. The load balancer simply changes its routing target.

Phase 3: Monitor

DevOps engineers immediately run smoke tests on the live green environment:

  • Are APIs responding?
  • Are error rates elevated?
  • Are response times normal?
  • Are background jobs running?

This is the critical window. The blue environment is still warm and ready for instant rollback.

Phase 4: Rollback or Retire

If green has problems:
    Load balancer switches back to BLUE in seconds
    Users are on v1.1 again, unaware anything happened

If green is healthy after monitoring period:
    Blue is retired (or becomes the new idle environment for next release)
    Green becomes the new blue
    A new green environment is prepared for the next release
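The switch in Phase 2 and the rollback in Phase 4 are the same operation. A minimal sketch where the "load balancer" is just a pointer to whichever environment is live:

```python
class LoadBalancer:
    """Toy router: all traffic goes to whichever environment is 'live'."""
    def __init__(self, live, idle):
        self.live, self.idle = live, idle

    def switch(self):
        # Instantaneous: swap the routing target; no DNS change involved.
        self.live, self.idle = self.idle, self.live

lb = LoadBalancer(live="blue (v1.1)", idle="green (v1.2)")
lb.switch()     # Phase 2: green is now live
lb.switch()     # Phase 4 rollback: the identical call, users are back on blue
```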

Seamless user experience — traffic switch is instantaneous. No user sees a loading spinner or error page during deployment.

Instant rollback — if something goes wrong, switching back to blue takes seconds. This is the fastest rollback of any deployment strategy.

No upgrade scheduling — no maintenance windows, no 2am deployments to avoid peak traffic. Deploy any time.

Testing in production parity — the green environment is identical to production. You are not testing in a simulated environment — you are testing the exact thing that will serve users.

Disaster recovery practice built-in — because switching between two live environments is a regular operation, teams become practiced at it. The same mechanism works for disaster recovery.

High infrastructure cost — you permanently maintain double the infrastructure. If your application runs on 20 servers, blue-green means 40 servers. On cloud infrastructure with elastic scaling, this cost can be partially managed by spinning down the idle environment between releases, but the standby environment must come up quickly when needed.

Database schema changes are hard — this is the biggest practical challenge. If v1.2 adds a new database column, the old v1.1 code (still running in blue as a standby) may not understand that column. If you need to roll back to blue, the database already has the new column. Strategy: make all database changes backward-compatible — add columns but never remove or rename them until the old version is fully retired.

Session and user routing issues — if a user has an active session in blue and traffic switches to green, their session state may not exist in green. They get logged out or mid-transaction requests fail. Solutions:

  • Use a shared session store (Redis) accessible by both environments
  • Configure the load balancer to drain existing connections gracefully before completing the switch (allows existing sessions to finish on blue while new sessions go to green)
  • Accept that a small number of in-flight requests will fail during the switch

Instantaneous switch:
    All users immediately moved to green
    In-flight requests on blue → fail or require re-login
    Simple but some users impacted

Graceful drain:
    New connections → green
    Existing connections → finish on blue, then close
    Slower but zero disruption to active users

Code compatibility — because blue and green run simultaneously and both connect to the same production database, every release must be compatible with the current database state. You cannot deploy a version that requires a schema that does not yet exist in the database.

An online retailer runs their checkout service on blue (v2.3). They have built v2.4 with a faster payment processing flow.

Before deployment:

    Load Balancer → BLUE (v2.3)  ← all 50,000 concurrent users
                    GREEN (v2.4) ← deployed, running, zero traffic

Release day (not a night, not a weekend — any time):

1. Team deploys v2.4 to the green environment
2. Team runs automated smoke tests against green directly
   (checkout flow, payment processing, order confirmation)
3. All smoke tests pass
4. Load balancer switches: 100% traffic → GREEN (v2.4)
5. Team monitors dashboards for 30 minutes:
   - Payment success rate: 99.7% (same as before)
   - Page load time: 340 ms (down from 480 ms — improvement confirmed)
   - Error rate: 0.01% (normal baseline)
6. Blue (v2.3) remains on standby for 2 hours
7. No issues — blue is retired
8. Green is now called blue
9. A new green environment is prepared for v2.5

If at step 5 the payment success rate dropped to 94%, the team would switch the load balancer back to blue (v2.3) in under 10 seconds. Users would barely notice.


8. Choosing the Right Strategy

SITUATION                                        RECOMMENDED STRATEGY
─────────────────────────────────────────────    ────────────────────
Mission-critical service, instant rollback       Blue-Green
a must, can afford double infrastructure
Large user base, want real data before           Canary
full rollout, limited infrastructure budget
Kubernetes-native, stateless app,                Rolling
backward-compatible release
Testing behaviour under production load,         Shadow
cannot risk any user impact, critical service
Need to measure user behaviour impact            A/B Testing
of a product or UI change
Internal tool, dev environment, batch job,       Recreate
downtime is acceptable

Most mature organizations do not pick one strategy exclusively. A common pattern:

1. Shadow → validate new ML model with real traffic before exposing to users
2. Canary → release to 5% of users, monitor for 24 hours
3. Rolling → gradually replace remaining 95% of instances
4. Blue-Green maintained as the emergency rollback mechanism throughout

The deployment strategy is a trade-off between:

SAFETY ←────────────────────────────────────────────► SPEED
       Shadow   Blue-Green   Canary   Rolling   Recreate

COST ←──────────────────────────────────────────────► SAVINGS
       Blue-Green   Shadow   Canary   Rolling   Recreate

FAST ROLLBACK ←─────────────────────────────────────► SLOW ROLLBACK
       Blue-Green   Canary   Shadow   Rolling   Recreate

No strategy wins on all dimensions. The right choice depends on the specific risk tolerance, infrastructure budget, and user experience requirements of your organization.