
Continuous Deployment


  1. Continuous Delivery vs Continuous Deployment
  2. Advantages & Disadvantages
  3. Best Practices
  4. CD Tools Overview
  5. Zero-Downtime Deployment
  6. Deployment Strategies
  7. Blue-Green Deployment — Deep Dive
  8. Choosing the Right Strategy

1. Continuous Delivery vs Continuous Deployment


Both are called “CD” — here is the exact difference:

CODE COMMITTED
CI runs (build, test, lint)
Artifact ready for release
├── Continuous DELIVERY
│     Manual approval required
│     A human clicks "deploy to prod"
└── Continuous DEPLOYMENT
      No human in the loop
      Automatically goes to production
      if all tests pass

Simple analogy: Think of ordering a package online.

  • Delivery = the package arrives at your door. You still have to open it and decide if you want to keep it.
  • Deployment = the package arrives, a robot opens it, verifies the contents, and puts everything in place automatically. You just come home to a working product.

Continuous Deployment is the complete automation of the entire path from commit → production. No manual gates after the initial pipeline.
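The two models can be reduced to a single conditional. A toy sketch (function and mode names are invented for illustration) of the decision a pipeline makes once an artifact is built:

```python
# Toy release gate: the only difference between the two "CD"s is one condition.
def release(artifact, tests_passed, mode, human_approved=False):
    """Return what happens to an artifact under each model."""
    if not tests_passed:
        return "blocked by pipeline"           # both models stop here
    if mode == "delivery" and not human_approved:
        return "awaiting manual approval"      # continuous delivery: human gate
    return "deployed to production"            # continuous deployment: no gate

print(release("app-1.2.0", tests_passed=True, mode="deployment"))
# → deployed to production
print(release("app-1.2.0", tests_passed=True, mode="delivery"))
# → awaiting manual approval
```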


2. Advantages & Disadvantages

Advantages:

Speed to market — features go from a developer’s machine to customers’ hands within minutes of a commit passing tests. No waiting for release windows.

Real-time feedback loop — if a customer reports a bug on Monday, a fix can be deployed by Monday afternoon. Teams respond to the market in hours instead of weeks.

Smaller, safer releases — since every commit deploys individually, each release is tiny. A bug from one small change is far easier to isolate and fix than a bug introduced somewhere in a 200-commit quarterly release.

No “release day” stress — release is not an event anymore. It is just a normal pipeline run. Teams stop dreading deployments.

Disadvantages:

High upfront engineering cost — building a proper CD pipeline (with automated tests, monitoring, rollback capability, staging environments) takes significant time and investment before you see returns.

Ongoing maintenance — the pipeline itself needs to be maintained. Flaky tests, infrastructure drift, new services, and dependency updates all require pipeline work.

Requires mature test coverage — if your test suite has gaps, bad code silently reaches production. CD amplifies both good testing (fast shipping) and bad testing (fast bugs in prod).

Not suitable for every team — regulated industries (banking, healthcare) often require human sign-off before production releases. Full CD may be blocked by compliance requirements.


3. Best Practices

Write the test before writing the code. This ensures every new feature has automated coverage before it can enter the CD pipeline. Without this, gaps accumulate — code that “works” but has no test to catch regressions later.

Traditional: write code → maybe write tests → deploy
TDD: write test (fails) → write code (test passes) → deploy

The CD pipeline gate is the test suite. Its quality determines the safety of the entire system.
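In code terms, the TDD ordering looks like this (a minimal illustration; `apply_discount` and its spec are invented for the example):

```python
# Step 1 (written FIRST): the test. At this point apply_discount does not
# exist, so the test fails — that failure is the point of TDD.
def test_ten_percent_off():
    assert apply_discount(100.0, 10) == 90.0

# Step 2 (written SECOND): just enough code to make the test pass.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

test_ten_percent_off()   # now passes — the feature enters the pipeline with coverage
```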

Once a CD pipeline is in place, it is the only way to deploy. No exceptions:

  • No SSH-ing into production to “quickly fix one line”
  • No copying files manually to a server
  • No live-editing config files on prod

Every manual change outside the pipeline breaks the deployment history — the record of what is running in production becomes inaccurate. The next automated deployment may then conflict with the manual change in unpredictable ways.

Package applications in Docker containers as part of the pipeline. This eliminates “works on my machine” problems because the container carries the exact runtime environment with it.

Without containers:
    Works on dev laptop → fails on staging → "but it worked for me"

With containers:
    Built in pipeline → same image runs on staging → same image runs on prod
    Identical behaviour at every stage

4. CD Tools Overview

Any proper CD tool must handle three things:

1. Automated testing → gate bad code before it reaches production
2. Rolling deployments → activate new code in a live environment without downtime
3. Monitoring & alerts → know when something goes wrong and trigger a rollback
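Requirement 3 is usually a threshold check wired to the rollback mechanism. A toy version — the metric names and thresholds here are assumptions, not from any particular tool:

```python
# Toy rollback trigger: compare live metrics against assumed baselines.
def should_roll_back(error_rate, p95_latency_ms,
                     max_error_rate=0.01, max_latency_ms=500):
    """True if the new release looks unhealthy enough to revert."""
    return error_rate > max_error_rate or p95_latency_ms > max_latency_ms

assert not should_roll_back(error_rate=0.002, p95_latency_ms=340)  # healthy deploy
assert should_roll_back(error_rate=0.06, p95_latency_ms=340)       # error spike
```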

DeployBot

Simple, focused deployment tool. Best for teams that just want automated deploys without heavy CI/CD overhead.

Key capabilities:

  • Deploy to multiple servers simultaneously from different branches
  • Run shell scripts before, during, or after a deployment
  • Real-time deployment progress tracking in the UI
  • One-click rollback to a previous release

Best for: small to mid-size teams wanting quick setup.

AWS CodeDeploy

AWS’s managed deployment service. No servers to maintain — AWS handles the deployment infrastructure.

Key capabilities:

  • Deploys to EC2 instances, AWS Fargate, Lambda, and on-premises servers
  • Keeps a full deployment history with timeline tracking
  • Centralized control through AWS Console, CLI, SDK, or API
  • Integrates directly with CodePipeline for full CI/CD within AWS

Best for: teams already on AWS who want deployment tightly integrated with their cloud infrastructure.

Octopus Deploy

Deployment automation server focused on complex multi-environment, multi-platform releases.

Key capabilities:

  • Supports .NET, Java, and other platforms with high-level deployment steps
  • Role-based deployment approvals (restrict who can deploy to production)
  • Schedule deployments for specific windows
  • Manages sensitive variables and secrets across environments
  • Can run on-premises or cloud

Best for: enterprise teams with complex deployment orchestration needs across many environments.

Argo CD

GitOps-based CD tool specifically for Kubernetes. The Git repository is the single source of truth — whatever is in Git is what should be running in the cluster.

Key capabilities:

  • Watches a Git repo and automatically syncs the cluster to match
  • Supports Helm, Kustomize, Ksonnet, Jsonnet, plain YAML
  • Manages deployments across multiple Kubernetes clusters
  • Web UI shows live application state vs desired state
  • Supports blue-green and canary deployments via sync hooks
  • Full audit trail of every deployment and API call
  • One-command rollback to any previous Git commit

Best for: teams running Kubernetes who want GitOps-style deployments.


5. Zero-Downtime Deployment

An hour of downtime for an online service can cost thousands to millions of dollars in lost revenue, user trust, and SLA penalties. Zero-downtime deployment means users never experience an interruption while a new version is being released.

The core requirement: at no point during the deployment should the service be completely unavailable.

This is achieved through deployment strategies that:

  • Keep the old version running while the new one starts up
  • Switch traffic gradually or instantly to the new version
  • Maintain rollback capability throughout

6. Deployment Strategies

Blue-Green Deployment

Two identical environments always exist. One is live (blue), one is idle (green). Deploy the new version to green, test it, then switch all traffic from blue to green. Blue stays as instant rollback.

→ See full deep dive in Section 7


Canary Deployment

Release the new version to a small percentage of users first — 5%, 10%. Watch metrics. Gradually increase traffic to the new version if everything looks good. Roll back only the affected users if something breaks.

Initial:         100% → v1

After deploy:     95% → v1 (old)
                   5% → v2 (canary)

If metrics ok:    80% → v1
                  20% → v2

If still ok:       0% → v1
                 100% → v2
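Mechanically, a canary stage is just weighted routing. A sketch of the idea (a hypothetical splitter, not any particular load balancer's API):

```python
import random

def route(canary_percent, rng=random.random):
    """Send roughly canary_percent of requests to v2, the rest to v1."""
    return "v2" if rng() * 100 < canary_percent else "v1"

# Simulate 10,000 requests at the 5% canary stage.
random.seed(0)
hits = sum(route(5) == "v2" for _ in range(10_000))
# hits lands close to 500 — about 5% of traffic reached the canary
```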

Advantages:

  • Real production traffic tests the new version before full rollout
  • Failures affect only a small percentage of users
  • No need for duplicate infrastructure (unlike blue-green)
  • Gradual confidence building before full commit

Disadvantages:

  • Two versions run simultaneously — both must be compatible with the same database schema
  • Rollback is more complex than blue-green (not a single switch)
  • Monitoring and traffic splitting add infrastructure complexity
  • Slower to fully release than blue-green

Trade-offs: Less infrastructure cost than blue-green, but more operational complexity. The blast radius of a bad deploy is smaller.

When to choose: When you want real user data to validate a release before full rollout. Also good when you cannot afford double infrastructure. Ideal for large-scale consumer products (social media, e-commerce) where even 1% of users is a meaningful test sample.

Real-world example: Netflix uses canary deployments heavily. When releasing a new recommendation algorithm, they first roll it out to 1% of users. Internal dashboards track engagement, stream errors, and watch time. Only if those metrics match or improve on the baseline does the rollout continue to 5%, 25%, 100%.


Rolling Deployment

Replace old instances with new instances one by one (or in small batches). At any point during the deployment, some instances run v1 and some run v2.

Start: [v1] [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1] [v1]
Step 4: [v2] [v2] [v2] [v2] [v1]
Step 5: [v2] [v2] [v2] [v2] [v2]
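The step-by-step replacement above is easy to model. A minimal sketch, where `batch_size` is an assumed knob (real schedulers expose similar settings such as surge and unavailability limits):

```python
def rolling_update(fleet, new_version, batch_size=1):
    """Replace instances batch by batch, returning a snapshot after each step."""
    fleet = list(fleet)
    snapshots = []
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version            # this instance restarts on v2
        snapshots.append(list(fleet))         # the mixed fleet keeps serving traffic
    return snapshots

steps = rolling_update(["v1"] * 5, "v2")
# steps[2] == ['v2', 'v2', 'v2', 'v1', 'v1'] — matches Step 3 in the diagram
```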

Advantages:

  • No duplicate infrastructure needed
  • Faster than blue-green (no environment build-up phase)
  • Gradual rollout gives time to catch issues

Disadvantages:

  • Both v1 and v2 serve traffic simultaneously — database schema changes are risky
  • Rollback is slow (you have to roll all instances back one by one)
  • No environment isolation between old and new versions

Trade-offs: Lower infrastructure cost than blue-green, but rollback is painful compared to blue-green’s instant switch.

When to choose: When you have limited infrastructure budget and the new version is backward-compatible with the current one. Good for stateless applications where running two versions simultaneously is safe.

Real-world example: Kubernetes rolling updates work this way by default. A kubectl set image command replaces pods one by one with the new image while keeping the service online throughout.


Shadow Deployment

The new version (v2) runs alongside the old version (v1), but receives a copy of real production traffic — without actually responding to users. v1 still handles all real responses. v2’s responses are discarded (or compared internally).

User request
├──► v1 (active) → real response sent to user
└──► v2 (shadow) → response discarded / compared internally
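A shadow setup can be sketched as a wrapper that mirrors each request and discards v2's answer. This is illustrative only — real systems usually mirror at the proxy layer rather than in application code:

```python
def handle(request, v1, v2, mismatches):
    """v1 answers the user; v2's answer is only logged for comparison."""
    real = v1(request)
    try:
        shadow = v2(request)                  # mirrored call
        if shadow != real:
            mismatches.append((request, real, shadow))
    except Exception:
        pass                                  # a crashing v2 must never reach users
    return real                               # the user only ever sees v1's response

v1 = lambda r: r * 2
v2 = lambda r: -1 if r == 3 else r * 2        # v2 disagrees on one input
log = []
answers = [handle(r, v1, v2, log) for r in range(5)]
# answers == [0, 2, 4, 6, 8]: users were served by v1 throughout,
# and log recorded the single disagreement at r == 3
```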

Advantages:

  • v2 gets tested under real production load with zero risk to users
  • Great for catching performance bottlenecks before going live
  • No user impact at all if v2 breaks

Disadvantages:

  • Complex to set up — requires traffic mirroring infrastructure
  • Side effects are dangerous — if v2 writes to a database, processes a payment, sends an email, those are real actions, not test actions
  • Requires a mocking layer for any third-party service calls to prevent double actions
  • High infrastructure cost (running two full stacks)

Trade-offs: Maximum safety for testing, maximum complexity and cost to operate. Only justifiable when correctness is critical and you have engineering resources to build the mocking layer.

When to choose: For critical services where you must validate behaviour under real load before any user exposure. Common in payment systems, fraud detection, and machine learning model replacements.

Real-world example: A bank replacing their fraud detection engine. v2 (new ML model) receives a mirror of all real transactions but v1 makes all the actual approve/deny decisions. Engineers compare v1 and v2 outputs for weeks before switching live traffic to v2.


A/B Testing

Split users into two groups — Group A gets v1 (control), Group B gets v2 (experiment). Collect data on user behaviour metrics. Make a business decision based on the data.

Users
 ├── Group A (50%) ──► v1 (control)    ┐
 │                                     ├── Compare: conversion rate,
 └── Group B (50%) ──► v2 (experiment) ┘   engagement, revenue
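Group assignment is typically a deterministic hash of the user id, so each user sees the same variant in every session. A sketch (the experiment name and split are made up):

```python
import hashlib

def bucket(user_id, experiment="homepage_layout", split_percent=50):
    """Deterministically assign a user to group 'A' (control) or 'B' (experiment)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 100 < split_percent else "A"

groups = [bucket(f"user-{i}") for i in range(1_000)]
# each user's group is stable across calls, and the split is roughly 50/50
```

Hashing the experiment name together with the user id means different experiments slice the user base independently.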

Advantages:

  • Data-driven decision making — you don’t guess, you measure
  • Tests hypotheses about user behaviour before full rollout
  • Groups can be defined by any user attribute (geography, device, account age)

Disadvantages:

  • Requires analytics infrastructure to collect and compare metrics meaningfully
  • Statistical significance takes time — you need enough users in both groups
  • Not suitable for bug fixes (you don’t A/B test a crash fix)
  • Managing state and sessions across two versions adds complexity

Trade-offs: A/B is a product decision tool, not purely a deployment tool. It answers “which version performs better with users” rather than “is this version safe to run.”

When to choose: For UI/UX changes, new features, pricing changes, or algorithm updates where user behaviour is the success metric. Not the right choice when correctness and stability are the primary concerns.

Real-world example: Spotify A/B tests their homepage layout. Group A sees the current layout, Group B sees a new layout with a redesigned search bar. After two weeks, Spotify’s data team compares search usage rates, listening session lengths, and playlist creation rates between groups. If Group B metrics are better, the new layout rolls out to everyone.


Recreate Deployment

Shut down the old version entirely, then start the new version. Simple, but users experience downtime in between.

v1 running
v1 shut down ← DOWNTIME STARTS HERE
v2 starts up ← DOWNTIME ENDS HERE
v2 running

Advantages:

  • Extremely simple to implement — no special tooling needed
  • Application state is completely fresh — no stale state from v1

Disadvantages:

  • Guaranteed downtime. Duration = shutdown time + startup time of new version.
  • Users are actively interrupted mid-session

When to choose: Only for non-production environments (dev, staging), batch jobs with no live users, or internal tools where brief downtime is acceptable.


7. Blue-Green Deployment — Deep Dive

Two identical production environments run side by side — blue (current live version) and green (new version being prepared). A load balancer sits in front of both and controls which environment receives user traffic.

At any given moment, only one environment is live. The other is idle — available as an instant rollback target.

Prerequisites before you can use blue-green

  • Two identical infrastructure environments must exist (same server specs, same config, same dependencies).
  • Both environments must be able to connect to the same (or a compatible) database — the new version’s schema changes must be backward-compatible with the old version
  • A load balancer or router that can switch traffic between the two environments with no DNS change required

Phase 1: Setup

Load Balancer
├──► BLUE (v1.1)  ← 100% of traffic, live
└──► GREEN (v1.2) ← idle, being prepared

Deploy v1.2 to the green environment. It is running but receives no user traffic.

Phase 2: Switch

Load Balancer
├──► BLUE (v1.1) ← 0% traffic, kept on standby
└──► GREEN (v1.2) ← 100% traffic, now live

The load balancer redirects all traffic from blue to green. This switch is instantaneous. Most users never notice — the DNS record does not change. The load balancer simply changes its routing target.

Phase 3: Monitor

DevOps engineers immediately run smoke tests on the live green environment:

  • Are APIs responding?
  • Are error rates elevated?
  • Are response times normal?
  • Are background jobs running?

This is the critical window. The blue environment is still warm and ready for instant rollback.

Phase 4: Rollback or Retire

If green has problems:
    Load balancer switches back to BLUE in seconds
    Users are on v1.1 again, unaware anything happened

If green is healthy after monitoring period:
    Blue is retired (or becomes the new idle environment for next release)
    Green becomes the new blue
    A new green environment is prepared for the next release
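The switch in Phase 2 and the rollback in Phase 4 are the same operation. A minimal sketch where the "load balancer" is just a pointer to whichever environment is live:

```python
class LoadBalancer:
    """Toy router: all traffic goes to whichever environment is 'live'."""
    def __init__(self, live, idle):
        self.live, self.idle = live, idle

    def switch(self):
        # Instantaneous: swap the routing target; no DNS change involved.
        self.live, self.idle = self.idle, self.live

lb = LoadBalancer(live="blue (v1.1)", idle="green (v1.2)")
lb.switch()     # Phase 2: green is now live
lb.switch()     # Phase 4 rollback: the identical call, users are back on blue
```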

Seamless user experience — traffic switch is instantaneous. No user sees a loading spinner or error page during deployment.

Instant rollback — if something goes wrong, switching back to blue takes seconds. This is the fastest rollback of any deployment strategy.

No upgrade scheduling — no maintenance windows, no 2am deployments to avoid peak traffic. Deploy any time.

Testing in production parity — the green environment is identical to production. You are not testing in a simulated environment — you are testing the exact thing that will serve users.

Disaster recovery practice built-in — because switching between two live environments is a regular operation, teams become practiced at it. The same mechanism works for disaster recovery.

High infrastructure cost — you permanently maintain double the infrastructure. If your application runs on 20 servers, blue-green means 40 servers. On cloud infrastructure with elastic scaling, this cost can be partially managed by spinning down the idle environment between releases, but the standby environment must come up quickly when needed.

Database schema changes are hard — this is the biggest practical challenge. If v1.2 adds a new database column, the old v1.1 code (still running in blue as a standby) may not understand that column. If you need to roll back to blue, the database already has the new column. Strategy: make all database changes backward-compatible — add columns but never remove or rename them until the old version is fully retired.

Session and user routing issues — if a user has an active session in blue and traffic switches to green, their session state may not exist in green. They get logged out or mid-transaction requests fail. Solutions:

  • Use a shared session store (Redis) accessible by both environments
  • Configure the load balancer to drain existing connections gracefully before completing the switch (allows existing sessions to finish on blue while new sessions go to green)
  • Accept that a small number of in-flight requests will fail during the switch

Instantaneous switch:
    All users immediately moved to green
    In-flight requests on blue → fail or require re-login
    Simple but some users impacted

Graceful drain:
    New connections → green
    Existing connections → finish on blue, then close
    Slower but zero disruption to active users

Code compatibility — because blue and green run simultaneously and both connect to the same production database, every release must be compatible with the current database state. You cannot deploy a version that requires a schema that does not yet exist in the database.

An online retailer runs their checkout service on blue (v2.3). They have built v2.4 with a faster payment processing flow.

Before deployment:

    Load Balancer → BLUE (v2.3)  ← all 50,000 concurrent users
                    GREEN (v2.4) ← deployed, running, zero traffic

Release day (not a night, not a weekend — any time):

1. Team deploys v2.4 to the green environment
2. Team runs automated smoke tests against green directly
   (checkout flow, payment processing, order confirmation)
3. All smoke tests pass
4. Load balancer switches: 100% traffic → GREEN (v2.4)
5. Team monitors dashboards for 30 minutes:
   - Payment success rate: 99.7% (same as before)
   - Page load time: 340 ms (down from 480 ms — improvement confirmed)
   - Error rate: 0.01% (normal baseline)
6. Blue (v2.3) remains on standby for 2 hours
7. No issues — blue is retired
8. Green is now called blue
9. A new green environment is prepared for v2.5

If at step 5 the payment success rate dropped to 94%, the team would switch the load balancer back to blue (v2.3) in under 10 seconds. Users would barely notice.


8. Choosing the Right Strategy

SITUATION                                        RECOMMENDED STRATEGY
─────────────────────────────────────────────    ────────────────────
Mission-critical service, instant rollback       Blue-Green
a must, can afford double infrastructure
Large user base, want real data before           Canary
full rollout, limited infrastructure budget
Kubernetes-native, stateless app,                Rolling
backward-compatible release
Testing behaviour under production load,         Shadow
cannot risk any user impact, critical service
Need to measure user behaviour impact            A/B Testing
of a product or UI change
Internal tool, dev environment, batch job,       Recreate
downtime is acceptable

Most mature organizations do not pick one strategy exclusively. A common pattern:

1. Shadow → validate new ML model with real traffic before exposing to users
2. Canary → release to 5% of users, monitor for 24 hours
3. Rolling → gradually replace remaining 95% of instances
4. Blue-Green maintained as the emergency rollback mechanism throughout

The deployment strategy is a trade-off between:

SAFETY ←────────────────────────────────────────────► SPEED
       Shadow   Blue-Green   Canary   Rolling   Recreate

COST ←──────────────────────────────────────────────► SAVINGS
       Blue-Green   Shadow   Canary   Rolling   Recreate

FAST ROLLBACK ←─────────────────────────────────────► SLOW ROLLBACK
       Blue-Green   Canary   Shadow   Rolling   Recreate

No strategy wins on all dimensions. The right choice depends on the specific risk tolerance, infrastructure budget, and user experience requirements of your organization.