Continuous Deployment
Table of Contents
- Continuous Delivery vs Continuous Deployment
- Advantages & Disadvantages
- Best Practices
- CD Tools Overview
- Zero-Downtime Deployment
- Deployment Strategies
- Blue-Green Deployment — Deep Dive
- Choosing the Right Strategy
1. Continuous Delivery vs Continuous Deployment
Both are called “CD” — here is the exact difference:
CODE COMMITTED
      │
      ▼
CI runs (build, test, lint)
      │
      ▼
Artifact ready for release
      │
      ├── Continuous DELIVERY
      │     Manual approval required
      │     A human clicks "deploy to prod"
      │
      └── Continuous DEPLOYMENT
            No human in the loop
            Automatically goes to production if all tests pass

Simple analogy: Think of ordering a package online.
- Delivery = the package arrives at your door. You still have to open it and decide if you want to keep it.
- Deployment = the package arrives, a robot opens it, verifies the contents, and puts everything in place automatically. You just come home to a working product.
Continuous Deployment is the complete automation of the entire path from commit → production. No manual gates after the initial pipeline.
2. Advantages & Disadvantages
Advantages
Speed to market — features go from a developer’s machine to customers’ hands within minutes of a commit passing tests. No waiting for release windows.
Real-time feedback loop — if a customer reports a bug on Monday, a fix can be deployed by Monday afternoon. Teams respond to the market in hours instead of weeks.
Smaller, safer releases — since every commit deploys individually, each release is tiny. A bug from one small change is far easier to isolate and fix than a bug introduced somewhere in a 200-commit quarterly release.
No “release day” stress — release is not an event anymore. It is just a normal pipeline run. Teams stop dreading deployments.
Disadvantages
High upfront engineering cost — building a proper CD pipeline (with automated tests, monitoring, rollback capability, staging environments) takes significant time and investment before you see returns.
Ongoing maintenance — the pipeline itself needs to be maintained. Flaky tests, infrastructure drift, new services, and dependency updates all require pipeline work.
Requires mature test coverage — if your test suite has gaps, bad code silently reaches production. CD amplifies both good testing (fast shipping) and bad testing (fast bugs in prod).
Not suitable for every team — regulated industries (banking, healthcare) often require human sign-off before production releases. Full CD may be blocked by compliance requirements.
3. Best Practices
Test-Driven Development (TDD)
Write the test before writing the code. This ensures every new feature has automated coverage before it can enter the CD pipeline. Without this, gaps accumulate — code that “works” but has no test to catch regressions later.
Traditional: write code → maybe write tests → deploy
TDD:         write test (fails) → write code (test passes) → deploy

The CD pipeline gate is the test suite. Its quality determines the safety of the entire system.
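To make the rhythm concrete, here is a minimal Python sketch. The slugify function is a hypothetical feature, and pytest is assumed as the test runner:

# test_slugify.py, written in TDD order. Step 1: the tests exist first and
# fail, because slugify() has not been written yet.
def test_slugify_lowercases_and_joins_words():
    assert slugify("Hello World") == "hello-world"

def test_slugify_strips_punctuation():
    assert slugify("Ship it!") == "ship-it"

# Step 2: write just enough code to make the tests pass. Only now does the
# change enter the CD pipeline, with its regression coverage already in place.
import re

def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

if __name__ == "__main__":
    test_slugify_lowercases_and_joins_words()
    test_slugify_strips_punctuation()
    print("tests pass, commit is eligible for deployment")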
One Deployment Path Only
Once a CD pipeline is in place, it is the only way to deploy. No exceptions:
- No SSH-ing into production to “quickly fix one line”
- No copying files manually to a server
- No live-editing config files on prod
Every manual change outside the pipeline breaks the deployment history — the record of what is running in production becomes inaccurate. The next automated deployment may then conflict with the manual change in unpredictable ways.
Containerization
Package applications in Docker containers as part of the pipeline. This eliminates “works on my machine” problems because the container carries the exact runtime environment with it.
Without containers:
  Works on dev laptop → fails on staging → "but it worked for me"

With containers:
  Built in pipeline → same image runs on staging → same image runs on prod
  Identical behaviour at every stage
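As a sketch of how this looks inside a pipeline step (the registry name and tag scheme below are placeholders): the build produces one immutable image per commit, and every later stage deploys that exact image.

# Minimal sketch of a containerization step in a CD pipeline.
# Registry, image name, and tag scheme are placeholders; adapt to your setup.
import subprocess

REGISTRY = "registry.example.com/shop"   # hypothetical registry

def build_and_push(commit_sha: str) -> str:
    """Build one image per commit and push it; staging and prod later pull
    this same tag, so behaviour is identical at every stage."""
    image = f"{REGISTRY}/checkout:{commit_sha}"
    subprocess.run(["docker", "build", "-t", image, "."], check=True)
    subprocess.run(["docker", "push", image], check=True)
    return image  # the pipeline passes this tag to every deploy stage

if __name__ == "__main__":
    print(build_and_push("abc1234"))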
4. CD Tools Overview
What a CD tool needs to do
Any proper CD tool must handle three things:
1. Automated testing → gate bad code before it reaches production
2. Rolling deployments → activate new code in a live environment without downtime
3. Monitoring & alerts → know when something goes wrong and trigger a rollback

DeployBot
Simple, focused deployment tool. Best for teams that just want automated deploys without heavy CI/CD overhead.
Key capabilities:
- Deploy to multiple servers simultaneously from different branches
- Run shell scripts before, during, or after a deployment
- Real-time deployment progress tracking in the UI
- One-click rollback to a previous release
Best for: small to mid-size teams wanting quick setup.
AWS CodeDeploy
AWS’s managed deployment service. No servers to maintain — AWS handles the deployment infrastructure.
Key capabilities:
- Deploys to EC2 instances, AWS Fargate, Lambda, and on-premises servers
- Keeps a full deployment history with timeline tracking
- Centralized control through AWS Console, CLI, SDK, or API
- Integrates directly with CodePipeline for full CI/CD within AWS
Best for: teams already on AWS who want deployment tightly integrated with their cloud infrastructure.
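For illustration, triggering a CodeDeploy deployment from a script might look like this using the boto3 SDK. The application, deployment group, and S3 artifact names are placeholders, not values from this article:

# Hypothetical trigger script for AWS CodeDeploy via boto3.
import boto3

codedeploy = boto3.client("codedeploy", region_name="us-east-1")

response = codedeploy.create_deployment(
    applicationName="checkout-service",        # placeholder
    deploymentGroupName="production-fleet",    # placeholder
    revision={
        "revisionType": "S3",
        "s3Location": {
            "bucket": "my-release-artifacts",  # placeholder
            "key": "checkout/v2.4.zip",
            "bundleType": "zip",
        },
    },
    description="Automated deploy from the CD pipeline",
)
print("deployment id:", response["deploymentId"])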
Octopus Deploy
Deployment automation server focused on complex multi-environment, multi-platform releases.
Key capabilities:
- Supports .NET, Java, and other platforms with high-level deployment steps
- Role-based deployment approvals (restrict who can deploy to production)
- Schedule deployments for specific windows
- Manages sensitive variables and secrets across environments
- Can run on-premises or cloud
Best for: enterprise teams with complex deployment orchestration needs across many environments.
Argo CD
GitOps-based CD tool specifically for Kubernetes. The Git repository is the single source of truth — whatever is in Git is what should be running in the cluster.
Key capabilities:
- Watches a Git repo and automatically syncs the cluster to match
- Supports Helm, Kustomize, Ksonnet, Jsonnet, and plain YAML
- Manages deployments across multiple Kubernetes clusters
- Web UI shows live application state vs desired state
- Supports blue-green and canary deployments via sync hooks
- Full audit trail of every deployment and API call
- One-command rollback to any previous Git commit
Best for: teams running Kubernetes who want GitOps-style deployments.
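The reconcile loop that GitOps tools automate can be pictured with a toy sketch like the one below. This is not Argo CD’s internal implementation, just the core idea: Git holds the desired state, and the cluster is repeatedly synced to match it. The repo path and manifest directory are placeholders, and git plus kubectl are assumed on PATH.

# Toy sketch of the GitOps reconcile loop that tools like Argo CD automate.
import subprocess
import time

REPO_DIR = "/tmp/deploy-repo"     # local clone of the config repo (placeholder)
MANIFESTS = f"{REPO_DIR}/k8s/"    # hypothetical manifest directory

def reconcile_forever(interval_seconds: int = 60) -> None:
    while True:
        # 1. Pull the desired state from Git.
        subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"], check=True)
        # 2. Apply it; kubectl converges the live cluster toward the manifests.
        subprocess.run(["kubectl", "apply", "-f", MANIFESTS], check=True)
        time.sleep(interval_seconds)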
5. Zero-Downtime Deployment
An hour of downtime for an online service can cost thousands to millions of dollars in lost revenue, user trust, and SLA penalties. Zero-downtime deployment means users never experience an interruption while a new version is being released.
The core requirement: at no point during the deployment should the service be completely unavailable.
This is achieved through deployment strategies that:
- Keep the old version running while the new one starts up
- Switch traffic gradually or instantly to the new version
- Maintain rollback capability throughout
6. Deployment Strategies
Blue-Green
Two identical environments always exist. One is live (blue), one is idle (green). Deploy the new version to green, test it, then switch all traffic from blue to green. Blue stays as instant rollback.
→ See full deep dive in Section 7
Canary
Release the new version to a small percentage of users first — 5%, 10%. Watch metrics. Gradually increase traffic to the new version if everything looks good. Roll back only the affected users if something breaks.
Initial:        100% → v1

After deploy:    95% → v1 (old)
                  5% → v2 (canary)

If metrics ok:   80% → v1
                 20% → v2

If still ok:      0% → v1
                100% → v2

Advantages:
- Real production traffic tests the new version before full rollout
- Failures affect only a small percentage of users
- No need for duplicate infrastructure (unlike blue-green)
- Gradual confidence building before full commit
Disadvantages:
- Two versions run simultaneously — both must be compatible with the same database schema
- Rollback is more complex than blue-green (not a single switch)
- Monitoring and traffic splitting adds infrastructure complexity
- Slower to fully release than blue-green
Trade-offs: Less infrastructure cost than blue-green, but more operational complexity. The blast radius of a bad deploy is smaller.
When to choose: When you want real user data to validate a release before full rollout. Also good when you cannot afford double infrastructure. Ideal for large-scale consumer products (social media, e-commerce) where even 1% of users is a meaningful test sample.
Real-world example: Netflix uses canary deployments heavily. When releasing a new recommendation algorithm, they first roll it out to 1% of users. Internal dashboards track engagement, stream errors, and watch time. Only if those metrics match or improve on the baseline does the rollout continue to 5%, 25%, 100%.
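The promotion logic behind a canary rollout can be sketched as a loop that shifts traffic in steps and only proceeds while metrics stay healthy. In the sketch below, set_canary_weight and get_error_rate are hypothetical hooks into your load balancer and monitoring stack:

# Sketch of a progressive canary rollout: shift traffic in steps and promote
# only while health metrics stay within budget.
import time

STEPS = [5, 20, 50, 100]     # canary traffic percentages
ERROR_BUDGET = 0.01          # abort above 1% errors

def rollout(set_canary_weight, get_error_rate, soak_seconds=300) -> bool:
    for percent in STEPS:
        set_canary_weight(percent)      # e.g. update LB / service-mesh config
        time.sleep(soak_seconds)        # let real traffic exercise v2
        if get_error_rate() > ERROR_BUDGET:
            set_canary_weight(0)        # roll back: all users return to v1
            return False
    return True                         # canary at 100% means fully released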
Rolling
Replace old instances with new instances one by one (or in small batches). At any point during the deployment, some instances run v1 and some run v2.
Start: [v1] [v1] [v1] [v1] [v1]
Step 1: [v2] [v1] [v1] [v1] [v1]
Step 2: [v2] [v2] [v1] [v1] [v1]
Step 3: [v2] [v2] [v2] [v1] [v1]
Step 4: [v2] [v2] [v2] [v2] [v1]
Step 5: [v2] [v2] [v2] [v2] [v2]

Advantages:
- No duplicate infrastructure needed
- Faster than blue-green (no environment build-up phase)
- Gradual rollout gives time to catch issues
Disadvantages:
- Both v1 and v2 serve traffic simultaneously — database schema changes are risky
- Rollback is slow (you have to roll all instances back one by one)
- No environment isolation between old and new versions
Trade-offs: Lower infrastructure cost than blue-green, but rollback is painful compared to blue-green’s instant switch.
When to choose: When you have limited infrastructure budget and the new version is backward-compatible with the current one. Good for stateless applications where running two versions simultaneously is safe.
Real-world example: Kubernetes rolling updates work this way by default. A kubectl set image command replaces pods one by one with the new image while keeping the service online throughout.
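The batch loop underneath a rolling update can be sketched as follows. Here replace_instance and is_healthy are hypothetical hooks into your orchestrator (a Kubernetes Deployment performs exactly this loop for you):

# Sketch of a rolling update: replace instances in small batches and verify
# health before continuing to the next batch.
def rolling_update(instances, replace_instance, is_healthy, batch_size=1):
    for i in range(0, len(instances), batch_size):
        batch = instances[i:i + batch_size]
        for instance in batch:
            replace_instance(instance, version="v2")   # swap one instance
        if not all(is_healthy(instance) for instance in batch):
            # Halt the rollout; remaining instances still run v1, so the
            # service stays up while you investigate or roll back.
            raise RuntimeError(f"rollout halted at batch {batch}")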
Shadow
The new version (v2) runs alongside the old version (v1), but receives a copy of real production traffic — without actually responding to users. v1 still handles all real responses. v2’s responses are discarded (or compared internally).
User request
     │
     ├──► v1 (active) → real response sent to user
     │
     └──► v2 (shadow) → response discarded / compared internally

Advantages:
- v2 gets tested under real production load with zero risk to users
- Great for catching performance bottlenecks before going live
- No user impact at all if v2 breaks
Disadvantages:
- Complex to set up — requires traffic mirroring infrastructure
- Side effects are dangerous — if v2 writes to a database, processes a payment, sends an email, those are real actions, not test actions
- Requires a mocking layer for any third-party service calls to prevent double actions
- High infrastructure cost (running two full stacks)
Trade-offs: Maximum safety for testing, maximum complexity and cost to operate. Only justifiable when correctness is critical and you have engineering resources to build the mocking layer.
When to choose: For critical services where you must validate behaviour under real load before any user exposure. Common in payment systems, fraud detection, and machine learning model replacements.
Real-world example: A bank replacing their fraud detection engine. v2 (new ML model) receives a mirror of all real transactions but v1 makes all the actual approve/deny decisions. Engineers compare v1 and v2 outputs for weeks before switching live traffic to v2.
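At the application tier, mirroring can be sketched like this; real setups usually mirror at the proxy or service-mesh layer instead. The backend URLs are placeholders, and the third-party requests package is assumed:

# Toy sketch of shadow traffic: v1 serves every request, and a copy goes to
# v2 whose response is only logged, never returned to the user.
import threading
import requests

V1_URL = "http://v1.internal"   # live version (placeholder URL)
V2_URL = "http://v2.internal"   # shadow version (placeholder URL)

def handle(path: str, payload: dict) -> requests.Response:
    # Fire-and-forget copy to the shadow; a failure here must never affect
    # the user, hence the daemon thread and swallowed errors.
    def mirror():
        try:
            shadow = requests.post(f"{V2_URL}{path}", json=payload, timeout=2)
            print("shadow status:", shadow.status_code)  # compare offline
        except requests.RequestException:
            pass  # shadow problems are logged, not surfaced
    threading.Thread(target=mirror, daemon=True).start()
    return requests.post(f"{V1_URL}{path}", json=payload, timeout=2)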
A/B Testing (A/B Deployment)
Split users into two groups — Group A gets v1 (control), Group B gets v2 (experiment). Collect data on user behaviour metrics. Make a business decision based on the data.
Users
  │
  ├── Group A (50%) ──► v1 (control)     ┐
  │                                      ├── Compare: conversion rate,
  └── Group B (50%) ──► v2 (experiment)  ┘   engagement, revenue

Advantages:
- Data-driven decision making — you don’t guess, you measure
- Tests hypotheses about user behaviour before full rollout
- Groups can be defined by any user attribute (geography, device, account age)
Disadvantages:
- Requires analytics infrastructure to collect and compare metrics meaningfully
- Statistical significance takes time — you need enough users in both groups
- Not suitable for bug fixes (you don’t A/B test a crash fix)
- Managing state and sessions across two versions adds complexity
Trade-offs: A/B is a product decision tool, not purely a deployment tool. It answers “which version performs better with users” rather than “is this version safe to run.”
When to choose: For UI/UX changes, new features, pricing changes, or algorithm updates where user behaviour is the success metric. Not the right choice when correctness and stability are the primary concerns.
Real-world example: Spotify A/B tests their homepage layout. Group A sees the current layout, Group B sees a new layout with a redesigned search bar. After two weeks, Spotify’s data team compares search usage rates, listening session lengths, and playlist creation rates between groups. If Group B metrics are better, the new layout rolls out to everyone.
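One implementation detail matters for any A/B deployment: group assignment must be sticky, so a user sees the same variant on every visit. A common approach, sketched below with placeholder names, is hashing the user id together with the experiment name:

# Sketch of sticky A/B group assignment. Hashing user id + experiment name
# keeps each user in one group forever, and different experiments split
# users independently of each other.
import hashlib

def assign_group(user_id: str, experiment: str = "homepage-redesign") -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 100 < 50 else "A"   # 50/50 split

if __name__ == "__main__":
    counts = {"A": 0, "B": 0}
    for i in range(10_000):
        counts[assign_group(f"user-{i}")] += 1
    print(counts)  # roughly 5000 / 5000, and stable across runs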
Recreate (No Zero Downtime)
Shut down the old version entirely, then start the new version. Simple, but users experience downtime in between.
v1 running
     │
     ▼
v1 shut down   ← DOWNTIME STARTS HERE
     │
     ▼
v2 starts up   ← DOWNTIME ENDS HERE
     │
     ▼
v2 running

Advantages:
- Extremely simple to implement — no special tooling needed
- Application state is completely fresh — no stale state from v1
Disadvantages:
- Guaranteed downtime. Duration = shutdown time + startup time of new version.
- Users are actively interrupted mid-session
When to choose: Only for non-production environments (dev, staging), batch jobs with no live users, or internal tools where brief downtime is acceptable.
7. Blue-Green Deployment — Deep Dive
What it is
Two identical production environments run side by side — blue (current live version) and green (new version being prepared). A load balancer sits in front of both and controls which environment receives user traffic.
At any given moment, only one environment is live. The other is idle — available as an instant rollback target.
Prerequisites before you can use blue-green
- Two identical infrastructure environments must exist (same server specs, same config, same dependencies).
- Both environments must be able to connect to the same (or a compatible) database — the new version’s schema changes must be backward-compatible with the old version.
- A load balancer or router that can switch traffic between the two environments, with no DNS change required.
How it works — the 4 phases
Phase 1: Setup
Load Balancer
      │
      ▼
BLUE (v1.1)   ← 100% of traffic, live
GREEN (v1.2)  ← idle, being prepared

Deploy v1.2 to the green environment. It is running but receives no user traffic.
Phase 2: Switch
Load Balancer
      │
      ├──► BLUE (v1.1)   ← 0% traffic, kept on standby
      │
      └──► GREEN (v1.2)  ← 100% traffic, now live

The load balancer redirects all traffic from blue to green. This switch is instantaneous. Most users never notice — the DNS record does not change. The load balancer simply changes its routing target.
Phase 3: Monitor
DevOps engineers immediately run smoke tests on the live green environment:
- Are APIs responding?
- Are error rates elevated?
- Are response times normal?
- Are background jobs running?
This is the critical window. The blue environment is still warm and ready for instant rollback.
Phase 4: Rollback or Retire
If green has problems:
  Load balancer switches back to BLUE in seconds
  Users are on v1.1 again, unaware anything happened

If green is healthy after the monitoring period:
  Blue is retired (or becomes the new idle environment for the next release)
  Green becomes the new blue
  A new green environment is prepared for the next release

Benefits
Seamless user experience — traffic switch is instantaneous. No user sees a loading spinner or error page during deployment.
Instant rollback — if something goes wrong, switching back to blue takes seconds. This is the fastest rollback of any deployment strategy.
No upgrade scheduling — no maintenance windows, no 2am deployments to avoid peak traffic. Deploy any time.
Testing at production parity — the green environment is identical to production. You are not testing in a simulated environment — you are testing the exact thing that will serve users.
Disaster recovery practice built-in — because switching between two live environments is a regular operation, teams become practiced at it. The same mechanism works for disaster recovery.
Challenges
High infrastructure cost — you permanently maintain double the infrastructure. If your application runs on 20 servers, blue-green means 40 servers. On cloud infrastructure with elastic scaling, this cost can be partially managed by spinning down the idle environment between releases, but the standby environment must come up quickly when needed.
Database schema changes are hard — this is the biggest practical challenge. If v1.2 adds a new database column, the old v1.1 code (still running in blue as a standby) may not understand that column. If you need to roll back to blue, the database already has the new column. Strategy: make all database changes backward-compatible — add columns but never remove or rename them until the old version is fully retired.
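This pattern is often called an expand-and-contract migration. A hypothetical sketch (table and column names are invented for illustration):

# Hypothetical expand/contract migration pair. The "expand" release only ADDS
# schema, which is safe for both v1.1 and v1.2; the "contract" release removes
# old schema only after the previous version is retired everywhere.
EXPAND = """
ALTER TABLE orders ADD COLUMN payment_provider TEXT;
-- v1.2 reads and writes this column; v1.1 simply ignores it,
-- so rolling back to blue stays safe.
"""

CONTRACT = """
ALTER TABLE orders DROP COLUMN legacy_payment_flag;
-- run only once v1.1 no longer exists in any environment
"""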
Session and user routing issues — if a user has an active session in blue and traffic switches to green, their session state may not exist in green. They get logged out or mid-transaction requests fail. Solutions:
- Use a shared session store (Redis) accessible by both environments
- Configure the load balancer to drain existing connections gracefully before completing the switch (allows existing sessions to finish on blue while new sessions go to green)
- Accept that a small number of in-flight requests will fail during the switch
Instantaneous switch:
  All users immediately moved to green
  In-flight requests on blue → fail or require re-login
  Simple but some users impacted

Graceful drain:
  New connections → green
  Existing connections → finish on blue, then close
  Slower but zero disruption to active users

Code compatibility — because blue and green run simultaneously and both connect to the same production database, every release must be compatible with the current database state. You cannot deploy a version that requires a schema that does not yet exist in the database.
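Putting the phases together, the whole cycle compresses into a small control flow. In the sketch below, deploy_to, switch_traffic_to, and smoke_tests_pass are hypothetical hooks into your deployment tooling, load balancer, and test suite:

# Sketch of the blue-green control flow: deploy to the idle environment,
# switch, watch, and keep the old environment warm for instant rollback.
import time

def blue_green_release(deploy_to, switch_traffic_to, smoke_tests_pass,
                       monitor_minutes=30):
    deploy_to("green")                      # green runs, gets zero traffic
    if not smoke_tests_pass("green"):       # test before any user sees it
        return "aborted: green never went live"

    switch_traffic_to("green")              # instantaneous cutover
    deadline = time.time() + monitor_minutes * 60
    while time.time() < deadline:           # the critical window
        if not smoke_tests_pass("green"):
            switch_traffic_to("blue")       # instant rollback, in seconds
            return "rolled back to blue"
        time.sleep(30)

    return "green is healthy; blue can be retired or kept as the next idle env"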
Real-world example — E-commerce site
An online retailer runs their checkout service on blue (v2.3). They have built v2.4 with a faster payment processing flow.
Before deployment:
  Load Balancer → BLUE (v2.3)   ← all 50,000 concurrent users
                  GREEN (v2.4)  ← deployed, running, zero traffic

Release day (not a night, not a weekend — any time):
  1. Team deploys v2.4 to the green environment
  2. Team runs automated smoke tests against green directly
     (test checkout flow, payment processing, order confirmation)
  3. All smoke tests pass
  4. Load balancer switches: 100% traffic → GREEN (v2.4)
  5. Team monitors dashboards for 30 minutes:
     - Payment success rate: 99.7% (same as before)
     - Page load time: 340ms (down from 480ms — improvement confirmed)
     - Error rate: 0.01% (normal baseline)
  6. Blue (v2.3) remains on standby for 2 hours
  7. No issues — blue is retired
  8. Green is now called blue
  9. A new green environment is prepared for v2.5

If at step 5 the payment success rate dropped to 94%, the team would switch the load balancer back to blue (v2.3) in under 10 seconds. Users would barely notice.
8. Choosing the Right Strategy
SITUATION                                        RECOMMENDED STRATEGY
─────────────────────────────────────────        ────────────────────
Mission-critical service, instant rollback       Blue-Green
a must, can afford double infrastructure

Large user base, want real data before           Canary
full rollout, limited infrastructure budget

Kubernetes-native, stateless app,                Rolling
backward-compatible release

Testing behaviour under production load,         Shadow
cannot risk any user impact, critical service

Need to measure user behaviour impact            A/B Testing
of a product or UI change

Internal tool, dev environment, batch job,       Recreate
downtime is acceptable

The real-world combination
Most mature organizations do not pick one strategy exclusively. A common pattern:
1. Shadow → validate the new ML model with real traffic before exposing it to users
2. Canary → release to 5% of users, monitor for 24 hours
3. Rolling → gradually replace the remaining 95% of instances
4. Blue-Green → maintained as the emergency rollback mechanism throughout

The deployment strategy is a trade-off between:
SAFETY           ←────────────────────────────────────────► SPEED
                 Shadow   Blue-Green   Canary   Rolling   Recreate

COST             ←────────────────────────────────────────► SAVINGS
                 Blue-Green   Shadow   Canary   Rolling   Recreate

ROLLBACK SPEED   ←────────────────────────────────────────► SLOW
                 Blue-Green   Canary   Shadow   Rolling   Recreate

No strategy wins on all dimensions. The right choice depends on the specific risk tolerance, infrastructure budget, and user experience requirements of your organization.