Part 3 NAT Gateway Deep Dive
Our Example Architecture (Reference This Throughout)
Section titled “Our Example Architecture (Reference This Throughout)”Before anything else, here is the setup we will use for every example, diagram, and packet trace in this part.
VPC: my-vpc — 10.0.0.0/24Region: us-east-1
| Subnet Name | CIDR | AZ | Type |
|---|---|---|---|
| public-subnet | 10.0.0.0/25 | us-east-1a | Public |
| private-subnet | 10.0.0.128/25 | us-east-1a | Private |
Key resources:
NAT Gateway: nat-gw → lives in public-subnet → EIP: 54.10.10.10Internet Gateway: my-igw → attached to my-vpc
EC2 Instance (public): public-ec2 → public-subnet → private IP: 10.0.0.10EC2 Instance (private): private-ec2 → private-subnet → private IP: 10.0.0.140Table of Contents
Section titled “Table of Contents”- Why NAT Exists — The Core Problem
- What NAT Actually Does — PAT Explained
- The Full Packet Journey — Step by Step
- Public NAT Gateway vs Private NAT Gateway
- Elastic IP and NAT Gateway
- Placement Rule — Why NAT Gateway Must Be in a Public Subnet
- High Availability — One NAT Gateway Per AZ
- NAT Gateway vs NAT Instance
- Technical Limits and Performance
- Connection Tracking and Idle Timeouts
- NAT Gateway vs Internet Gateway — The Complete Comparison
- When NOT to Use NAT Gateway — VPC Endpoints
- Cost Breakdown
- NACLs and NAT Gateway — the Ephemeral Port Problem
- Common Mistakes and Gotchas
- Quick Reference Cheatsheet
1. Why NAT Exists — The Core Problem
Section titled “1. Why NAT Exists — The Core Problem”IPv4 exhaustion
Section titled “IPv4 exhaustion”The internet runs on IPv4 addresses. There are exactly 2^32 = 4,294,967,296 possible IPv4 addresses — about 4.3 billion. The internet has far more than 4.3 billion connected devices. We ran out of assignable public IPv4 addresses around 2011.
The solution that kept the internet running was NAT — Network Address Translation. NAT allows many devices that each have a private IP address to share a single public IP address when communicating with the internet. Instead of every device needing its own public IP, an entire office building, data center, or VPC can share one (or a handful of) public IPs.
WITHOUT NAT: Device A needs public IP: 54.10.10.1 Device B needs public IP: 54.10.10.2 Device C needs public IP: 54.10.10.3 ...thousands of devices, thousands of public IPs consumed
WITH NAT: Device A (private: 10.0.0.140) ─┐ Device B (private: 10.0.0.141) ─┼─→ NAT → public IP: 54.10.10.10 → Internet Device C (private: 10.0.0.142) ─┘
All three share one public IP. Only one public IP consumed.The AWS context
Section titled “The AWS context”In a VPC, every instance in a private subnet has a private IP. Private IPs (like 10.0.x.x) are not routable on the internet — if a private-subnet instance tried to send a packet directly to 8.8.8.8 with source address 10.0.0.140 (a private IP), every router on the internet would discard it because 10.0.0.140 is not a valid internet address, and responses would have nowhere to go.
But private instances still need outbound internet access for legitimate purposes:
- Downloading OS patches (
yum update -y,apt-get upgrade) - Pulling container images from Docker Hub or ECR
- Calling third-party APIs (payment gateways, auth providers)
- Fetching configuration from public endpoints
- Sending logs or metrics to external services
The requirement is asymmetric:
- Outbound to internet: YES — instances must be able to initiate connections
- Inbound from internet: NO — the internet must never be able to initiate connections to private instances
NAT Gateway solves exactly this. It is the one-way valve — lets your private instances call out to the internet but drops any unsolicited connection coming in.
2. What NAT Actually Does — PAT Explained
Section titled “2. What NAT Actually Does — PAT Explained”Most people say “NAT” but what NAT Gateway actually performs is PAT — Port Address Translation, also called NAT overloading or NAPT (Network Address Port Translation).
Basic NAT vs PAT
Section titled “Basic NAT vs PAT”Basic NAT is a simple one-to-one mapping: one private IP maps to one public IP. If you have 100 private instances, you need 100 public IPs. This doesn’t solve the exhaustion problem at all.
PAT is a many-to-one mapping: many private IPs share one public IP by using different source ports to keep conversations separate. One public IP can serve tens of thousands of concurrent connections simultaneously.
How PAT works
Section titled “How PAT works”Every TCP/UDP connection is uniquely identified by a 5-tuple:
(source IP, source port, destination IP, destination port, protocol)When private-ec2 (10.0.0.140) makes an HTTPS request to Google (142.250.80.46 port 443):
BEFORE PAT (packet leaving private-ec2): Source IP: 10.0.0.140 Source Port: 52341 ← randomly chosen ephemeral port by the OS Dest IP: 142.250.80.46 Dest Port: 443
NAT Gateway receives this and creates a translation record: PRIVATE SIDE PUBLIC SIDE 10.0.0.140 : 52341 ←→ 54.10.10.10 : 58901
AFTER PAT (packet leaving NAT Gateway toward internet): Source IP: 54.10.10.10 ← NAT Gateway's EIP replaces private IP Source Port: 58901 ← NAT Gateway chooses a new source port Dest IP: 142.250.80.46 Dest Port: 443When Google’s response comes back:
Google responds to 54.10.10.10 port 58901 Dest IP: 54.10.10.10 Dest Port: 58901
NAT Gateway looks up its translation table: 54.10.10.10 : 58901 → 10.0.0.140 : 52341
NAT Gateway rewrites the packet: Dest IP: 10.0.0.140 Dest Port: 52341
Delivers it to private-ec2 — the response arrives correctly.The private instance never knows the translation happened. From its perspective it sent a request to Google and got a response back. The NAT Gateway’s translation table is the magic in the middle.
The translation table — keeping conversations separate
Section titled “The translation table — keeping conversations separate”Now suppose both private-ec2 (10.0.0.140) and a second private instance (10.0.0.141) make requests to Google simultaneously. The NAT Gateway handles this by using different source ports:
TRANSLATION TABLE (simplified):
| Private IP | Priv Port | EIP | Pub Port | Destination |
|---|---|---|---|---|
| 10.0.0.140 | 52341 | 54.10.10.10 | 58901 | 142.250.80.46 : 443 |
| 10.0.0.141 | 44217 | 54.10.10.10 | 59044 | 142.250.80.46 : 443 |
| 10.0.0.140 | 52399 | 54.10.10.10 | 58955 | 1.1.1.1 : 53 |
Each row is a unique conversation. When a response comes back to 54.10.10.10 : 59044, the NAT Gateway knows it belongs to 10.0.0.141 : 44217. The destination host never knows there are multiple private IPs — it just sees all traffic coming from 54.10.10.10.
3. The Full Packet Journey — Step by Step
Section titled “3. The Full Packet Journey — Step by Step”Let’s trace a single HTTPS request from private-ec2 to 8.8.8.8 (Google DNS) through the complete stack.
ARCHITECTURE OVERVIEW:
10.0.0.140 10.0.0.5 54.10.10.10(private-ec2) (nat-gw) (EIP) Internet │ │ │ │ │ │ │ │private-subnet public-subnet │ IGW(10.0.0.128/25) (10.0.0.0/25) │ (my-igw)Step 1 — Application initiates a request
private-ec2 wants to reach 8.8.8.8 (Google DNS, port 53 UDP)
OS creates the packet: Source: 10.0.0.140 : 54321 Destination: 8.8.8.8 : 53 Protocol: UDPStep 2 — Instance consults its routing table
private-subnet Route Table: Destination Target ───────────────── ─────────────── 10.0.0.0/24 local ← 8.8.8.8 doesn't match this 0.0.0.0/0 nat-gw ← 8.8.8.8 matches this
Result: Send to nat-gwStep 3 — Packet arrives at NAT Gateway
NAT Gateway receives: Source: 10.0.0.140 : 54321 Destination: 8.8.8.8 : 53
NAT Gateway creates a translation record: 10.0.0.140 : 54321 ←→ 54.10.10.10 : 61200
NAT Gateway rewrites packet: Source: 54.10.10.10 : 61200 ← EIP replaces private IP Destination: 8.8.8.8 : 53 ← unchangedStep 4 — NAT Gateway’s own routing
NAT Gateway (which lives in public-subnet) consults:public-subnet Route Table: Destination Target ───────────────── ─────────────── 10.0.0.0/24 local 0.0.0.0/0 my-igw ← 8.8.8.8 goes to IGW
Result: Send to my-igwStep 5 — Internet Gateway handles the packet
IGW receives: Source: 54.10.10.10 : 61200 Destination: 8.8.8.8 : 53
The source is already a public IP (the EIP). IGW sends this onto the internet.
Note: The IGW is stateless for this direction — it just forwards the packet.Step 6 — Response returns
8.8.8.8 responds to 54.10.10.10 : 61200: Source: 8.8.8.8 : 53 Destination: 54.10.10.10 : 61200
IGW receives → forwards to my-vpc (because 54.10.10.10 is the EIP of nat-gw)Step 7 — NAT Gateway reverses the translation
NAT Gateway receives the response: Source: 8.8.8.8 : 53 Destination: 54.10.10.10 : 61200
Looks up translation table: 54.10.10.10 : 61200 → 10.0.0.140 : 54321
Rewrites packet: Source: 8.8.8.8 : 53 ← unchanged Destination: 10.0.0.140 : 54321 ← restores original private IP
Sends to private-ec2 in private-subnetStep 8 — private-ec2 receives the response
DNS response arrives at private-ec2.The application gets its answer.private-ec2 never knew about the translation.Full picture:
private-ec2 nat-gw my-igw 8.8.8.8(10.0.0.140) (10.0.0.x) (edge of VPC) (internet) │ │ │ │ │─── UDP packet ────►│ │ │ │ src: 10.0.0.140 │ │ │ │ dst: 8.8.8.8:53 │ │ │ │ │ │ │ │ [PAT translate] │ │ │ 10.0.0.140:54321 │ │ │ → 54.10.10.10:61200 │ │ │ │ │ │ │ │─── translated ─────►│ │ │ │ src: 54.10.10.10 │ │ │ │ dst: 8.8.8.8:53 │ │ │ │ │─── to internet─►│ │ │ │ │ │ │ │◄── response ───│ │ │◄── response ───────│ │ │ │ dst: 54.10.10.10 │ │ │ │ │ │ │ [reverse lookup] │ │ │ 54.10.10.10:61200 │ │ │ → 10.0.0.140:54321 │ │ │ │ │ │ │◄── response ───────│ │ │ │ dst: 10.0.0.140 │ │ │ │ │ │ │4. Public NAT Gateway vs Private NAT Gateway
Section titled “4. Public NAT Gateway vs Private NAT Gateway”AWS offers two types of NAT Gateway. They serve fundamentally different use cases.
Public NAT Gateway
Section titled “Public NAT Gateway”This is what Part 2 covered. It gives private subnet instances outbound internet access.
Purpose: Private instances → InternetDirection: Outbound only (internet cannot initiate connections in)Lives in: Public subnet (one with a route to the Internet Gateway)Requires: Elastic IP addressTraffic path: Instance → NAT GW → IGW → InternetExample route tables for public NAT Gateway:
public-subnet Route Table: 10.0.0.0/24 → local 0.0.0.0/0 → my-igw ← NAT Gateway needs this to reach internet
private-subnet Route Table: 10.0.0.0/24 → local 0.0.0.0/0 → nat-gw ← instances send internet traffic hereWhen traffic from the private instance leaves the NAT Gateway through the IGW, the source IP seen by the destination is the EIP (54.10.10.10). This only happens when the traffic path includes the IGW in the same VPC.
Important nuance: If a public NAT Gateway routes traffic through a Transit Gateway or Virtual Private Gateway (instead of through an IGW), the source IP seen by the destination is the private IP of the NAT Gateway, not the EIP. The EIP is only visible to the outside world when the IGW performs its final translation.
Private NAT Gateway
Section titled “Private NAT Gateway”This is less talked about but critical for multi-VPC architectures.
Purpose: Connect private instances to OTHER VPCs or on-premises networks (not the internet)Direction: Outbound only (no unsolicited inbound)Lives in: Private subnet (does NOT need a public subnet or IGW)Requires: NO Elastic IP — uses only private IPsTraffic path: Instance → Private NAT GW → Transit Gateway → Other VPCWhen do you need a private NAT Gateway?
The classic problem it solves: overlapping CIDR ranges between VPCs.
VPC-A uses 10.0.0.0/16. VPC-B also uses 10.0.0.0/16. You cannot connect these with VPC Peering because the CIDRs overlap. But with private NAT, you can still route between them:
VPC-A (10.0.0.0/16) VPC-B (10.0.0.0/16)───────────────────── ─────────────────────routable-subnet: 100.64.0.0/28 ALB in routable-subnet │ │ Private NAT GW Transit Gateway (IP: 100.64.0.5) │ │ │ TGW ──────────────────────────────────────- VPC-A’s instance sends traffic to VPC-B
- Private NAT translates source IP from
10.0.x.xto100.64.0.5(routable range) - VPC-B sees traffic from
100.64.0.5— no overlap conflict - Response returns to
100.64.0.5, NAT Gateway reverses the translation
Key difference at a glance:
| Feature | Public NAT Gateway | Private NAT Gateway |
|---|---|---|
| Purpose | Internet access | VPC-to-VPC / on-prem |
| Elastic IP required | YES | NO |
| Subnet placement | Public subnet | Private subnet |
| Connects to | Internet Gateway | Transit GW / VPN GW |
| Source IP visible | EIP (when via IGW) | NAT GW private IP |
| Internet access | YES | NO (IGW drops traffic) |
| Use case | Software updates, APIs | Overlapping CIDRs,private connectivity |
Critical rule: If you attach an Internet Gateway to a VPC that contains a private NAT gateway and route traffic from that private NAT gateway toward the IGW — the IGW drops the traffic. A private NAT gateway cannot be used for internet access under any configuration.
5. Elastic IP and NAT Gateway
Section titled “5. Elastic IP and NAT Gateway”What an Elastic IP is
Section titled “What an Elastic IP is”An Elastic IP (EIP) is a static, persistent public IPv4 address that you allocate to your AWS account. Unlike the auto-assigned public IPs that EC2 instances get (which change every restart), an EIP stays yours until you explicitly release it.
A public NAT Gateway must have an EIP. The EIP is the address that the internet sees for all outbound traffic from your private subnets.
Why NAT Gateway requires an EIP (not a regular public IP)
Section titled “Why NAT Gateway requires an EIP (not a regular public IP)”Regular public IPs (auto-assigned to EC2 instances) are dynamic — they change when the instance restarts and they cannot be moved between resources. A NAT Gateway needs a stable, predictable public IP because:
- Outbound IP whitelisting — many external services (payment APIs, partner services, corporate firewalls) whitelist specific IP addresses. If your outbound IP changes, your connections break. By using an EIP on the NAT Gateway, your entire private subnet has one predictable outbound IP.
- Managed resource continuity — AWS manages the NAT Gateway. If AWS needs to perform maintenance or replace hardware, the NAT Gateway may be recreated internally. The EIP stays stable across these events because it is decoupled from the hardware.
- DNS and auditing — with a static EIP, you can create DNS records pointing to your outbound IP and correlate audit logs to a predictable source address.
What happens to the EIP when you delete a NAT Gateway
Section titled “What happens to the EIP when you delete a NAT Gateway”Delete NAT Gateway │ ▼EIP is DISASSOCIATED from the NAT Gateway(EIP is no longer attached to anything) │ ▼EIP REMAINS in your account — you keep paying for it(EIP idle charges apply until you release it) │ ▼You must MANUALLY release the EIP if you no longer need it(EC2 → Elastic IPs → Actions → Release)Gotcha: Deleting a NAT Gateway does NOT release the EIP. The EIP stays allocated to your account and you are charged for it ($0.005/hour when idle). Always explicitly release EIPs you no longer need.
Multiple EIPs per NAT Gateway
Section titled “Multiple EIPs per NAT Gateway”Each public NAT Gateway can have up to 8 IPv4 addresses (1 primary + 7 secondary EIPs). Why would you want multiple EIPs on one NAT Gateway?
This relates to the 55,000 concurrent connections limit (explained in Section 9). The limit is per IP address per unique destination. If your private instances are making massive numbers of connections to the same external service, you can hit this limit.
By adding more EIPs, the NAT Gateway can use different source IPs to distribute connections:
One EIP: 55,000 concurrent connections to one destinationTwo EIPs: 110,000 concurrent connections to one destinationFour EIPs: 220,000 concurrent connections to one destinationThe default quota is 2 EIPs per NAT Gateway — you can increase this via AWS Service Quotas.
6. Placement Rule — Why NAT Gateway Must Be in a Public Subnet
Section titled “6. Placement Rule — Why NAT Gateway Must Be in a Public Subnet”This is one of the most frequently asked questions and the most common mistake beginners make.
The rule: A public NAT Gateway must be placed in a public subnet — a subnet whose route table has 0.0.0.0/0 → IGW.
Here is why, step by step:
INCORRECT SETUP (NAT Gateway in private subnet):
private-ec2 nat-gw (WRONG — in private-subnet)(10.0.0.140) (10.0.0.130) │ │ ▼ ▼private-subnet Route Table: 10.0.0.0/24 → local 0.0.0.0/0 → nat-gw ← private-ec2 sends internet traffic here
nat-gw wants to forward this to the internet.nat-gw consults its own subnet's route table (private-subnet): 0.0.0.0/0 → ??? (no route to IGW — this is a private subnet!)
Result: nat-gw cannot reach the internet. Traffic is dropped.CORRECT SETUP (NAT Gateway in public subnet):
private-ec2 nat-gw (CORRECT — in public-subnet)(10.0.0.140) (10.0.0.5) │ │ ▼ ▼private-subnet Route Table: public-subnet Route Table: 10.0.0.0/24 → local 10.0.0.0/24 → local 0.0.0.0/0 → nat-gw 0.0.0.0/0 → my-igw ← THIS is why!
Result: private-ec2 → nat-gw → my-igw → Internet. Works perfectly.The NAT Gateway itself needs a path to the internet to forward the traffic it receives. It can only get that path through the route table of the subnet it lives in. A public subnet’s route table has 0.0.0.0/0 → IGW, giving the NAT Gateway the exit it needs.
Think of the NAT Gateway as a security guard at a building entrance. The guard (NAT Gateway) must be stationed at the door (public subnet — the one that faces outside). Putting the guard in an interior room (private subnet) means the guard can’t actually see or reach the outside world.
7. High Availability — One NAT Gateway Per AZ
Section titled “7. High Availability — One NAT Gateway Per AZ”The single-AZ problem
Section titled “The single-AZ problem”A NAT Gateway is confined to the AZ it is created in. It has internal redundancy within that AZ (AWS manages this), but if the entire AZ goes down, the NAT Gateway goes down with it.
Look at what happens with a single NAT Gateway setup when us-east-1a fails:
Single NAT Gateway AZ failure problem
This defeats the purpose of multi-AZ architecture. The AZ failure should only affect resources in that AZ, not cascade to other AZs.
The correct HA setup — one NAT Gateway per AZ
Section titled “The correct HA setup — one NAT Gateway per AZ”Correct HA setup: one NAT Gateway per AZ
Each AZ’s private subnets use only the NAT Gateway in their own AZ. Failure is contained.
The cross-AZ data transfer cost angle
Section titled “The cross-AZ data transfer cost angle”Even beyond HA, using a NAT Gateway in a different AZ from your instance costs extra money. AWS charges for cross-AZ data transfer. If private-ec2-2 (us-east-1b) routes through nat-gw (us-east-1a), every byte of traffic crosses an AZ boundary and incurs a transfer fee — on top of the NAT Gateway’s own data processing charge.
By routing each AZ’s instances to the NAT Gateway in the same AZ, you avoid cross-AZ transfer costs entirely.
COST COMPARISON (when scaling to multiple AZs):
Single NAT Gateway in us-east-1a: private-ec2 → nat-gw: same AZ, no cross-AZ charge private-ec2-2 → nat-gw: cross-AZ, extra charge per GB
One NAT Gateway per AZ: private-ec2 → nat-gw: same AZ, no cross-AZ charge private-ec2-2 → nat-gw-2: same AZ, no cross-AZ chargeThe cost of running a second NAT Gateway (hourly fee) is often offset by eliminating cross-AZ data transfer fees for production workloads with significant traffic.
Our example architecture — single-AZ setup
Section titled “Our example architecture — single-AZ setup”Our running example uses a single AZ (us-east-1a) with one NAT Gateway — the simplest correct setup.
Part 3 architecture: my-vpc with NAT Gateway
When you expand to multiple AZs, apply the one-per-AZ pattern shown in the conceptual diagrams above: each AZ gets its own NAT Gateway in its own public subnet, and each AZ’s private route table points only to the NAT Gateway in the same AZ.
8. NAT Gateway vs NAT Instance
Section titled “8. NAT Gateway vs NAT Instance”Before NAT Gateway existed, AWS customers had to run a NAT Instance — an EC2 instance manually configured to perform NAT. NAT Gateway replaced this pattern for most use cases, but understanding NAT Instance helps clarify exactly what NAT Gateway abstracts away.
What a NAT Instance is
Section titled “What a NAT Instance is”A NAT Instance is just an EC2 instance (typically running Amazon Linux) with two things configured:
- Source/destination check disabled — by default, EC2 instances drop any packet where the instance is not the actual source or destination. For a NAT instance to forward packets on behalf of other instances, this check must be disabled.
- IP forwarding and iptables rules — the OS is configured to forward packets and apply NAT rules using
iptables.
You are responsible for all of this: the EC2 instance, the OS configuration, security patching, scaling if traffic grows, and replacing the instance if it fails.
Comparison
Section titled “Comparison”| Feature | NAT Gateway | NAT Instance |
|---|---|---|
| Management | Fully AWS-managed | You manage the EC2 instance |
| Availability | HA within the AZ (AWS handles) | Single instance — you must build your own HA with ASG |
| Bandwidth | 5 Gbps → auto-scales to 100 Gbps | Depends entirely on instance type (t3.small vs c5.4xlarge) |
| Scaling | Automatic | Manual (change instance type) |
| Security Groups | Cannot attach | Can attach |
| Cost | Hourly + per-GB charges | EC2 instance + data transfer |
| Patching | AWS handles | You handle OS patches |
| Setup complexity | Low (console, a few clicks) | Higher (disable src/dst check,configure iptables) |
| IP fragmentation TCP | Not supported | Supported |
| Port forwarding | Not supported | Supported (via iptables rules) |
| Use as bastion host | Not possible | Possible (SSH through it) |
| Custom routing logic | Not possible | Possible |
| Logging / metrics | CloudWatch metrics built-in | You set up custom logging |
When would you still use a NAT Instance?
Section titled “When would you still use a NAT Instance?”NAT Gateway handles nearly every production scenario better than NAT Instance. The narrow cases where you might still reach for NAT Instance:
- You need TCP IP fragmentation support (NAT Gateway drops fragmented TCP)
- You need port forwarding (redirecting traffic on specific ports to different internal destinations)
- You are running a very small workload and want to use the NAT instance as a bastion host too (two birds, one stone)
- Hard budget constraint and the NAT Gateway hourly cost is genuinely prohibitive for a lab/sandbox environment
For anything production-grade, use NAT Gateway.
9. Technical Limits and Performance
Section titled “9. Technical Limits and Performance”These are the official AWS limits for a NAT Gateway. Understanding them matters when designing architectures at scale.
Bandwidth
Section titled “Bandwidth”Baseline bandwidth: 5 GbpsMaximum bandwidth: 100 Gbps (auto-scales, no action required)
Baseline packets/sec: 1,000,000 (1 million pps)Maximum packets/sec: 10,000,000 (10 million pps — packets dropped beyond this)You do not need to provision or configure this scaling — it happens automatically. If your workload needs more than 5 Gbps, the NAT Gateway scales up transparently.
Concurrent connections
Section titled “Concurrent connections”55,000 simultaneous connections per unique destination per IP addressA “unique destination” means a unique combination of: destination IP + destination port + protocol.
In practice, this means:
nat-gw (IP: 54.10.10.10) can hold: 55,000 concurrent TCP connections to 142.250.80.46:443 (Google HTTPS) 55,000 concurrent TCP connections to 1.1.1.1:443 (Cloudflare HTTPS) 55,000 concurrent UDP connections to 8.8.8.8:53 (Google DNS) ... and so on for every unique destinationEach unique destination gets its own pool of 55,000. The limit is not a global cap across all connections.
When do you hit this limit?
Mostly when many instances are connecting to the same destination IP and port — for example, hundreds of instances all hitting the same API endpoint simultaneously.
How to increase it:
Add secondary EIPs to the NAT Gateway. Each additional IP address adds another 55,000 connections per unique destination.
1 EIP: 55,000 connections per unique destination2 EIPs: 110,000 connections per unique destination4 EIPs: 220,000 connections per unique destination8 EIPs: 440,000 connections per unique destination (maximum)Port ranges
Section titled “Port ranges”Ephemeral port range used by NAT Gateway: 1024 – 65535This matters for NACL configuration. When a private instance sends a request through NAT Gateway, the NAT Gateway picks a source port from this range. Return traffic from the destination will come back to NAT Gateway on one of these ports. Your NACLs must allow this range.
MTU (Maximum Transmission Unit)
Section titled “MTU (Maximum Transmission Unit)”NAT Gateway max MTU: 8500 bytes (supports jumbo frames internally)Internet-bound MTU: Must be 1500 bytes or less on EC2 instances
Why 1500 for internet: The public internet does not support jumbo frames. Even though AWS's internal network supports 8500 bytes, once traffic exits to the internet it must be 1500 bytes.NAT Gateway supports Path MTU Discovery (PMTUD) — if a remote server sends back an ICMP “Fragmentation Needed” message, NAT Gateway propagates this signal, allowing the connection to negotiate a smaller packet size automatically.
10. Connection Tracking and Idle Timeouts
Section titled “10. Connection Tracking and Idle Timeouts”NAT Gateway is stateful in one specific sense: it tracks active translations so it can match responses back to the correct private instance. This is different from a full stateful firewall — NAT Gateway does not inspect or block traffic the way a Security Group does. It just maintains the translation table.
Idle timeout — 350 seconds
Section titled “Idle timeout — 350 seconds”NAT Gateway has an idle connection timeout of 350 seconds. If a connection has no data flowing for 350 seconds, NAT Gateway removes it from its translation table.
Connection established: 10.0.0.140:52341 ←→ 54.10.10.10:61200 ←→ 8.8.8.8:443Data flowing: translation table entry stays activeNo data for 350 seconds: NAT Gateway removes the entry from translation table
What happens next: If 8.8.8.8 sends data after 350 seconds: NAT Gateway has no record of this connection NAT Gateway sends RST (reset) to the client (10.0.0.140)
RST = abrupt connection close, NOT graceful FIN handshakeRST vs FIN — why this matters for applications
Section titled “RST vs FIN — why this matters for applications”When a connection ends normally, the parties exchange FIN packets — “I’m done sending, you can close.” This is graceful. Most applications handle FIN cleanly.
When NAT Gateway forcibly removes an idle connection, it sends RST — “this connection no longer exists.” RST is abrupt. Applications that do not handle RST gracefully may crash, throw unhandled exceptions, or not reconnect. This is a common source of mysterious intermittent failures.
Application behavior that causes idle timeout issues: - Connection pools that keep connections open "forever" between requests - Long-polling HTTP connections - WebSocket connections with no heartbeat - Database connection pools with idle connections
Fix: Enable TCP keepalive on the application or OS with interval < 350 seconds
Linux: sudo sysctl -w net.ipv4.tcp_keepalive_time=300 sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60 sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
This sends keepalive probes every 300 seconds of idle time, keeping the NAT Gateway translation entry alive.CloudWatch metrics for connection health
Section titled “CloudWatch metrics for connection health”| Metric | What it tells you |
|---|---|
| ErrorPortAllocation | > 0 means NAT Gateway ran out of ports — add more EIPs |
| IdleTimeoutCount | Connections timing out from inactivity — fix with keepalive |
| PacketsDropCount | Packets being dropped — possible port exhaustion or AZ issue |
| ActiveConnectionCount | Current active connections — monitor for growth trends |
| ConnectionAttemptCount | New connection attempts per minute |
Monitor ErrorPortAllocation and PacketsDropCount in CloudWatch. If either is non-zero, investigate before it becomes a user-facing problem.
11. NAT Gateway vs Internet Gateway — The Complete Comparison
Section titled “11. NAT Gateway vs Internet Gateway — The Complete Comparison”These two often get confused because both deal with internet connectivity. They serve completely different purposes and work at different levels.
| Feature | Internet Gateway (IGW) | NAT Gateway |
|---|---|---|
| What it is | VPC attachment point for internet connectivity | A proxy/forwarder for private instances |
| Where it lives | Attached to the VPC (not in any subnet) | Inside a subnet (public) |
| Traffic direction | Bidirectional (inbound + outbound) | Outbound only (responses allowed back,no unsolicited inbound) |
| Instance needs public IP? | YES — instance must have a public IP for inbound | NO — NAT GW has the EIP on behalf of instances |
| Internet can initiate? | YES (if Security Group allows) | NO (never) |
| Handles NAT? | YES — translates instance private IP ↔︎ public IP | YES — translates private IP to its own EIP |
| Where NAT happens | At the IGW (instance’s public IP ↔︎ instance’s private IP) | At the NAT Gateway (instance’s private IP ↔︎ NAT GW’s EIP) |
| Used by | Public subnets | Private subnets |
| One-per-VPC limit? | YES — one IGW per VPC | NO — many NAT GWs per VPC |
| Redundancy | AWS manages, always HA across ALL AZs | HA within AZ only (create one per AZ) |
| Cost | Free (pay for data transfer) | Hourly + per-GB data charge |
The flow for public vs private subnet — side by side
Section titled “The flow for public vs private subnet — side by side”PUBLIC SUBNET INSTANCE (has public IP, uses IGW): EC2 (10.0.0.10 / public: 52.x.x.x) → Sends packet src: 10.0.0.10 dst: 8.8.8.8 → Route table: 0.0.0.0/0 → my-igw → IGW translates: 10.0.0.10 → 52.x.x.x → Internet sees: src: 52.x.x.x dst: 8.8.8.8 ← Response: dst: 52.x.x.x ← IGW translates: 52.x.x.x → 10.0.0.10 ← EC2 receives: dst: 10.0.0.10
PRIVATE SUBNET INSTANCE (no public IP, uses NAT Gateway): EC2 (10.0.0.140, no public IP) → Sends packet src: 10.0.0.140 dst: 8.8.8.8 → Route table: 0.0.0.0/0 → nat-gw → NAT GW translates: 10.0.0.140:52341 → 54.10.10.10:61200 → NAT GW route: 0.0.0.0/0 → my-igw → IGW sees NAT GW's EIP — forwards to internet → Internet sees: src: 54.10.10.10 dst: 8.8.8.8 ← Response: dst: 54.10.10.10 ← NAT GW translates: 54.10.10.10:61200 → 10.0.0.140:52341 ← EC2 receives: dst: 10.0.0.140The key insight: NAT Gateway and IGW work together for private subnet internet access. NAT Gateway handles the private-to-EIP translation, and IGW handles the EIP-to-internet routing. Neither one alone is sufficient.
12. When NOT to Use NAT Gateway — VPC Endpoints
Section titled “12. When NOT to Use NAT Gateway — VPC Endpoints”NAT Gateway is charged per hour plus per GB of data processed. Every byte of traffic going through NAT Gateway costs money. For traffic destined to AWS services (S3, DynamoDB, CloudWatch, SSM, etc.), routing through NAT Gateway is wasteful because you are:
- Paying NAT Gateway data processing charges
- Sending traffic out of your VPC and back in again unnecessarily
- Potentially sending traffic over the internet (less secure)
VPC Endpoints let private instances communicate with AWS services without leaving the AWS network — no NAT Gateway, no internet, no IGW.
Two types of VPC Endpoints
Section titled “Two types of VPC Endpoints”Gateway Endpoints (free)
Available for: Amazon S3 and Amazon DynamoDB only.
A Gateway Endpoint adds a route to your route table pointing directly to the S3 or DynamoDB service within AWS’s network. Traffic never leaves AWS.
WITHOUT Gateway Endpoint (expensive): private-ec2 → NAT GW → IGW → Internet → S3 Cost: NAT GW hourly + NAT GW data charge + data transfer
WITH Gateway Endpoint (free): private-ec2 → S3 Gateway Endpoint → S3 (stays inside AWS network) Cost: $0 for the endpoint itself, only standard S3 request costs
Setup: VPC → Endpoints → Create Endpoint Service: com.amazonaws.us-east-1.s3 Type: Gateway VPC: my-vpc Route Tables: select private-rt
This automatically adds a route: Destination: pl-xxxxxxxx (S3's managed prefix list) Target: vpce-xxxxxxxxx
private-ec2 traffic to S3 now uses this route instead of 0.0.0.0/0 → NAT GW.Interface Endpoints (paid, but often cheaper than NAT Gateway for high volume)
Available for: most other AWS services (EC2 API, CloudWatch, SSM, SQS, SNS, Secrets Manager, KMS, etc.)
An Interface Endpoint creates an ENI in your subnet with a private IP. Traffic to AWS services is routed to this private IP. No NAT Gateway involved.
Cost comparison for SSM: WITHOUT Interface Endpoint: Every SSM call goes: private-ec2 → NAT GW → Internet → SSM endpoint Charge: NAT GW hourly ($0.045/hr) + data processing ($0.045/GB)
WITH Interface Endpoint: Every SSM call goes: private-ec2 → Interface Endpoint ENI → SSM Charge: Interface Endpoint hourly ($0.01/hr per AZ) — but zero data processingRule of thumb
Section titled “Rule of thumb”| Traffic destination | Recommendation |
|---|---|
| S3 or DynamoDB | Use Gateway Endpoint (free) |
| Other AWS services | Consider Interface Endpoint(evaluate: volume of traffic to that service vs Interface Endpoint hourly cost) |
| Third-party internet (APIs, package repos, etc.) | NAT Gateway is required (no alternative) |
For instances that only need to call AWS services (never third-party internet), you may be able to eliminate NAT Gateway entirely by using VPC Endpoints — especially for EC2 instances in private subnets that use SSM Session Manager instead of SSH.
13. Cost Breakdown
Section titled “13. Cost Breakdown”NAT Gateway has two cost components:
| Component | Charge |
|---|---|
| NAT Gateway hourly | Per hour the NAT Gateway is provisioned(charged from creation to deletion, regardless of traffic) |
| Data processing | Per GB of data flowing THROUGH the NAT Gateway(charged for both inbound and outbound data) |
Exact prices vary by region and change over time. Always check the Amazon VPC Pricing page for current rates. The patterns below apply regardless of the specific numbers.
Cost optimization strategies
Section titled “Cost optimization strategies”1. Place resources in the same AZ as the NAT Gateway
Cross-AZ data transfer has its own charge on top of NAT Gateway processing. If private-ec2-2 in us-east-1b routes through nat-gw in us-east-1a, you pay:
NAT GW data processing charge (for the data going through nat-gw)+ Cross-AZ transfer charge (for data crossing AZ boundary to reach nat-gw)+ Cross-AZ transfer charge (for the response coming back)With one NAT Gateway per AZ, you eliminate the cross-AZ charges.
2. Use VPC Endpoints for AWS services
As covered in Section 12 — S3, DynamoDB, and other AWS services do not need to go through NAT Gateway.
3. Delete NAT Gateways in non-production environments when not in use
NAT Gateway charges hourly from creation to deletion. In dev/test environments that are only active during business hours, deleting the NAT Gateway at the end of each day and recreating it in the morning eliminates overnight charges.
Automatable with AWS Lambda + EventBridge: 8:00 AM → Lambda creates NAT Gateway, updates route table 7:00 PM → Lambda deletes NAT Gateway4. Do not confuse idle with free
Even with zero traffic flowing through it, a NAT Gateway sitting there with Available status is charging you hourly. If you are not using it, delete it.
14. NACLs and NAT Gateway — the Ephemeral Port Problem
Section titled “14. NACLs and NAT Gateway — the Ephemeral Port Problem”This is the most common reason NAT Gateway stops working after it seems to be set up correctly. It is a subnet NACL misconfiguration.
What ephemeral ports are
Section titled “What ephemeral ports are”When private-ec2 (10.0.0.140) makes an outbound TCP connection to 8.8.8.8:443:
- The destination port is 443 (HTTPS) — fixed by the service
- The source port is chosen randomly by the OS from a range called ephemeral ports
On Linux: 32768 – 60999
On Windows: 1024 – 65535
NAT Gateway uses ports 1024 – 65535 for its translated source ports.
When the response comes back from 8.8.8.8, it is addressed to the NAT Gateway on one of these ephemeral ports. If your NACLs don’t allow these ports, the response is dropped and the connection fails.
Which NACLs need to allow ephemeral ports
Section titled “Which NACLs need to allow ephemeral ports”Both the private subnet NACL and the public subnet NACL (where the NAT Gateway lives) need to be configured correctly.
Traffic flow for private-ec2 → NAT GW → Internet:
STEP 1: private-ec2 → nat-gw Traffic crosses: private-subnet NACL (outbound) Then: public-subnet NACL (inbound at nat-gw)
STEP 2: nat-gw → Internet (via IGW) Traffic crosses: public-subnet NACL (outbound)
STEP 3: Internet → nat-gw (response) Traffic crosses: public-subnet NACL (inbound) Response port is an ephemeral port (1024-65535) ← must allow this
STEP 4: nat-gw → private-ec2 (translated response) Traffic crosses: public-subnet NACL (outbound) Then: private-subnet NACL (inbound) Response port is an ephemeral port (1024-65535) ← must allow thisCorrect NACL configuration example
Section titled “Correct NACL configuration example”private-subnet NACL — INBOUND:
| Rule# | Type | Protocol | Port Range | Source | Action |
|---|---|---|---|---|---|
| 100 | Custom TCP | TCP | 1024-65535 | 0.0.0.0/0 | ALLOW ← return traffic |
| * | All | All | All | 0.0.0.0/0 | DENY |
private-subnet NACL — OUTBOUND:
| Rule# | Type | Protocol | Port Range | Destination | Action |
|---|---|---|---|---|---|
| 100 | HTTPS | TCP | 443 | 0.0.0.0/0 | ALLOW |
| 110 | HTTP | TCP | 80 | 0.0.0.0/0 | ALLOW |
| 120 | Custom UDP | UDP | 53 | 0.0.0.0/0 | ALLOW ← DNS |
| * | All | All | All | 0.0.0.0/0 | DENY |
public-subnet NACL — INBOUND:
| Rule# | Type | Protocol | Port Range | Source | Action |
|---|---|---|---|---|---|
| 100 | Custom TCP | TCP | 1024-65535 | 0.0.0.0/0 | ALLOW ← return traffic |
| * | All | All | All | 0.0.0.0/0 | DENY |
public-subnet NACL — OUTBOUND:
| Rule# | Type | Protocol | Port Range | Destination | Action |
|---|---|---|---|---|---|
| 100 | HTTPS | TCP | 443 | 0.0.0.0/0 | ALLOW |
| 110 | HTTP | TCP | 80 | 0.0.0.0/0 | ALLOW |
| 120 | Custom UDP | UDP | 53 | 0.0.0.0/0 | ALLOW |
| * | All | All | All | 0.0.0.0/0 | DENY |
Security Groups do NOT have this problem because Security Groups are stateful — they automatically allow response traffic. NACLs are stateless — they independently evaluate every packet in both directions, so you must explicitly allow ephemeral port responses.
Why Security Groups are not the problem here: You cannot attach a Security Group to a NAT Gateway. Security Groups only apply to EC2 instances and a few other resources. The NACL on the subnet is the only firewall mechanism at the NAT Gateway level.
15. Common Mistakes and Gotchas
Section titled “15. Common Mistakes and Gotchas”1. NAT Gateway placed in a private subnet
Section titled “1. NAT Gateway placed in a private subnet”Symptom: Private instances have no internet access even with NAT Gateway configured.
Cause: NAT Gateway was created in a private subnet that has no route to the IGW.
Fix: Recreate the NAT Gateway in a public subnet (one with 0.0.0.0/0 → IGW).
2. Route table not updated after creating NAT Gateway
Section titled “2. Route table not updated after creating NAT Gateway”Symptom: Private instances still can’t reach internet after NAT Gateway is Available.
Cause: The private subnet’s route table still has no 0.0.0.0/0 route, or it points to the old route.
Fix: Add 0.0.0.0/0 → nat-gw-xxxx to the private route table.
3. EIP not released after NAT Gateway deletion
Section titled “3. EIP not released after NAT Gateway deletion”Symptom: Unexpected EIP charges after deleting the NAT Gateway. Cause: AWS disassociates the EIP but does not release it. It stays in your account billing as idle EIP. Fix: Go to EC2 → Elastic IPs → Release address.
4. Stale routes pointing to deleted NAT Gateway
Section titled “4. Stale routes pointing to deleted NAT Gateway”Symptom: Routes in route table show blackhole status.
Cause: NAT Gateway was deleted but the route entry 0.0.0.0/0 → nat-gw-xxxx still exists.
Fix: Delete or update the blackhole route entry.
5. Cannot ping private instance from another VPC via NAT Gateway
Section titled “5. Cannot ping private instance from another VPC via NAT Gateway”Symptom: Trying to SSH or ping from a peered VPC through a NAT Gateway fails. Cause: You cannot route traffic through a NAT Gateway via VPC Peering. Traffic must originate from inside the NAT Gateway’s own VPC. The path “VPC-B → Peering → NAT GW in VPC-A → Internet” is not supported. Fix: Give the connecting resource its own internet access, or restructure the architecture.
6. 350-second RST kills long-lived connections
Section titled “6. 350-second RST kills long-lived connections”Symptom: Database connections, WebSocket connections, or long-running requests intermittently fail after minutes of inactivity. Cause: NAT Gateway idle timeout (350 seconds) removed the connection from its translation table. When the remote server sends data, NAT Gateway RSTs the connection. Fix: Enable TCP keepalive on the application or OS with interval less than 350 seconds.
7. Private NAT Gateway routing to Internet Gateway
Section titled “7. Private NAT Gateway routing to Internet Gateway”Symptom: Instances can’t reach the internet even though there’s both a private NAT Gateway and an IGW in the same VPC. Cause: Private NAT Gateways are not designed for internet access. The IGW drops traffic from a private NAT Gateway. Fix: Use a public NAT Gateway for internet access.
8. NACLs blocking ephemeral port return traffic
Section titled “8. NACLs blocking ephemeral port return traffic”Symptom: Connections from private instances time out or fail, but Security Group rules look correct.
Cause: NACLs on the private or public subnet are not allowing inbound TCP ports 1024–65535.
Fix: Add NACL inbound rule allowing TCP 1024–65535 from 0.0.0.0/0 on both the private and public subnets.
9. Single NAT Gateway causes AZ-level failure cascade
Section titled “9. Single NAT Gateway causes AZ-level failure cascade”Symptom: When one AZ fails, private instances in ALL AZs lose internet access. Cause: All private subnets route through a single NAT Gateway that is in the failing AZ. Fix: Create one NAT Gateway per AZ, with each AZ’s private route table pointing to the NAT Gateway in the same AZ.
10. Paying for cross-AZ data transfer unnecessarily
Section titled “10. Paying for cross-AZ data transfer unnecessarily”Symptom: Higher-than-expected data transfer charges.
Cause: Private instances in us-east-1b are routing through a NAT Gateway in us-east-1a (cross-AZ).
Fix: One NAT Gateway per AZ, route tables pointing to same-AZ NAT Gateway.
16. Quick Reference Cheatsheet
Section titled “16. Quick Reference Cheatsheet”NAT Gateway types at a glance
Section titled “NAT Gateway types at a glance”| Type | Subnet | EIP? | Connects to | Source IP seen |
|---|---|---|---|---|
| Public NAT GW | Public | YES | Internet (via IGW)Other VPCs (via TGW) | EIPNAT GW private IP |
| Private NAT GW | Private | NO | Other VPCs (via TGW)On-premises (via VGW) | NAT GW private IPNAT GW private IP |
Technical limits
Section titled “Technical limits”Bandwidth: 5 Gbps baseline → auto-scales to 100 GbpsPackets/sec: 1M pps baseline → auto-scales to 10M ppsConcurrent conns: 55,000 per unique destination per IP addressMax EIPs: 8 (1 primary + 7 secondary)Idle timeout: 350 seconds (sends RST, not FIN)MTU: 8500 bytes internal, 1500 bytes for internet-boundEphemeral ports: 1024–65535Decision flowchart
Section titled “Decision flowchart”Does your private instance need internet access? │ YES → Use Public NAT Gateway │ (in public subnet, with EIP, with route to IGW) │ NO → Is it connecting to another VPC with overlapping CIDRs? │ YES → Use Private NAT Gateway │ (in private subnet, no EIP, route via TGW) │ NO → Is it connecting to AWS services (S3, DynamoDB)? │ YES → Use VPC Gateway Endpoint (free) │ NO → Is it connecting to other AWS services (SSM, CW)? │ YES → Consider Interface Endpoint │ NO → Use VPC Peering or Transit GatewayRoute table patterns
Section titled “Route table patterns”PUBLIC SUBNET (for NAT Gateway to use): Destination Target 10.0.0.0/24 local 0.0.0.0/0 my-igw
PRIVATE SUBNET (instances using NAT Gateway): Destination Target 10.0.0.0/24 local 0.0.0.0/0 nat-gw ← always same AZ for HA + cost
PRIVATE SUBNET (with S3 Gateway Endpoint added): Destination Target 10.0.0.0/24 local pl-xxxxxxxx vpce-xxxxxxxxx ← S3 traffic bypasses NAT GW 0.0.0.0/0 nat-gwHA checklist
Section titled “HA checklist”□ One NAT Gateway per AZ that has private workloads□ Each AZ's private route table points to NAT GW in the SAME AZ□ Each NAT Gateway has its own Elastic IP□ Each NAT Gateway is in a PUBLIC subnet□ Public subnet route table has 0.0.0.0/0 → IGW□ NACLs allow inbound TCP 1024-65535 on both private and public subnets□ TCP keepalive enabled on instances with long-lived connections (< 350s interval)□ CloudWatch alarms on ErrorPortAllocation and PacketsDropCount□ EIP released manually after NAT Gateway deletionCost awareness rules
Section titled “Cost awareness rules”Charged for: Every hour NAT Gateway exists (Available, Pending, or Deleting)Charged for: Every GB processed through NAT GatewayNot charged: For the EIP while it is attached to a running NAT GatewayCHARGED FOR: EIP when it is idle (disassociated) — release it immediately on NAT GW deletionSavings: Same-AZ routing eliminates cross-AZ transfer chargesSavings: Gateway Endpoints for S3/DynamoDB eliminate NAT data processing chargesEnd of Part 3 — NAT Gateway Deep Dive
Next: Part 4 — VPC Peering and Transit Gateway