Part 3: NAT Gateway Deep Dive


Our Example Architecture (Reference This Throughout)

Before anything else, here is the setup we will use for every example, diagram, and packet trace in this part.

VPC: my-vpc | CIDR: 10.0.0.0/24 | Region: us-east-1

| Subnet Name | CIDR | AZ | Type |
|---|---|---|---|
| public-subnet | 10.0.0.0/25 | us-east-1a | Public |
| private-subnet | 10.0.0.128/25 | us-east-1a | Private |

Key resources:

NAT Gateway: nat-gw → lives in public-subnet → EIP: 54.10.10.10
Internet Gateway: my-igw → attached to my-vpc
EC2 Instance (public): public-ec2 → public-subnet → private IP: 10.0.0.10
EC2 Instance (private): private-ec2 → private-subnet → private IP: 10.0.0.140
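
The address plan above can be sanity-checked with Python's standard `ipaddress` module. This is a minimal sketch: it only confirms that the two /25 subnets are the two halves of the /24 VPC and that each example IP sits in the subnet the text says it does. The dictionary and function names are illustrative and not part of any AWS API.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/24")
subnets = {
    "public-subnet": ipaddress.ip_network("10.0.0.0/25"),
    "private-subnet": ipaddress.ip_network("10.0.0.128/25"),
}
hosts = {
    "public-ec2": ipaddress.ip_address("10.0.0.10"),
    "private-ec2": ipaddress.ip_address("10.0.0.140"),
}


def subnet_of(ip):
    """Return the name of the subnet that contains ip, or None."""
    for name, net in subnets.items():
        if ip in net:
            return name
    return None


# Splitting the /24 once yields exactly the two /25 subnets of the example.
halves = list(vpc.subnets(prefixlen_diff=1))

for host, ip in hosts.items():
    print(host, "->", subnet_of(ip))
```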

  1. Why NAT Exists — The Core Problem
  2. What NAT Actually Does — PAT Explained
  3. The Full Packet Journey — Step by Step
  4. Public NAT Gateway vs Private NAT Gateway
  5. Elastic IP and NAT Gateway
  6. Placement Rule — Why NAT Gateway Must Be in a Public Subnet
  7. High Availability — One NAT Gateway Per AZ
  8. NAT Gateway vs NAT Instance
  9. Technical Limits and Performance
  10. Connection Tracking and Idle Timeouts
  11. NAT Gateway vs Internet Gateway — The Complete Comparison
  12. When NOT to Use NAT Gateway — VPC Endpoints
  13. Cost Breakdown
  14. NACLs and NAT Gateway — the Ephemeral Port Problem
  15. Common Mistakes and Gotchas
  16. Quick Reference Cheatsheet

1. Why NAT Exists — The Core Problem

The internet runs on IPv4 addresses. There are exactly 2^32 = 4,294,967,296 possible IPv4 addresses, about 4.3 billion. The internet has far more than 4.3 billion connected devices. The central pool of unallocated public IPv4 addresses (held by IANA) ran out in 2011.

The solution that kept the internet running was NAT — Network Address Translation. NAT allows many devices that each have a private IP address to share a single public IP address when communicating with the internet. Instead of every device needing its own public IP, an entire office building, data center, or VPC can share one (or a handful of) public IPs.

WITHOUT NAT:
Device A needs public IP: 54.10.10.1
Device B needs public IP: 54.10.10.2
Device C needs public IP: 54.10.10.3
...thousands of devices, thousands of public IPs consumed
WITH NAT:
Device A (private: 10.0.0.140) ─┐
Device B (private: 10.0.0.141) ─┼─→ NAT → public IP: 54.10.10.10 → Internet
Device C (private: 10.0.0.142) ─┘
All three share one public IP. Only one public IP consumed.

In a VPC, every instance in a private subnet has a private IP. Private IPs (like 10.0.x.x) are not routable on the internet. If a private-subnet instance tried to send a packet directly to 8.8.8.8 with source address 10.0.0.140, internet routers would discard it because 10.0.0.140 is not a globally routable address. Even if the packet did arrive, the response would have nowhere to go.
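
Whether an address is internet-routable can be checked with the standard `ipaddress` module. A small sketch using addresses from this part; the helper name is made up for illustration.

```python
import ipaddress


def is_internet_routable(addr: str) -> bool:
    """True if addr is globally routable (not private, loopback, and so on)."""
    return ipaddress.ip_address(addr).is_global


for addr in ("10.0.0.140", "8.8.8.8", "54.10.10.10"):
    print(addr, is_internet_routable(addr))
```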

But private instances still need outbound internet access for legitimate purposes:

  • Downloading OS patches (yum update -y, apt-get upgrade)
  • Pulling container images from Docker Hub or ECR
  • Calling third-party APIs (payment gateways, auth providers)
  • Fetching configuration from public endpoints
  • Sending logs or metrics to external services

The requirement is asymmetric:

  • Outbound to internet: YES — instances must be able to initiate connections
  • Inbound from internet: NO — the internet must never be able to initiate connections to private instances

NAT Gateway solves exactly this. It is a one-way valve: it lets your private instances call out to the internet but drops any unsolicited connection coming in.


2. What NAT Actually Does — PAT Explained

Most people say “NAT” but what NAT Gateway actually performs is PAT — Port Address Translation, also called NAT overloading or NAPT (Network Address Port Translation).

Basic NAT is a simple one-to-one mapping: one private IP maps to one public IP. If you have 100 private instances, you need 100 public IPs. This doesn’t solve the exhaustion problem at all.

PAT is a many-to-one mapping: many private IPs share one public IP by using different source ports to keep conversations separate. One public IP can serve tens of thousands of concurrent connections.

Every TCP/UDP connection is uniquely identified by a 5-tuple:

(source IP, source port, destination IP, destination port, protocol)

When private-ec2 (10.0.0.140) makes an HTTPS request to Google (142.250.80.46 port 443):

BEFORE PAT (packet leaving private-ec2):
Source IP: 10.0.0.140
Source Port: 52341 ← randomly chosen ephemeral port by the OS
Dest IP: 142.250.80.46
Dest Port: 443
NAT Gateway receives this and creates a translation record:
PRIVATE SIDE PUBLIC SIDE
10.0.0.140 : 52341 ←→ 54.10.10.10 : 58901
AFTER PAT (packet leaving NAT Gateway toward internet):
Source IP: 54.10.10.10 ← NAT Gateway's EIP replaces private IP
Source Port: 58901 ← NAT Gateway chooses a new source port
Dest IP: 142.250.80.46
Dest Port: 443

When Google’s response comes back:

Google responds to 54.10.10.10 port 58901
Dest IP: 54.10.10.10
Dest Port: 58901
NAT Gateway looks up its translation table:
54.10.10.10 : 58901 → 10.0.0.140 : 52341
NAT Gateway rewrites the packet:
Dest IP: 10.0.0.140
Dest Port: 52341
Delivers it to private-ec2 — the response arrives correctly.

The private instance never knows the translation happened. From its perspective it sent a request to Google and got a response back. The NAT Gateway’s translation table is the magic in the middle.

The translation table — keeping conversations separate

Now suppose both private-ec2 (10.0.0.140) and a second private instance (10.0.0.141) make requests to Google simultaneously. The NAT Gateway handles this by using different source ports:

TRANSLATION TABLE (simplified):

| Private IP | Priv Port | EIP | Pub Port | Destination |
|---|---|---|---|---|
| 10.0.0.140 | 52341 | 54.10.10.10 | 58901 | 142.250.80.46 : 443 |
| 10.0.0.141 | 44217 | 54.10.10.10 | 59044 | 142.250.80.46 : 443 |
| 10.0.0.140 | 52399 | 54.10.10.10 | 58955 | 1.1.1.1 : 53 |

Each row is a unique conversation. When a response comes back to 54.10.10.10 : 59044, the NAT Gateway knows it belongs to 10.0.0.141 : 44217. The destination host never knows there are multiple private IPs — it just sees all traffic coming from 54.10.10.10.
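
The translation table can be modelled in a few lines of Python. This is a toy sketch of the PAT idea only, not how AWS implements NAT Gateway. The class name and the "first free port" allocation policy are assumptions made for illustration, so the public ports it picks will differ from the example values in the table above.

```python
class PatTable:
    """Toy PAT table: many private (ip, port) pairs behind one public IP."""

    def __init__(self, public_ip, first_port=1024, last_port=65535):
        self.public_ip = public_ip
        self.ports = range(first_port, last_port + 1)
        self.out = {}   # full private-side flow -> public port
        self.back = {}  # (public port, dst ip, dst port, proto) -> private (ip, port)

    def outbound(self, priv_ip, priv_port, dst_ip, dst_port, proto="tcp"):
        """Translate an outgoing packet; returns the (public ip, public port) source."""
        flow = (priv_ip, priv_port, dst_ip, dst_port, proto)
        if flow not in self.out:
            dest = (dst_ip, dst_port, proto)
            # First free port for this destination; raises StopIteration when exhausted.
            port = next(p for p in self.ports if (p, *dest) not in self.back)
            self.out[flow] = port
            self.back[(port, *dest)] = (priv_ip, priv_port)
        return self.public_ip, self.out[flow]

    def inbound(self, src_ip, src_port, pub_port, proto="tcp"):
        """Map a reply to its private endpoint, or None for unsolicited traffic."""
        return self.back.get((pub_port, src_ip, src_port, proto))


nat = PatTable("54.10.10.10")
a = nat.outbound("10.0.0.140", 52341, "142.250.80.46", 443)
b = nat.outbound("10.0.0.141", 44217, "142.250.80.46", 443)
print(a, b)
print(nat.inbound("142.250.80.46", 443, b[1]))  # reply for the second instance
print(nat.inbound("203.0.113.9", 443, b[1]))    # unsolicited sender: no mapping
```

Keying the reverse lookup on the destination as well as the public port is what lets the same public port be reused for different destinations, and it is also why traffic from an unknown sender finds no entry.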


3. The Full Packet Journey — Step by Step

Let’s trace a single DNS query (UDP, port 53) from private-ec2 to 8.8.8.8 (Google DNS) through the complete stack.

ARCHITECTURE OVERVIEW:
10.0.0.140 10.0.0.5 54.10.10.10
(private-ec2) (nat-gw) (EIP) Internet
│ │ │ │
│ │ │ │
private-subnet public-subnet │ IGW
(10.0.0.128/25) (10.0.0.0/25) │ (my-igw)

Step 1 — Application initiates a request

private-ec2 wants to reach 8.8.8.8 (Google DNS, port 53 UDP)
OS creates the packet:
Source: 10.0.0.140 : 54321
Destination: 8.8.8.8 : 53
Protocol: UDP

Step 2 — Instance consults its routing table

private-subnet Route Table:
Destination Target
───────────────── ───────────────
10.0.0.0/24 local ← 8.8.8.8 doesn't match this
0.0.0.0/0 nat-gw ← 8.8.8.8 matches this
Result: Send to nat-gw
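
The route selection in this step follows the longest-prefix-match rule: the most specific matching route wins. A minimal sketch with the standard `ipaddress` module, using the private-subnet route table from this step (the `lookup` helper is illustrative):

```python
import ipaddress

private_rt = [
    ("10.0.0.0/24", "local"),
    ("0.0.0.0/0", "nat-gw"),
]


def lookup(route_table, dst):
    """Return the target of the most specific route matching dst, or None."""
    ip = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup(private_rt, "8.8.8.8"))    # internet destination
print(lookup(private_rt, "10.0.0.10"))  # destination inside the VPC
```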

Step 3 — Packet arrives at NAT Gateway

NAT Gateway receives:
Source: 10.0.0.140 : 54321
Destination: 8.8.8.8 : 53
NAT Gateway creates a translation record:
10.0.0.140 : 54321 ←→ 54.10.10.10 : 61200
NAT Gateway rewrites packet:
Source: 54.10.10.10 : 61200 ← EIP replaces private IP
Destination: 8.8.8.8 : 53 ← unchanged

Step 4 — NAT Gateway’s own routing

NAT Gateway (which lives in public-subnet) consults:
public-subnet Route Table:
Destination Target
───────────────── ───────────────
10.0.0.0/24 local
0.0.0.0/0 my-igw ← 8.8.8.8 goes to IGW
Result: Send to my-igw

Step 5 — Internet Gateway handles the packet

IGW receives:
Source: 54.10.10.10 : 61200
Destination: 8.8.8.8 : 53
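In this simplified trace the source already shows the EIP, so the IGW just sends the packet onto the internet.
Note: Strictly speaking, the NAT Gateway sends the packet with its own private IP (10.0.0.5) as the source, and the IGW performs the final 1:1 translation to the EIP. The IGW keeps no per-connection state for this; the mapping is fixed. Section 4 returns to this nuance.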

Step 6 — Response returns

8.8.8.8 responds to 54.10.10.10 : 61200:
Source: 8.8.8.8 : 53
Destination: 54.10.10.10 : 61200
IGW receives → forwards to my-vpc (because 54.10.10.10 is the EIP of nat-gw)

Step 7 — NAT Gateway reverses the translation

NAT Gateway receives the response:
Source: 8.8.8.8 : 53
Destination: 54.10.10.10 : 61200
Looks up translation table:
54.10.10.10 : 61200 → 10.0.0.140 : 54321
Rewrites packet:
Source: 8.8.8.8 : 53 ← unchanged
Destination: 10.0.0.140 : 54321 ← restores original private IP
Sends to private-ec2 in private-subnet

Step 8 — private-ec2 receives the response

DNS response arrives at private-ec2.
The application gets its answer.
private-ec2 never knew about the translation.

Full picture:

private-ec2 nat-gw my-igw 8.8.8.8
(10.0.0.140) (10.0.0.x) (edge of VPC) (internet)
│ │ │ │
│─── UDP packet ────►│ │ │
│ src: 10.0.0.140 │ │ │
│ dst: 8.8.8.8:53 │ │ │
│ │ │ │
│ [PAT translate] │ │
│ 10.0.0.140:54321 │ │
│ → 54.10.10.10:61200 │ │
│ │ │ │
│ │─── translated ─────►│ │
│ │ src: 54.10.10.10 │ │
│ │ dst: 8.8.8.8:53 │ │
│ │ │─── to internet─►│
│ │ │ │
│ │ │◄── response ───│
│ │◄── response ───────│ │
│ │ dst: 54.10.10.10 │ │
│ │ │ │
│ [reverse lookup] │ │
│ 54.10.10.10:61200 │ │
│ → 10.0.0.140:54321 │ │
│ │ │ │
│◄── response ───────│ │ │
│ dst: 10.0.0.140 │ │ │
│ │ │ │

4. Public NAT Gateway vs Private NAT Gateway

AWS offers two types of NAT Gateway. They serve fundamentally different use cases.

A public NAT Gateway is what Part 2 covered. It gives private subnet instances outbound internet access.

Purpose: Private instances → Internet
Direction: Outbound only (internet cannot initiate connections in)
Lives in: Public subnet (one with a route to the Internet Gateway)
Requires: Elastic IP address
Traffic path: Instance → NAT GW → IGW → Internet

Example route tables for public NAT Gateway:

public-subnet Route Table:
10.0.0.0/24 → local
0.0.0.0/0 → my-igw ← NAT Gateway needs this to reach internet
private-subnet Route Table:
10.0.0.0/24 → local
0.0.0.0/0 → nat-gw ← instances send internet traffic here

When traffic from the private instance leaves the NAT Gateway through the IGW, the source IP seen by the destination is the EIP (54.10.10.10). This only happens when the traffic path includes the IGW in the same VPC.

Important nuance: If a public NAT Gateway routes traffic through a Transit Gateway or Virtual Private Gateway (instead of through an IGW), the source IP seen by the destination is the private IP of the NAT Gateway, not the EIP. The EIP is only visible to the outside world when the IGW performs its final translation.

A private NAT Gateway is less talked about but critical for multi-VPC architectures.

Purpose: Connect private instances to OTHER VPCs or on-premises networks
(not the internet)
Direction: Outbound only (no unsolicited inbound)
Lives in: Private subnet (does NOT need a public subnet or IGW)
Requires: NO Elastic IP — uses only private IPs
Traffic path: Instance → Private NAT GW → Transit Gateway → Other VPC

When do you need a private NAT Gateway?

The classic problem it solves: overlapping CIDR ranges between VPCs.

VPC-A uses 10.0.0.0/16. VPC-B also uses 10.0.0.0/16. You cannot connect these with VPC Peering because the CIDRs overlap. But with private NAT, you can still route between them:

VPC-A (10.0.0.0/16) VPC-B (10.0.0.0/16)
───────────────────── ─────────────────────
routable-subnet: 100.64.0.0/28 ALB in routable-subnet
│ │
Private NAT GW Transit Gateway
(IP: 100.64.0.5) │
│ │
TGW ──────────────────────────────────────
  • VPC-A’s instance sends traffic to VPC-B
  • Private NAT translates source IP from 10.0.x.x to 100.64.0.5 (routable range)
  • VPC-B sees traffic from 100.64.0.5 — no overlap conflict
  • Response returns to 100.64.0.5, NAT Gateway reverses the translation

Key difference at a glance:

| Feature | Public NAT Gateway | Private NAT Gateway |
|---|---|---|
| Purpose | Internet access | VPC-to-VPC / on-prem |
| Elastic IP required | YES | NO |
| Subnet placement | Public subnet | Private subnet |
| Connects to | Internet Gateway | Transit GW / VPN GW |
| Source IP visible | EIP (when via IGW) | NAT GW private IP |
| Internet access | YES | NO (IGW drops traffic) |
| Use case | Software updates, APIs | Overlapping CIDRs, private connectivity |

Critical rule: If you attach an Internet Gateway to a VPC that contains a private NAT gateway and route traffic from that private NAT gateway toward the IGW — the IGW drops the traffic. A private NAT gateway cannot be used for internet access under any configuration.


5. Elastic IP and NAT Gateway

An Elastic IP (EIP) is a static, persistent public IPv4 address that you allocate to your AWS account. Unlike the auto-assigned public IPs that EC2 instances get (which change whenever the instance is stopped and started), an EIP stays yours until you explicitly release it.

A public NAT Gateway must have an EIP. The EIP is the address that the internet sees for all outbound traffic from your private subnets.

Why NAT Gateway requires an EIP (not a regular public IP)

Regular public IPs (auto-assigned to EC2 instances) are dynamic: they change when the instance is stopped and started, and they cannot be moved between resources. A NAT Gateway needs a stable, predictable public IP because:

  1. Outbound IP whitelisting — many external services (payment APIs, partner services, corporate firewalls) whitelist specific IP addresses. If your outbound IP changes, your connections break. By using an EIP on the NAT Gateway, your entire private subnet has one predictable outbound IP.
  2. Managed resource continuity — AWS manages the NAT Gateway. If AWS needs to perform maintenance or replace hardware, the NAT Gateway may be recreated internally. The EIP stays stable across these events because it is decoupled from the hardware.
  3. DNS and auditing — with a static EIP, you can create DNS records pointing to your outbound IP and correlate audit logs to a predictable source address.

What happens to the EIP when you delete a NAT Gateway

Delete NAT Gateway
EIP is DISASSOCIATED from the NAT Gateway
(EIP is no longer attached to anything)
EIP REMAINS in your account — you keep paying for it
(EIP idle charges apply until you release it)
You must MANUALLY release the EIP if you no longer need it
(EC2 → Elastic IPs → Actions → Release)

Gotcha: Deleting a NAT Gateway does NOT release the EIP. The EIP stays allocated to your account and you keep paying for it (AWS bills public IPv4 addresses at $0.005/hour, whether attached or idle, at the time of writing). Always explicitly release EIPs you no longer need.

Each public NAT Gateway can have up to 8 IPv4 addresses (1 primary + 7 secondary EIPs). Why would you want multiple EIPs on one NAT Gateway?

This relates to the 55,000 concurrent connections limit (explained in Section 9). The limit is per IP address per unique destination. If your private instances are making massive numbers of connections to the same external service, you can hit this limit.

By adding more EIPs, the NAT Gateway can use different source IPs to distribute connections:

One EIP: 55,000 concurrent connections to one destination
Two EIPs: 110,000 concurrent connections to one destination
Four EIPs: 220,000 concurrent connections to one destination

The default quota is 2 EIPs per NAT Gateway — you can increase this via AWS Service Quotas.
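
The scaling arithmetic above is a straight multiplication, which also makes it easy to invert: how many IPs does a given connection target need? A small sketch using the figures quoted in this section (the function names are illustrative):

```python
CONNECTIONS_PER_IP_PER_DESTINATION = 55_000  # figure quoted in this section
MAX_IPS_PER_NAT_GATEWAY = 8                  # 1 primary + 7 secondary


def max_connections_per_destination(num_ips: int) -> int:
    """Concurrent connections to ONE destination with num_ips addresses."""
    if not 1 <= num_ips <= MAX_IPS_PER_NAT_GATEWAY:
        raise ValueError("a NAT Gateway supports 1 to 8 IPv4 addresses")
    return num_ips * CONNECTIONS_PER_IP_PER_DESTINATION


def ips_needed(target_connections: int) -> int:
    """Smallest number of IPs that covers target_connections to one destination."""
    needed = max(1, -(-target_connections // CONNECTIONS_PER_IP_PER_DESTINATION))
    if needed > MAX_IPS_PER_NAT_GATEWAY:
        raise ValueError("target exceeds what a single NAT Gateway can hold")
    return needed


print(max_connections_per_destination(2))
print(ips_needed(120_000))
```

Anything above two addresses needs the quota increase mentioned above before the extra EIPs can be associated.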


6. Placement Rule — Why NAT Gateway Must Be in a Public Subnet

This is one of the most frequently asked questions, and getting it wrong is one of the most common beginner mistakes.

The rule: A public NAT Gateway must be placed in a public subnet — a subnet whose route table has 0.0.0.0/0 → IGW.

Here is why, step by step:

INCORRECT SETUP (NAT Gateway in private subnet):
private-ec2 nat-gw (WRONG — in private-subnet)
(10.0.0.140) (10.0.0.130)
│ │
▼ ▼
private-subnet Route Table:
10.0.0.0/24 → local
0.0.0.0/0 → nat-gw ← private-ec2 sends internet traffic here
nat-gw wants to forward this to the internet.
nat-gw consults its own subnet's route table (private-subnet):
0.0.0.0/0 → ??? (no route to IGW — this is a private subnet!)
Result: nat-gw cannot reach the internet. Traffic is dropped.
CORRECT SETUP (NAT Gateway in public subnet):
private-ec2 nat-gw (CORRECT — in public-subnet)
(10.0.0.140) (10.0.0.5)
│ │
▼ ▼
private-subnet Route Table: public-subnet Route Table:
10.0.0.0/24 → local 10.0.0.0/24 → local
0.0.0.0/0 → nat-gw 0.0.0.0/0 → my-igw ← THIS is why!
Result: private-ec2 → nat-gw → my-igw → Internet. Works perfectly.

The NAT Gateway itself needs a path to the internet to forward the traffic it receives. It can only get that path through the route table of the subnet it lives in. A public subnet’s route table has 0.0.0.0/0 → IGW, giving the NAT Gateway the exit it needs.

Think of the NAT Gateway as a security guard at a building entrance. The guard (NAT Gateway) must be stationed at the door (public subnet — the one that faces outside). Putting the guard in an interior room (private subnet) means the guard can’t actually see or reach the outside world.
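
The placement rule reduces to one check: does the route table of the NAT Gateway's own subnet send 0.0.0.0/0 to the IGW? A minimal sketch over plain dictionaries that mirror the two route tables above. The data layout and function name are assumptions for illustration, not an AWS API.

```python
def nat_can_reach_internet(nat_subnet, route_tables, igw_id):
    """True if the NAT Gateway's subnet has a default route to the IGW."""
    routes = route_tables.get(nat_subnet, {})
    return routes.get("0.0.0.0/0") == igw_id


route_tables = {
    "public-subnet": {"10.0.0.0/24": "local", "0.0.0.0/0": "my-igw"},
    "private-subnet": {"10.0.0.0/24": "local", "0.0.0.0/0": "nat-gw"},
}

print(nat_can_reach_internet("public-subnet", route_tables, "my-igw"))   # correct setup
print(nat_can_reach_internet("private-subnet", route_tables, "my-igw"))  # incorrect setup
```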


7. High Availability — One NAT Gateway Per AZ

A NAT Gateway is confined to the AZ it is created in. It has internal redundancy within that AZ (AWS manages this), but if the entire AZ goes down, the NAT Gateway goes down with it.

Look at what happens with a single NAT Gateway setup when us-east-1a fails:

[Figure: Single NAT Gateway AZ failure problem]

This defeats the purpose of multi-AZ architecture. The AZ failure should only affect resources in that AZ, not cascade to other AZs.

The correct HA setup — one NAT Gateway per AZ

[Figure: Correct HA setup: one NAT Gateway per AZ]

Each AZ’s private subnets use only the NAT Gateway in their own AZ. Failure is contained.

Even beyond HA, using a NAT Gateway in a different AZ from your instance costs extra money. AWS charges for cross-AZ data transfer. If private-ec2-2 (us-east-1b) routes through nat-gw (us-east-1a), every byte of traffic crosses an AZ boundary and incurs a transfer fee — on top of the NAT Gateway’s own data processing charge.

By routing each AZ’s instances to the NAT Gateway in the same AZ, you avoid cross-AZ transfer costs entirely.

COST COMPARISON (when scaling to multiple AZs):
Single NAT Gateway in us-east-1a:
private-ec2 → nat-gw: same AZ, no cross-AZ charge
private-ec2-2 → nat-gw: cross-AZ, extra charge per GB
One NAT Gateway per AZ:
private-ec2 → nat-gw: same AZ, no cross-AZ charge
private-ec2-2 → nat-gw-2: same AZ, no cross-AZ charge

The cost of running a second NAT Gateway (hourly fee) is often offset by eliminating cross-AZ data transfer fees for production workloads with significant traffic.

Our example architecture — single-AZ setup

Our running example uses a single AZ (us-east-1a) with one NAT Gateway — the simplest correct setup.

[Figure: Part 3 architecture: my-vpc with NAT Gateway]

When you expand to multiple AZs, apply the one-per-AZ pattern shown in the conceptual diagrams above: each AZ gets its own NAT Gateway in its own public subnet, and each AZ’s private route table points only to the NAT Gateway in the same AZ.
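
The one-per-AZ pattern can be audited mechanically: flag every private subnet whose default route points at a NAT Gateway in a different AZ. A sketch over plain dictionaries. The subnet name `private-subnet-2` is hypothetical, and the data layout is an assumption for illustration.

```python
def cross_az_routes(private_subnets, nat_gateways):
    """Return subnets whose default route targets a NAT Gateway in another AZ.

    private_subnets: {subnet name: (az, nat gateway name)}
    nat_gateways:    {nat gateway name: az}
    """
    return sorted(
        name
        for name, (az, nat) in private_subnets.items()
        if nat_gateways[nat] != az
    )


nat_gateways = {"nat-gw": "us-east-1a", "nat-gw-2": "us-east-1b"}

shared_nat = {
    "private-subnet": ("us-east-1a", "nat-gw"),
    "private-subnet-2": ("us-east-1b", "nat-gw"),  # crosses an AZ boundary
}
per_az_nat = {
    "private-subnet": ("us-east-1a", "nat-gw"),
    "private-subnet-2": ("us-east-1b", "nat-gw-2"),
}

print(cross_az_routes(shared_nat, nat_gateways))
print(cross_az_routes(per_az_nat, nat_gateways))
```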


8. NAT Gateway vs NAT Instance

Before NAT Gateway existed, AWS customers had to run a NAT Instance: an EC2 instance manually configured to perform NAT. NAT Gateway replaced this pattern for most use cases, but understanding NAT Instance helps clarify exactly what NAT Gateway abstracts away.

A NAT Instance is just an EC2 instance (typically running Amazon Linux) with two things configured:

  1. Source/destination check disabled — by default, EC2 instances drop any packet where the instance is not the actual source or destination. For a NAT instance to forward packets on behalf of other instances, this check must be disabled.
  2. IP forwarding and iptables rules — the OS is configured to forward packets and apply NAT rules using iptables.

You are responsible for all of this: the EC2 instance, the OS configuration, security patching, scaling if traffic grows, and replacing the instance if it fails.

| Feature | NAT Gateway | NAT Instance |
|---|---|---|
| Management | Fully AWS-managed | You manage the EC2 instance |
| Availability | HA within the AZ (AWS handles) | Single instance; you must build your own HA with ASG |
| Bandwidth | 5 Gbps → auto-scales to 100 Gbps | Depends entirely on instance type (t3.small vs c5.4xlarge) |
| Scaling | Automatic | Manual (change instance type) |
| Security Groups | Cannot attach | Can attach |
| Cost | Hourly + per-GB charges | EC2 instance + data transfer |
| Patching | AWS handles | You handle OS patches |
| Setup complexity | Low (console, a few clicks) | Higher (disable src/dst check, configure iptables) |
| IP fragmentation (TCP) | Not supported | Supported |
| Port forwarding | Not supported | Supported (via iptables rules) |
| Use as bastion host | Not possible | Possible (SSH through it) |
| Custom routing logic | Not possible | Possible |
| Logging / metrics | CloudWatch metrics built-in | You set up custom logging |

NAT Gateway handles nearly every production scenario better than NAT Instance. The narrow cases where you might still reach for NAT Instance:

  • You need IP fragmentation support for TCP (NAT Gateway drops fragmented TCP packets)
  • You need port forwarding (redirecting traffic on specific ports to different internal destinations)
  • You are running a very small workload and want to use the NAT instance as a bastion host too (two birds, one stone)
  • You have a hard budget constraint and the NAT Gateway hourly cost is genuinely prohibitive for a lab/sandbox environment

For anything production-grade, use NAT Gateway.


9. Technical Limits and Performance

These are the official AWS limits for a NAT Gateway. Understanding them matters when designing architectures at scale.

Baseline bandwidth: 5 Gbps
Maximum bandwidth: 100 Gbps (auto-scales, no action required)
Baseline packets/sec: 1,000,000 (1 million pps)
Maximum packets/sec: 10,000,000 (10 million pps — packets dropped beyond this)

You do not need to provision or configure this scaling — it happens automatically. If your workload needs more than 5 Gbps, the NAT Gateway scales up transparently.

55,000 simultaneous connections per unique destination per IP address

A “unique destination” means a unique combination of: destination IP + destination port + protocol.

In practice, this means:

nat-gw (IP: 54.10.10.10) can hold:
55,000 concurrent TCP connections to 142.250.80.46:443 (Google HTTPS)
55,000 concurrent TCP connections to 1.1.1.1:443 (Cloudflare HTTPS)
55,000 concurrent UDP connections to 8.8.8.8:53 (Google DNS)
... and so on for every unique destination

Each unique destination gets its own pool of 55,000. The limit is not a global cap across all connections.

When do you hit this limit?

Mostly when many instances are connecting to the same destination IP and port — for example, hundreds of instances all hitting the same API endpoint simultaneously.

How to increase it:

Add secondary EIPs to the NAT Gateway. Each additional IP address adds another 55,000 connections per unique destination.

1 EIP: 55,000 connections per unique destination
2 EIPs: 110,000 connections per unique destination
4 EIPs: 220,000 connections per unique destination
8 EIPs: 440,000 connections per unique destination (maximum)

Ephemeral port range used by NAT Gateway: 1024 – 65535

This matters for NACL configuration. When a private instance sends a request through NAT Gateway, the NAT Gateway picks a source port from this range. Return traffic from the destination will come back to NAT Gateway on one of these ports. Your NACLs must allow this range.

NAT Gateway max MTU: 8500 bytes (supports jumbo frames internally)
Internet-bound MTU: Must be 1500 bytes or less on EC2 instances
Why 1500 for internet: The public internet does not support jumbo frames.
Even though AWS's internal network supports 8500 bytes,
once traffic exits to the internet it must be 1500 bytes.

NAT Gateway supports Path MTU Discovery (PMTUD) — if a remote server sends back an ICMP “Fragmentation Needed” message, NAT Gateway propagates this signal, allowing the connection to negotiate a smaller packet size automatically.


10. Connection Tracking and Idle Timeouts

NAT Gateway is stateful in one specific sense: it tracks active translations so it can match responses back to the correct private instance. This is different from a full stateful firewall: NAT Gateway does not inspect or block traffic the way a Security Group does. It just maintains the translation table.

NAT Gateway has an idle connection timeout of 350 seconds. If a connection has no data flowing for 350 seconds, NAT Gateway removes it from its translation table.

Connection established: 10.0.0.140:52341 ←→ 54.10.10.10:61200 ←→ 8.8.8.8:443
Data flowing: translation table entry stays active
No data for 350 seconds: NAT Gateway removes the entry from translation table
What happens next:
If private-ec2 (10.0.0.140) tries to keep using the connection after the timeout:
NAT Gateway has no record of this connection
NAT Gateway returns RST (reset) to the client (10.0.0.140)
RST = abrupt connection close, NOT graceful FIN handshake

RST vs FIN — why this matters for applications

When a connection ends normally, the parties exchange FIN packets — “I’m done sending, you can close.” This is graceful. Most applications handle FIN cleanly.

When NAT Gateway removes an idle connection, it does not send a FIN. Instead, when the instance later tries to send on that connection, NAT Gateway answers with RST: “this connection no longer exists.” RST is abrupt. Applications that do not handle RST gracefully may crash, throw unhandled exceptions, or fail to reconnect. This is a common source of mysterious intermittent failures.
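
On the application side, a reset usually surfaces as a `ConnectionResetError` (or `BrokenPipeError`) in Python. One way to cope is to open a fresh connection and retry, sketched below. The `connect` and `request` callables are placeholders for whatever client library is in use, and retrying like this is only safe for requests that are idempotent.

```python
import time


def call_with_reconnect(connect, request, retries=1, backoff_s=0.0):
    """Run request(conn); on a connection reset, reconnect and try again."""
    conn = connect()
    for attempt in range(retries + 1):
        try:
            return request(conn)
        except (ConnectionResetError, BrokenPipeError):
            if attempt == retries:
                raise  # out of retries: let the caller see the failure
            time.sleep(backoff_s)
            # The old translation entry is gone, so start a new connection.
            conn = connect()
```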

Application behavior that causes idle timeout issues:
- Connection pools that keep connections open "forever" between requests
- Long-polling HTTP connections
- WebSocket connections with no heartbeat
- Database connection pools with idle connections
Fix: Enable TCP keepalive on the application or OS with interval < 350 seconds
Linux:
sudo sysctl -w net.ipv4.tcp_keepalive_time=300
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
sudo sysctl -w net.ipv4.tcp_keepalive_probes=3
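With these values the first keepalive probe goes out after 300 seconds of idle time
(then every 60 seconds if unanswered), which refreshes the NAT Gateway translation
entry before the 350-second timeout. These sysctls only affect sockets that have
SO_KEEPALIVE enabled.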
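
Keepalive can also be switched on per socket from application code, which avoids changing system-wide settings. A minimal sketch with Python's standard `socket` module. The `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` options are Linux-style constants, so the code guards for platforms where they are missing; the helper name and default values are illustrative.

```python
import socket


def enable_keepalive(sock, idle_s=300, interval_s=60, probes=3):
    """Turn on TCP keepalive for one socket, with per-socket tuning if available."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants exist on Linux; other platforms may lack some of them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Many client libraries and connection pools expose an equivalent keepalive option, which is usually preferable to touching the raw socket.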

| Metric | What it tells you |
|---|---|
| ErrorPortAllocation | > 0 means NAT Gateway ran out of ports: add more EIPs |
| IdleTimeoutCount | Connections timing out from inactivity: fix with keepalive |
| PacketsDropCount | Packets being dropped: possible port exhaustion or AZ issue |
| ActiveConnectionCount | Current active connections: monitor for growth trends |
| ConnectionAttemptCount | New connection attempts per minute |

Monitor ErrorPortAllocation and PacketsDropCount in CloudWatch. If either is non-zero, investigate before it becomes a user-facing problem.
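
A sketch of how such a check might be scripted. The two helpers below are plain Python: one builds the arguments for CloudWatch's `get_metric_statistics` call (namespace `AWS/NATGateway`, dimension `NatGatewayId`), the other flags any non-zero datapoint. The boto3 call itself is shown only as a comment because it needs credentials, and the NAT Gateway ID in it is a made-up placeholder.

```python
from datetime import datetime, timedelta, timezone


def nat_metric_query(nat_gateway_id, metric_name, minutes=60, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Sum"],
    }


def needs_attention(datapoints):
    """True if any datapoint from the response has a Sum above zero."""
    return any(dp.get("Sum", 0) > 0 for dp in datapoints)


# With boto3 installed and credentials configured (assumed, not shown here):
#   import boto3
#   cw = boto3.client("cloudwatch", region_name="us-east-1")
#   query = nat_metric_query("nat-0123456789abcdef0", "ErrorPortAllocation")
#   resp = cw.get_metric_statistics(**query)
#   print(needs_attention(resp["Datapoints"]))
```

In practice a CloudWatch alarm on the same metric is the more usual way to get notified; the script form is mainly useful for ad hoc investigation.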


11. NAT Gateway vs Internet Gateway — The Complete Comparison

These two often get confused because both deal with internet connectivity. They serve completely different purposes and work at different levels.

| Feature | Internet Gateway (IGW) | NAT Gateway |
|---|---|---|
| What it is | VPC attachment point for internet connectivity | A managed address-translation service for private instances |
| Where it lives | Attached to the VPC (not in any subnet) | Inside a subnet (public) |
| Traffic direction | Bidirectional (inbound + outbound) | Outbound only (responses allowed back, no unsolicited inbound) |
| Instance needs public IP? | YES: the instance must have a public IP to use the IGW directly | NO: NAT GW has the EIP on behalf of instances |
| Internet can initiate? | YES (if Security Group allows) | NO (never) |
| Handles NAT? | YES: translates instance private IP ↔ public IP | YES: translates private IP to its own EIP |
| Where NAT happens | At the IGW (instance’s public IP ↔ instance’s private IP) | At the NAT Gateway (instance’s private IP ↔ NAT GW’s EIP) |
| Used by | Public subnets | Private subnets |
| One-per-VPC limit? | YES: one IGW per VPC | NO: many NAT GWs per VPC |
| Redundancy | AWS manages, always HA across ALL AZs | HA within AZ only (create one per AZ) |
| Cost | Free (pay for data transfer) | Hourly + per-GB data charge |

The flow for public vs private subnet — side by side

PUBLIC SUBNET INSTANCE (has public IP, uses IGW):
EC2 (10.0.0.10 / public: 52.x.x.x)
→ Sends packet src: 10.0.0.10 dst: 8.8.8.8
→ Route table: 0.0.0.0/0 → my-igw
→ IGW translates: 10.0.0.10 → 52.x.x.x
→ Internet sees: src: 52.x.x.x dst: 8.8.8.8
← Response: dst: 52.x.x.x
← IGW translates: 52.x.x.x → 10.0.0.10
← EC2 receives: dst: 10.0.0.10
PRIVATE SUBNET INSTANCE (no public IP, uses NAT Gateway):
EC2 (10.0.0.140, no public IP)
→ Sends packet src: 10.0.0.140 dst: 8.8.8.8
→ Route table: 0.0.0.0/0 → nat-gw
→ NAT GW translates: 10.0.0.140:52341 → 54.10.10.10:61200
→ NAT GW route: 0.0.0.0/0 → my-igw
→ IGW sees NAT GW's EIP — forwards to internet
→ Internet sees: src: 54.10.10.10 dst: 8.8.8.8
← Response: dst: 54.10.10.10
← NAT GW translates: 54.10.10.10:61200 → 10.0.0.140:52341
← EC2 receives: dst: 10.0.0.140

The key insight: NAT Gateway and IGW work together for private subnet internet access. NAT Gateway handles the private-to-EIP translation, and IGW handles the EIP-to-internet routing. Neither one alone is sufficient.


12. When NOT to Use NAT Gateway — VPC Endpoints

NAT Gateway is charged per hour plus per GB of data processed. Every byte of traffic going through NAT Gateway costs money. For traffic destined to AWS services (S3, DynamoDB, CloudWatch, SSM, etc.), routing through NAT Gateway is wasteful because you are:

  1. Paying NAT Gateway data processing charges
  2. Sending traffic out of your VPC and back in again unnecessarily
  3. Potentially sending traffic over the internet (less secure)

VPC Endpoints let private instances communicate with AWS services without leaving the AWS network — no NAT Gateway, no internet, no IGW.

Gateway Endpoints (free)

Available for: Amazon S3 and Amazon DynamoDB only.

A Gateway Endpoint adds a route to your route table pointing directly to the S3 or DynamoDB service within AWS’s network. Traffic never leaves AWS.

WITHOUT Gateway Endpoint (expensive):
private-ec2 → NAT GW → IGW → Internet → S3
Cost: NAT GW hourly + NAT GW data charge + data transfer
WITH Gateway Endpoint (free):
private-ec2 → S3 Gateway Endpoint → S3 (stays inside AWS network)
Cost: $0 for the endpoint itself, only standard S3 request costs
Setup:
VPC → Endpoints → Create Endpoint
Service: com.amazonaws.us-east-1.s3
Type: Gateway
VPC: my-vpc
Route Tables: select private-rt
This automatically adds a route:
Destination: pl-xxxxxxxx (S3's managed prefix list)
Target: vpce-xxxxxxxxx
private-ec2 traffic to S3 now uses this route instead of 0.0.0.0/0 → NAT GW.
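
The same console steps can be expressed as a single EC2 API call, `create_vpc_endpoint`. The sketch below only builds the request arguments so it runs without AWS access; the VPC and route table IDs are hypothetical placeholders, and the boto3 call is shown as a comment.

```python
def s3_gateway_endpoint_request(vpc_id, route_table_ids, region="us-east-1"):
    """Build keyword arguments for EC2 create_vpc_endpoint (Gateway type, S3)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder IDs; substitute the real IDs of my-vpc and its private route table.
request = s3_gateway_endpoint_request("vpc-0123456789abcdef0", ["rtb-0123456789abcdef0"])
print(request["ServiceName"])

# With boto3 installed and credentials configured (assumed, not shown here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   ec2.create_vpc_endpoint(**request)
```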

Interface Endpoints (paid, but often cheaper than NAT Gateway for high volume)

Available for: most other AWS services (EC2 API, CloudWatch, SSM, SQS, SNS, Secrets Manager, KMS, etc.)

An Interface Endpoint creates an ENI in your subnet with a private IP. Traffic to AWS services is routed to this private IP. No NAT Gateway involved.

Cost comparison for SSM:
WITHOUT Interface Endpoint:
Every SSM call goes: private-ec2 → NAT GW → Internet → SSM endpoint
Charge: NAT GW hourly ($0.045/hr) + data processing ($0.045/GB)
WITH Interface Endpoint:
Every SSM call goes: private-ec2 → Interface Endpoint ENI → SSM
Charge: Interface Endpoint hourly ($0.01/hr per AZ) + data processing ($0.01/GB, well below the NAT Gateway rate)

| Traffic destination | Recommendation |
|---|---|
| S3 or DynamoDB | Use Gateway Endpoint (free) |
| Other AWS services | Consider Interface Endpoint (evaluate: volume of traffic to that service vs Interface Endpoint hourly cost) |
| Third-party internet (APIs, package repos, etc.) | NAT Gateway (or a NAT instance) is required; VPC Endpoints do not cover this |

For instances that only need to call AWS services (never third-party internet), you may be able to eliminate NAT Gateway entirely by using VPC Endpoints — especially for EC2 instances in private subnets that use SSM Session Manager instead of SSH.
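
To decide between the two, it helps to put rough monthly numbers side by side. The sketch below uses the us-east-1 rates quoted in this section for NAT Gateway and the Interface Endpoint hourly charge, plus an assumed $0.01/GB endpoint data processing rate and a 730-hour month. All prices are parameters, so substitute current values from the pricing pages.

```python
HOURS_PER_MONTH = 730  # common billing approximation


def nat_gateway_monthly(gb, hourly=0.045, per_gb=0.045):
    """Monthly NAT Gateway cost for gb of processed data."""
    return hourly * HOURS_PER_MONTH + per_gb * gb


def interface_endpoint_monthly(gb, azs=1, hourly_per_az=0.01, per_gb=0.01):
    """Monthly cost of one Interface Endpoint deployed in azs Availability Zones."""
    return hourly_per_az * azs * HOURS_PER_MONTH + per_gb * gb


for gb in (10, 100, 1000):
    print(gb, round(nat_gateway_monthly(gb), 2), round(interface_endpoint_monthly(gb), 2))
```

The comparison only favors dropping the NAT Gateway when nothing else needs it. If the NAT Gateway must stay for third-party traffic, its hourly fee is paid either way and only the per-GB difference counts.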


NAT Gateway has two cost components:

| Component | Charge |
| --- | --- |
| NAT Gateway hourly | Per hour the NAT Gateway is provisioned (charged from creation to deletion, regardless of traffic) |
| Data processing | Per GB of data flowing THROUGH the NAT Gateway (charged for both inbound and outbound data) |

Exact prices vary by region and change over time. Always check the Amazon VPC Pricing page for current rates. The patterns below apply regardless of the specific numbers.

1. Place resources in the same AZ as the NAT Gateway

Cross-AZ data transfer has its own charge on top of NAT Gateway processing. If private-ec2-2 in us-east-1b routes through nat-gw in us-east-1a, you pay:

NAT GW data processing charge (for the data going through nat-gw)
+ Cross-AZ transfer charge (for data crossing AZ boundary to reach nat-gw)
+ Cross-AZ transfer charge (for the response coming back)

With one NAT Gateway per AZ, you eliminate the cross-AZ charges.
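The three line items above can be added up in a few lines. This is a rough sketch: the $0.045/GB processing rate is the figure used earlier in this part, while the $0.01/GB cross-AZ rate is an assumption for illustration only, so substitute current prices for your region.

```python
def monthly_data_cost(gb_out, gb_back, cross_az,
                      nat_per_gb=0.045, cross_az_per_gb=0.01):
    """Data-related NAT Gateway cost for one month (hourly charge excluded).

    gb_out:   data sent from the instance through the NAT Gateway
    gb_back:  response data returning through the NAT Gateway
    cross_az: True if the instance and the NAT Gateway are in different AZs
    """
    processing = (gb_out + gb_back) * nat_per_gb
    # Requests and responses both cross the AZ boundary.
    transfer = (gb_out + gb_back) * cross_az_per_gb if cross_az else 0.0
    return processing + transfer


same_az = monthly_data_cost(100, 900, cross_az=False)
other_az = monthly_data_cost(100, 900, cross_az=True)
print(round(same_az, 2), round(other_az, 2), round(other_az - same_az, 2))
```

The difference between the two results is the amount a same-AZ NAT Gateway would save on data charges, which you can then weigh against the extra hourly charge of running a second gateway.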

2. Use VPC Endpoints for AWS services

As covered in Section 12 — S3, DynamoDB, and other AWS services do not need to go through NAT Gateway.

3. Delete NAT Gateways in non-production environments when not in use

NAT Gateway charges hourly from creation to deletion. In dev/test environments that are only active during business hours, deleting the NAT Gateway at the end of each day and recreating it in the morning eliminates overnight charges.

Automatable with AWS Lambda + EventBridge:
8:00 AM → Lambda creates NAT Gateway, updates route table
7:00 PM → Lambda deletes NAT Gateway
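A minimal sketch of such a Lambda handler follows. It assumes two EventBridge schedules that invoke the function with an `action` of `create` or `delete`; the event field names (`public_subnet_id`, `eip_allocation_id`, `private_route_table_id`, `nat_gateway_id`) are made up for this example. The EC2 client is passed in as a parameter so the logic can be tried against a stub before touching a real account.

```python
def handler(event, context=None, ec2=None):
    """Create or delete the dev NAT Gateway depending on event["action"]."""
    if ec2 is None:
        import boto3  # imported lazily so the logic can be exercised without AWS
        ec2 = boto3.client("ec2")

    if event["action"] == "create":
        response = ec2.create_nat_gateway(
            SubnetId=event["public_subnet_id"],
            AllocationId=event["eip_allocation_id"],
        )
        nat_id = response["NatGateway"]["NatGatewayId"]
        # Wait until the gateway is Available before pointing the route at it.
        ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
        # replace_route expects an existing 0.0.0.0/0 entry, for example the
        # blackhole route left behind by the previous evening's deletion.
        ec2.replace_route(
            RouteTableId=event["private_route_table_id"],
            DestinationCidrBlock="0.0.0.0/0",
            NatGatewayId=nat_id,
        )
        return {"created": nat_id}

    if event["action"] == "delete":
        ec2.delete_nat_gateway(NatGatewayId=event["nat_gateway_id"])
        return {"deleted": event["nat_gateway_id"]}

    raise ValueError(f"unknown action: {event['action']!r}")
```

Two practical caveats with this sketch: the waiter can take a few minutes, so the Lambda timeout has to be raised well above the default, and the evening run needs some way to learn the current NAT Gateway ID (looking it up by tag is one option) because a new ID is issued every morning.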

4. Do not confuse idle with free

Even with zero traffic flowing through it, a NAT Gateway sitting there with Available status is charging you hourly. If you are not using it, delete it.


14. NACLs and NAT Gateway — the Ephemeral Port Problem

A misconfigured subnet NACL is one of the most common reasons a NAT Gateway that appears to be set up correctly still does not pass traffic.

When private-ec2 (10.0.0.140) makes an outbound TCP connection to 8.8.8.8:443:

  • The destination port is 443 (HTTPS) — fixed by the service
  • The source port is chosen randomly by the OS from a range called ephemeral ports

On Linux: 32768 – 60999
On Windows (Server 2008 and later): 49152 – 65535

NAT Gateway uses ports 1024 – 65535 for its translated source ports.

When the response comes back from 8.8.8.8, it is addressed to the NAT Gateway on one of these ephemeral ports. If your NACLs don’t allow these ports, the response is dropped and the connection fails.

Both the private subnet NACL and the public subnet NACL (where the NAT Gateway lives) need to be configured correctly.
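A quick way to sanity-check a NACL port range against what an instance really uses is to compare it with the operating system's ephemeral range. The sketch below parses the two-number format that Linux exposes in /proc/sys/net/ipv4/ip_local_port_range; the helper names are made up for this example.

```python
def parse_port_range(text):
    """Parse 'low high' as found in /proc/sys/net/ipv4/ip_local_port_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def covered(os_range, nacl_range=(1024, 65535)):
    """True if every port the OS may pick falls inside the NACL's allowed range."""
    return nacl_range[0] <= os_range[0] and os_range[1] <= nacl_range[1]


linux_default = parse_port_range("32768\t60999\n")
print(covered(linux_default))                  # NACL allows 1024-65535
print(covered(linux_default, (49152, 65535)))  # NACL allows a narrower range
```

On a Linux instance, the input string would come from reading that file. A NACL rule narrower than the OS range fails only for some connections, which makes the problem look intermittent.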

Traffic flow for private-ec2 → NAT GW → Internet:

STEP 1: private-ec2 → nat-gw
  Traffic crosses: private-subnet NACL (outbound)
  Then:            public-subnet NACL (inbound at nat-gw)
  Destination port is the service port (443) ← public-subnet inbound must allow this from 10.0.0.128/25
STEP 2: nat-gw → Internet (via IGW)
  Traffic crosses: public-subnet NACL (outbound)
STEP 3: Internet → nat-gw (response)
  Traffic crosses: public-subnet NACL (inbound)
  Response port is an ephemeral port (1024-65535) ← must allow this
STEP 4: nat-gw → private-ec2 (translated response)
  Traffic crosses: public-subnet NACL (outbound)
  Then:            private-subnet NACL (inbound)
  Response port is an ephemeral port (1024-65535) ← must allow this in both NACLs

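private-subnet NACL (INBOUND):

| Rule# | Type | Protocol | Port Range | Source | Action |
| --- | --- | --- | --- | --- | --- |
| 100 | Custom TCP | TCP | 1024-65535 | 0.0.0.0/0 | ALLOW (return traffic) |
| * | All | All | All | 0.0.0.0/0 | DENY |

private-subnet NACL (OUTBOUND):

| Rule# | Type | Protocol | Port Range | Destination | Action |
| --- | --- | --- | --- | --- | --- |
| 100 | HTTPS | TCP | 443 | 0.0.0.0/0 | ALLOW |
| 110 | HTTP | TCP | 80 | 0.0.0.0/0 | ALLOW |
| 120 | Custom UDP | UDP | 53 | 0.0.0.0/0 | ALLOW (DNS) |
| * | All | All | All | 0.0.0.0/0 | DENY |

public-subnet NACL (INBOUND):

| Rule# | Type | Protocol | Port Range | Source | Action |
| --- | --- | --- | --- | --- | --- |
| 100 | Custom TCP | TCP | 1024-65535 | 0.0.0.0/0 | ALLOW (return traffic from the internet) |
| 110 | HTTPS | TCP | 443 | 10.0.0.128/25 | ALLOW (requests from private-subnet to nat-gw) |
| 120 | HTTP | TCP | 80 | 10.0.0.128/25 | ALLOW (requests from private-subnet to nat-gw) |
| * | All | All | All | 0.0.0.0/0 | DENY |

public-subnet NACL (OUTBOUND):

| Rule# | Type | Protocol | Port Range | Destination | Action |
| --- | --- | --- | --- | --- | --- |
| 100 | HTTPS | TCP | 443 | 0.0.0.0/0 | ALLOW |
| 110 | HTTP | TCP | 80 | 0.0.0.0/0 | ALLOW |
| 120 | Custom UDP | UDP | 53 | 0.0.0.0/0 | ALLOW |
| 130 | Custom TCP | TCP | 1024-65535 | 10.0.0.128/25 | ALLOW (translated responses back to private-subnet) |
| * | All | All | All | 0.0.0.0/0 | DENY |

The UDP 53 rules cover only the outbound half of DNS to an external resolver. Because NACLs are stateless, the replies need matching Custom UDP 1024-65535 rules along the same return path. Queries to the Amazon-provided DNS resolver are not filtered by NACLs.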

Security Groups do NOT have this problem because Security Groups are stateful — they automatically allow response traffic. NACLs are stateless — they independently evaluate every packet in both directions, so you must explicitly allow ephemeral port responses.
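To make the stateless behaviour concrete, here is a small model of NACL evaluation: rules are checked in ascending rule-number order, the first match decides, and anything unmatched hits the implicit deny. This is an illustrative sketch, not an AWS API; the `Rule` class and the source port 40000 are made up for the example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    number: int
    protocol: str   # "tcp", "udp", or "all"
    ports: tuple    # (low, high), inclusive
    cidr: str
    allow: bool


def evaluate(rules, protocol, port, ip):
    """Return True if the lowest-numbered matching rule allows the packet.

    For an inbound NACL, port is the packet's destination port and ip is
    its source address.
    """
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.protocol not in ("all", protocol):
            continue
        if not rule.ports[0] <= port <= rule.ports[1]:
            continue
        if ipaddress.ip_address(ip) in ipaddress.ip_network(rule.cidr):
            return rule.allow
    return False  # the implicit "*" rule denies everything else


# Inbound rules on private-subnet, with and without the ephemeral range.
service_ports_only = [Rule(100, "tcp", (443, 443), "0.0.0.0/0", True)]
with_ephemeral = [Rule(100, "tcp", (1024, 65535), "0.0.0.0/0", True)]

# The response from 8.8.8.8:443 is addressed to private-ec2's source port,
# assumed here to be 40000.
print(evaluate(service_ports_only, "tcp", 40000, "8.8.8.8"))  # False
print(evaluate(with_ephemeral, "tcp", 40000, "8.8.8.8"))      # True
```

Nothing in the model remembers that the outbound request was allowed, which is exactly why the return path needs its own rule.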

Why Security Groups are not the problem here: You cannot attach a Security Group to a NAT Gateway. Security Groups attach to the network interfaces of resources such as EC2 instances, load balancers, and RDS databases, and NAT Gateway does not support them. The NACL on the subnet is the only firewall mechanism at the NAT Gateway level.


15. Common Mistakes and Gotchas

1. NAT Gateway created in a private subnet

Symptom: Private instances have no internet access even with NAT Gateway configured.
Cause: NAT Gateway was created in a private subnet that has no route to the IGW.
Fix: Recreate the NAT Gateway in a public subnet (one with 0.0.0.0/0 → IGW).

2. Route table not updated after creating NAT Gateway

Symptom: Private instances still can’t reach the internet after NAT Gateway is Available.
Cause: The private subnet’s route table still has no 0.0.0.0/0 route, or the route points to an old target.
Fix: Add 0.0.0.0/0 → nat-gw-xxxx to the private route table.

3. EIP not released after NAT Gateway deletion

Symptom: Unexpected EIP charges after deleting the NAT Gateway.
Cause: AWS disassociates the EIP but does not release it. It stays in your account and is billed as an idle EIP.
Fix: Go to EC2 → Elastic IPs → Release address.
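Leftover addresses are easy to spot programmatically. The sketch below filters the `Addresses` list returned by the EC2 `describe_addresses` call for entries that have no `AssociationId`; the helper name and the sample data are made up for illustration.

```python
def idle_eips(addresses):
    """Return allocation IDs of Elastic IPs that are not associated with anything."""
    return [
        address["AllocationId"]
        for address in addresses
        if "AssociationId" not in address
    ]


# Shaped like boto3.client("ec2").describe_addresses()["Addresses"],
# with invented IDs.
sample = [
    {"AllocationId": "eipalloc-aaa", "PublicIp": "54.10.10.10",
     "AssociationId": "eipassoc-111"},
    {"AllocationId": "eipalloc-bbb", "PublicIp": "54.10.10.11"},
]
print(idle_eips(sample))
```

Each returned ID could then be passed to `release_address(AllocationId=...)`, ideally after a manual check that the address is not reserved for an allowlist somewhere.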

4. Stale routes pointing to deleted NAT Gateway

Symptom: Routes in the route table show blackhole status.
Cause: NAT Gateway was deleted but the route entry 0.0.0.0/0 → nat-gw-xxxx still exists.
Fix: Delete or update the blackhole route entry.
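Blackhole routes can also be found without clicking through every route table. This is a minimal sketch that scans the `RouteTables` list returned by the EC2 `describe_route_tables` call; the helper name and the sample IDs are made up.

```python
def blackhole_routes(route_tables):
    """Return (route table ID, destination, NAT Gateway ID) for each blackhole route."""
    found = []
    for table in route_tables:
        for route in table.get("Routes", []):
            if route.get("State") == "blackhole":
                found.append((
                    table["RouteTableId"],
                    route.get("DestinationCidrBlock"),
                    route.get("NatGatewayId"),
                ))
    return found


# Shaped like boto3.client("ec2").describe_route_tables()["RouteTables"].
sample_tables = [
    {"RouteTableId": "rtb-private", "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/24", "GatewayId": "local", "State": "active"},
        {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0dead", "State": "blackhole"},
    ]},
]
print(blackhole_routes(sample_tables))
```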

5. Cannot ping private instance from another VPC via NAT Gateway

Symptom: Trying to SSH or ping from a peered VPC through a NAT Gateway fails.
Cause: You cannot route traffic through a NAT Gateway via VPC Peering. Traffic must originate from inside the NAT Gateway’s own VPC. The path “VPC-B → Peering → NAT GW in VPC-A → Internet” is not supported.
Fix: Give the connecting resource its own internet access, or restructure the architecture.

6. 350-second RST kills long-lived connections

Symptom: Database connections, WebSocket connections, or long-running requests intermittently fail after minutes of inactivity.
Cause: The NAT Gateway idle timeout (350 seconds) removed the connection from its translation table. When the instance next tries to use that connection, the NAT Gateway answers with an RST instead of forwarding the packet.
Fix: Enable TCP keepalive on the application or OS with an interval of less than 350 seconds.
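At the application level, keepalive can be switched on per socket. The sketch below uses Python's standard `socket` module; the 120-second idle time, 30-second interval, and 4 probes are example values chosen only to stay well below 350 seconds, and the tuning constants are applied only where the platform defines them.

```python
import socket


def enable_keepalive(sock, idle=120, interval=30, probes=4):
    """Turn on TCP keepalive so an idle connection sends probes before
    the NAT Gateway's 350-second idle timeout expires."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket tuning options exist on Linux; other platforms may
    # lack some of them, so each is set only if the constant is defined.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Many database drivers and HTTP clients expose their own keepalive settings, which are usually the better place to configure this when they exist.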

7. Private NAT Gateway routing to Internet Gateway

Symptom: Instances can’t reach the internet even though there’s both a private NAT Gateway and an IGW in the same VPC.
Cause: Private NAT Gateways are not designed for internet access. The IGW drops traffic from a private NAT Gateway.
Fix: Use a public NAT Gateway for internet access.

8. NACLs blocking ephemeral port return traffic

Symptom: Connections from private instances time out or fail, but Security Group rules look correct.
Cause: NACLs on the private or public subnet are not allowing inbound TCP ports 1024–65535.
Fix: Add a NACL inbound rule allowing TCP 1024–65535 from 0.0.0.0/0 on both the private and public subnets, and make sure the public subnet’s outbound rules also allow TCP 1024–65535 back to the private subnet.

9. Single NAT Gateway causes AZ-level failure cascade

Symptom: When one AZ fails, private instances in ALL AZs lose internet access.
Cause: All private subnets route through a single NAT Gateway that is in the failing AZ.
Fix: Create one NAT Gateway per AZ, with each AZ’s private route table pointing to the NAT Gateway in the same AZ.

10. Paying for cross-AZ data transfer unnecessarily

Symptom: Higher-than-expected data transfer charges.
Cause: Private instances in us-east-1b are routing through a NAT Gateway in us-east-1a (cross-AZ).
Fix: One NAT Gateway per AZ, route tables pointing to same-AZ NAT Gateway.
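Mistakes 9 and 10 share one root cause, and it can be checked mechanically once you know which AZ each subnet and NAT Gateway lives in. The sketch below works on plain dictionaries; the second subnet and its name are hypothetical, and in practice the three mappings would be built from the EC2 describe calls for subnets, NAT gateways, and route tables.

```python
def cross_az_routes(subnet_az, nat_az, subnet_nat):
    """Return the subnets whose default route uses a NAT Gateway in another AZ."""
    return sorted(
        subnet
        for subnet, nat in subnet_nat.items()
        if subnet_az[subnet] != nat_az[nat]
    )


subnet_az = {"private-subnet": "us-east-1a", "private-subnet-b": "us-east-1b"}
nat_az = {"nat-gw": "us-east-1a"}
# Both private subnets send 0.0.0.0/0 to the single NAT Gateway.
subnet_nat = {"private-subnet": "nat-gw", "private-subnet-b": "nat-gw"}

print(cross_az_routes(subnet_az, nat_az, subnet_nat))
```

Any subnet in the result both pays cross-AZ transfer and loses internet access if the NAT Gateway's AZ goes down.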


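16. Quick Reference Cheatsheet

| Type | Subnet | EIP? | Connects to | Source IP seen |
| --- | --- | --- | --- | --- |
| Public NAT GW | Public | YES | Internet (via IGW) | EIP |
| | | | Other VPCs (via TGW) | NAT GW private IP |
| Private NAT GW | Private | NO | Other VPCs (via TGW) | NAT GW private IP |
| | | | On-premises (via VGW) | NAT GW private IP |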
Bandwidth: 5 Gbps baseline → auto-scales to 100 Gbps
Packets/sec: 1M pps baseline → auto-scales to 10M pps
Concurrent conns: 55,000 per unique destination per IP address
Max EIPs: 8 (1 primary + 7 secondary)
Idle timeout: 350 seconds (sends RST, not FIN)
MTU: 8500 bytes internal, 1500 bytes for internet-bound
Ephemeral ports: 1024–65535
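The connection limit scales with the number of IP addresses on the gateway, which is why secondary EIPs help with port exhaustion toward a single busy destination. Here is a short sketch of that arithmetic, using only the two limits listed above (the function name is made up):

```python
CONNECTIONS_PER_IP = 55_000  # per unique destination, from the limits above
MAX_IPS = 8                  # 1 primary + 7 secondary


def max_connections_to_one_destination(ip_count):
    """Upper bound on simultaneous connections to a single destination."""
    if not 1 <= ip_count <= MAX_IPS:
        raise ValueError(f"ip_count must be between 1 and {MAX_IPS}")
    return CONNECTIONS_PER_IP * ip_count


print(max_connections_to_one_destination(1))
print(max_connections_to_one_destination(8))
```

The limit applies per destination, so connections spread across many different destinations are far less likely to run into it.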
Does your private instance need internet access?
  YES → Use Public NAT Gateway
  │     (in public subnet, with EIP, with route to IGW)
  NO  → Is it connecting to another VPC with overlapping CIDRs?
          YES → Use Private NAT Gateway
          │     (in private subnet, no EIP, route via TGW)
          NO  → Is it connecting to AWS services (S3, DynamoDB)?
                  YES → Use VPC Gateway Endpoint (free)
                  NO  → Is it connecting to other AWS services (SSM, CW)?
                          YES → Consider Interface Endpoint
                          NO  → Use VPC Peering or Transit Gateway
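The same decision tree, written as a function, can be handy in a design review checklist or an infrastructure linter. This is only a sketch of the tree above; the function and parameter names are invented for the example.

```python
def recommend(needs_internet, overlapping_cidrs=False, aws_service=None):
    """Walk the decision tree and return the suggested connectivity option.

    aws_service is a lowercase service name such as "s3" or "ssm",
    or None if the traffic is not bound for an AWS service.
    """
    if needs_internet:
        return "Public NAT Gateway"
    if overlapping_cidrs:
        return "Private NAT Gateway"
    if aws_service in ("s3", "dynamodb"):
        return "VPC Gateway Endpoint"
    if aws_service is not None:
        return "Interface Endpoint"
    return "VPC Peering or Transit Gateway"


print(recommend(needs_internet=True))
print(recommend(needs_internet=False, aws_service="s3"))
print(recommend(needs_internet=False, aws_service="ssm"))
```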
PUBLIC SUBNET (for NAT Gateway to use):
  Destination    Target
  10.0.0.0/24    local
  0.0.0.0/0      my-igw

PRIVATE SUBNET (instances using NAT Gateway):
  Destination    Target
  10.0.0.0/24    local
  0.0.0.0/0      nat-gw           ← always same AZ for HA + cost

PRIVATE SUBNET (with S3 Gateway Endpoint added):
  Destination    Target
  10.0.0.0/24    local
  pl-xxxxxxxx    vpce-xxxxxxxxx   ← S3 traffic bypasses NAT GW
  0.0.0.0/0      nat-gw
□ One NAT Gateway per AZ that has private workloads
□ Each AZ's private route table points to NAT GW in the SAME AZ
□ Each NAT Gateway has its own Elastic IP
□ Each NAT Gateway is in a PUBLIC subnet
□ Public subnet route table has 0.0.0.0/0 → IGW
□ NACLs allow inbound TCP 1024-65535 on both private and public subnets
□ TCP keepalive enabled on instances with long-lived connections (< 350s interval)
□ CloudWatch alarms on ErrorPortAllocation and PacketsDropCount
□ EIP released manually after NAT Gateway deletion
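For the CloudWatch item in the checklist, the alarm definition can be built as a plain dictionary and passed to the CloudWatch `put_metric_alarm` call. This is a sketch: the namespace, metric, and dimension names are the ones NAT Gateway publishes, while the one-minute period, five evaluation periods, zero threshold, and the SNS topic ARN are example choices rather than recommendations from this guide.

```python
def port_allocation_alarm(nat_gateway_id, sns_topic_arn):
    """Build put_metric_alarm arguments that fire when the NAT Gateway
    fails to allocate a source port."""
    return {
        "AlarmName": f"{nat_gateway_id}-ErrorPortAllocation",
        "Namespace": "AWS/NATGateway",
        "MetricName": "ErrorPortAllocation",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 5,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical IDs for illustration only.
alarm = port_allocation_alarm(
    "nat-0123456789abcdef0",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
print(alarm["AlarmName"])
# With credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```

A matching alarm for PacketsDropCount would only differ in `AlarmName` and `MetricName`.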
Charged for: Every hour NAT Gateway exists (Available, Pending, or Deleting)
Charged for: Every GB processed through NAT Gateway
Charged for: The EIP’s public IPv4 address, per hour, even while it is attached to a running NAT Gateway (AWS has billed all public IPv4 addresses since February 2024)
CHARGED FOR: EIP when it is idle (disassociated) — release it immediately on NAT GW deletion
Savings: Same-AZ routing eliminates cross-AZ transfer charges
Savings: Gateway Endpoints for S3/DynamoDB eliminate NAT data processing charges

End of Part 3 — NAT Gateway Deep Dive

Next: Part 4 — VPC Peering and Transit Gateway