You have workloads in AWS. You have workloads in Azure. They need to talk to each other over private IPs, through encrypted tunnels across the public internet. The solution is a site-to-site VPN, and while both clouds support it natively, getting them to agree on the details takes some understanding of how each side works.
In this blog post I’ll walk through the concepts, the differences between AWS and Azure VPN implementations, and show you a working Terraform setup that provisions both sides in a single apply.
The Building Blocks
A site-to-site VPN has three layers, and it's worth understanding each one on its own before seeing how they fit together.
Layer 1: IPsec – The Encrypted Pipe
IPsec creates an encrypted tunnel between two public IP addresses. It negotiates in two phases:
- Phase 1 (IKE) establishes a secure control channel. The two sides authenticate using a pre-shared key, agree on encryption parameters (AES-256, SHA-256, etc.), and set up a session that rekeys periodically (every 8 hours by default on AWS).
- Phase 2 (IPsec SA) creates the actual data tunnel inside that control channel. This is where your traffic flows, encrypted. It rekeys more often (hourly by default) for forward secrecy — even if a key leaks, only about an hour of traffic is exposed.
The result is an encrypted pipe between two public IPs. But a pipe alone isn’t useful. For that you’ll also need routing.
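If you want to pin these parameters rather than rely on negotiation defaults, AWS exposes them as per-tunnel options on its VPN connection resource. A sketch using real `aws_vpn_connection` tunnel arguments (the variable names are placeholders):

```hcl
resource "aws_vpn_connection" "example" {
  vpn_gateway_id      = var.vgw_id
  customer_gateway_id = var.cgw_id
  type                = "ipsec.1"

  # Pin Phase 1 (IKE) and Phase 2 (IPsec SA) parameters explicitly.
  tunnel1_ike_versions                 = ["ikev2"]
  tunnel1_phase1_encryption_algorithms = ["AES256"]
  tunnel1_phase1_integrity_algorithms  = ["SHA2-256"]
  tunnel1_phase1_lifetime_seconds      = 28800 # 8 hours
  tunnel1_phase2_lifetime_seconds      = 3600  # 1 hour
}
```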
Layer 2: BGP – The Routing Protocol
IPsec doesn’t know anything about your internal networks. It just encrypts and forwards. BGP (Border Gateway Protocol) runs inside the IPsec tunnel and handles route exchange:
- Azure tells AWS: “I have 10.224.0.0/12, send traffic for it to me”
- AWS tells Azure: “I have 172.31.0.0/16, send traffic for it to me”
- If a tunnel goes down, BGP detects it within seconds and reroutes
Each BGP speaker needs an ASN (Autonomous System Number), basically a unique identifier. AWS defaults to 64512, Azure defaults to 65515. These are from the private range (64512-65534), analogous to how 192.168.x.x is a private IP range.
Layer 3: Link-Local Addresses – The Inside of the Tunnel
BGP peers need IP addresses to talk to each other. But which IPs? Your real subnets (172.31.x.x, 10.224.x.x) haven’t been routed yet — that’s what BGP is trying to set up. It’s a chicken-and-egg problem.
The solution is link-local addresses from the 169.254.0.0/16 range. These are special IPs that only exist on a single point-to-point link — in this case, inside the IPsec tunnel. Each tunnel gets a /30 subnet (4 IPs, 2 usable):
┌────────── IPsec Tunnel ──────────┐
Azure ── 169.254.21.2 <──BGP──> 169.254.21.1 ── AWS
└──────────────────────────────────┘
(169.254.21.0/30)
Within each /30, AWS takes host 1 and the customer (Azure) takes host 2. Azure calls these “APIPA addresses” (Automatic Private IP Addressing); AWS calls them “inside CIDRs.” Both mean the same: a temporary IP that exists only inside the tunnel for BGP to use.
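Terraform's built-in `cidrhost` function makes this host-1/host-2 convention easy to compute from the /30 alone. A minimal sketch:

```hcl
locals {
  tunnel_cidr = "169.254.21.0/30"

  aws_inside   = cidrhost(local.tunnel_cidr, 1) # "169.254.21.1" — AWS side
  azure_inside = cidrhost(local.tunnel_cidr, 2) # "169.254.21.2" — Azure side
}
```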
How AWS and Azure Differ
Both clouds support IPsec + BGP, but they handle the details differently:
| Aspect | AWS | Azure |
|---|---|---|
| Gateway | Virtual Private Gateway (VGW) — no public IPs of its own | Virtual Network Gateway — has dedicated public IPs |
| Remote peer representation | Customer Gateway (CGW) — metadata only | Local Network Gateway — metadata only |
| Tunnels per connection | 2 tunnel endpoints per VPN connection (always) | 1 tunnel per connection, pinned to a gateway instance |
| Inside tunnel IPs | Auto-assigned or specified, must be 169.254.x.x/30 | Manually specified, must be 169.254.x.x (APIPA) |
| Pre-shared keys | Generated by AWS | You provide them (or auto-generate) |
| Provisioning time | ~5 minutes | ~30-45 minutes |
| Route propagation | Explicit per-route-table setting | Automatic within the VNet |
Key Difference: Tunnel Counts
AWS always creates two endpoints per VPN connection, each with its own public IP in a different AZ. The VGW itself has no public IPs. The tunnel endpoint IPs are allocated when the VPN connection is created.
Azure creates one IPsec connection per connection resource, targeting a specific gateway instance. In active-standby mode (one instance), all connections share a single point of failure. In active-active mode (two instances, two public IPs), connections can be distributed across instances.
Why Active-Active Matters
An Azure gateway instance can carry multiple APIPA addresses, but without custom_bgp_addresses on each connection, Azure simply picks the first address in the list for every connection, regardless of which tunnel it’s peering with. The custom_bgp_addresses block that pins a specific APIPA to a specific connection requires both a primary and a secondary field, mapping to ip-config-1 and ip-config-2 respectively. Active-standby mode only has one ip-config, so custom_bgp_addresses doesn’t work there — you must use active-active mode. The cost difference is negligible: one extra static public IP on top of the ~$140/month VpnGw1 gateway.
With active-active, the full picture for an AWS-to-Azure VPN is:
- 2 Azure public IPs (one per instance)
- 2 AWS Customer Gateways (one per Azure IP, since a CGW only accepts a single IP)
- 2 AWS VPN Connections (one per CGW, each producing 2 endpoints = 4 total)
- 4 Azure connection resources (one per AWS endpoint)
Azure Instance 0 (PIP 1) Azure Instance 1 (PIP 2)
| |
AWS CGW 0 AWS CGW 1
| |
AWS VPN Connection 0 AWS VPN Connection 1
/ \ / \
Tunnel 1 Tunnel 2 Tunnel 1 Tunnel 2
| | | |
vpn_aws_ vpn_aws_ vpn_aws_ vpn_aws_
i0_t1 i0_t2 i1_t1 i1_t2
If Azure instance 0 goes down, instance 1 and its tunnels stay up. If an AWS endpoint goes down, the other endpoint for that CGW takes over. Full redundancy on both sides.
APIPA Address Allocation
Each tunnel needs a unique /30 from the 169.254.0.0/16 range. AWS takes host 1, Azure takes host 2:
| Tunnel | Inside CIDR | AWS (host 1) | Azure (host 2) |
|---|---|---|---|
| CGW0 Tunnel 1 -> Instance 0 | 169.254.21.0/30 | 169.254.21.1 | 169.254.21.2 |
| CGW0 Tunnel 2 -> Instance 0 | 169.254.22.0/30 | 169.254.22.1 | 169.254.22.2 |
| CGW1 Tunnel 1 -> Instance 1 | 169.254.21.4/30 | 169.254.21.5 | 169.254.21.6 |
| CGW1 Tunnel 2 -> Instance 1 | 169.254.22.4/30 | 169.254.22.5 | 169.254.22.6 |
All of these are derived from a single Terraform variable — a list of lists where the first list contains CIDRs for instance 0 and the second for instance 1:
variable "aws_tunnel_inside_cidrs" {
type = list(list(string))
default = [
["169.254.21.0/30", "169.254.22.0/30"], # Azure instance 0
["169.254.21.4/30", "169.254.22.4/30"], # Azure instance 1
]
}
The AWS module consumes these CIDRs directly. The Azure gateway derives its APIPA addresses using cidrhost(cidr, 2), which picks host 2 from each /30. One variable drives both sides — no duplication, no drift.
The Terraform Implementation
Module Structure
We split the VPN into reusable modules:
modules/
├── azure/vpn/
│ ├── gateway/ # Azure VPN Gateway (subnet, 2 public IPs, gateway)
│ └── connection/ # Generic: Local Network Gateway + VPN Connection
└── aws/vpn/
└── gateway/ # 2 Customer Gateways + 2 VPN Connections (4 tunnels)
The Azure connection module is generic. It takes a remote peer’s public IP, BGP ASN, APIPA address, and custom BGP addresses, then creates the Azure-side connection. It’s used once per tunnel, regardless of whether the remote peer is AWS, an office firewall, or another cloud.
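The connection module’s internals aren’t shown in full below, so here’s a rough sketch of what it contains. The resource arguments are real azurerm attributes; the variable names match the inputs used later in this post:

```hcl
# The remote peer (e.g. an AWS tunnel endpoint), represented as metadata.
resource "azurerm_local_network_gateway" "this" {
  name                = var.connection_name
  location            = var.location
  resource_group_name = var.resource_group_name
  gateway_address     = var.remote_gateway_address # remote public IP

  bgp_settings {
    asn                 = var.remote_bgp_asn
    bgp_peering_address = var.remote_bgp_peering_address # remote 169.254.x.x
  }
}

resource "azurerm_virtual_network_gateway_connection" "this" {
  name                       = var.connection_name
  location                   = var.location
  resource_group_name        = var.resource_group_name
  type                       = "IPsec"
  virtual_network_gateway_id = var.gateway_id
  local_network_gateway_id   = azurerm_local_network_gateway.this.id
  shared_key                 = var.shared_key
  enable_bgp                 = true

  # Pins a specific APIPA address to this connection (active-active only).
  custom_bgp_addresses {
    primary   = var.custom_bgp_address_primary
    secondary = var.custom_bgp_address_secondary
  }
}
```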
The Azure VPN Gateway
The gateway is always provisioned with active-active and two public IPs:
resource "azurerm_subnet" "gateway" {
name = "GatewaySubnet" # Azure requires this exact name
resource_group_name = var.vnet_resource_group_name
virtual_network_name = var.vnet_name
address_prefixes = [var.gateway_subnet_prefix] # minimum /27
}
resource "azurerm_public_ip" "pip1" {
name = "${var.base_resource_name_h}-vpngw-pip-1"
location = var.location
resource_group_name = var.resource_group_name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_public_ip" "pip2" {
name = "${var.base_resource_name_h}-vpngw-pip-2"
location = var.location
resource_group_name = var.resource_group_name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_virtual_network_gateway" "this" {
name = "${var.base_resource_name_h}-vpngw"
location = var.location
resource_group_name = var.resource_group_name
type = "Vpn"
vpn_type = "RouteBased"
sku = "VpnGw1"
active_active = true
bgp_enabled = true
bgp_settings {
asn = 65515
peering_addresses {
ip_configuration_name = "vpn-ip-config-1"
apipa_addresses = var.azure_apipa_addresses_1 # instance 0 APIPA addresses
}
peering_addresses {
ip_configuration_name = "vpn-ip-config-2"
apipa_addresses = var.azure_apipa_addresses_2 # instance 1 APIPA addresses
}
}
ip_configuration {
name = "vpn-ip-config-1"
public_ip_address_id = azurerm_public_ip.pip1.id
private_ip_address_allocation = "Dynamic"
subnet_id = azurerm_subnet.gateway.id
}
ip_configuration {
name = "vpn-ip-config-2"
public_ip_address_id = azurerm_public_ip.pip2.id
private_ip_address_allocation = "Dynamic"
subnet_id = azurerm_subnet.gateway.id
}
}
Each ip_configuration maps to a gateway instance. Each instance gets its own APIPA addresses. Note: this resource takes 30-45 minutes to provision.
The APIPA addresses are derived from the AWS tunnel CIDRs using cidrhost:
locals {
azure_apipa_instance_0 = [for cidr in var.aws_tunnel_inside_cidrs[0] : cidrhost(cidr, 2)]
azure_apipa_instance_1 = [for cidr in var.aws_tunnel_inside_cidrs[1] : cidrhost(cidr, 2)]
}
cidrhost(cidr, 2) picks host 2 from each /30 — the Azure side of the tunnel.
The AWS Side
The existing VGW is looked up via a data source. We create two Customer Gateways (one per Azure public IP) and two VPN Connections:
data "aws_vpn_gateway" "this" {
id = var.aws_vgw_id
}
resource "aws_customer_gateway" "instance_0" {
bgp_asn = var.azure_bgp_asn # 65515
ip_address = var.azure_gateway_public_ip_1 # Azure PIP 1
type = "ipsec.1"
}
resource "aws_customer_gateway" "instance_1" {
bgp_asn = var.azure_bgp_asn # 65515
ip_address = var.azure_gateway_public_ip_2 # Azure PIP 2
type = "ipsec.1"
}
resource "aws_vpn_connection" "instance_0" {
vpn_gateway_id = data.aws_vpn_gateway.this.id
customer_gateway_id = aws_customer_gateway.instance_0.id
type = "ipsec.1"
tunnel1_inside_cidr = var.tunnel_inside_cidrs_instance_0[0] # 169.254.21.0/30
tunnel2_inside_cidr = var.tunnel_inside_cidrs_instance_0[1] # 169.254.22.0/30
}
resource "aws_vpn_connection" "instance_1" {
vpn_gateway_id = data.aws_vpn_gateway.this.id
customer_gateway_id = aws_customer_gateway.instance_1.id
type = "ipsec.1"
tunnel1_inside_cidr = var.tunnel_inside_cidrs_instance_1[0] # 169.254.21.4/30
tunnel2_inside_cidr = var.tunnel_inside_cidrs_instance_1[1] # 169.254.22.4/30
}
We specify the inside CIDRs explicitly so they match the APIPA addresses already configured on the Azure gateway. If you omit them, AWS auto-assigns random /30s and you’d have to update Azure to match.
Each VPN Connection outputs tunnel public IPs, BGP peering addresses, and pre-shared keys — 4 sets of outputs total.
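The aws_vpn_connection resource exposes all of these as attributes; the module just forwards them. A sketch of the instance-0, tunnel-1 trio (the attribute names are real, the output names are this post’s convention):

```hcl
output "i0_tunnel1_address" {
  value = aws_vpn_connection.instance_0.tunnel1_address # endpoint public IP
}

output "i0_tunnel1_bgp_peering_address" {
  value = aws_vpn_connection.instance_0.tunnel1_vgw_inside_address # AWS's 169.254.x.x
}

output "i0_tunnel1_shared_key" {
  value     = aws_vpn_connection.instance_0.tunnel1_preshared_key
  sensitive = true
}
```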
Wiring the Two Sides Together
This is where Terraform shines. The AWS module outputs feed directly into the Azure connection modules — no manual copying of IPs or keys:
module "vpn_gateway" {
source = "../../modules/azure/vpn/gateway"
# ... creates active-active Azure VPN Gateway with 2 public IPs
azure_apipa_addresses_1 = local.azure_apipa_instance_0
azure_apipa_addresses_2 = local.azure_apipa_instance_1
}
module "aws_vpn_gateway" {
source = "../../modules/aws/vpn/gateway"
aws_vgw_id = var.aws_vgw_id
azure_gateway_public_ip_1 = module.vpn_gateway.gateway_public_ip_1
azure_gateway_public_ip_2 = module.vpn_gateway.gateway_public_ip_2
azure_bgp_asn = module.vpn_gateway.gateway_bgp_asn
tunnel_inside_cidrs_instance_0 = var.aws_tunnel_inside_cidrs[0]
tunnel_inside_cidrs_instance_1 = var.aws_tunnel_inside_cidrs[1]
}
# 4 Azure connections — one per AWS tunnel endpoint
module "vpn_aws_i0_t1" {
source = "../../modules/azure/vpn/connection"
connection_name = "aws-i0-t1"
gateway_id = module.vpn_gateway.gateway_id
remote_gateway_address     = module.aws_vpn_gateway.i0_tunnel1_address
remote_bgp_asn             = module.aws_vpn_gateway.vgw_bgp_asn
remote_bgp_peering_address = module.aws_vpn_gateway.i0_tunnel1_bgp_peering_address
shared_key                 = module.aws_vpn_gateway.i0_tunnel1_shared_key
custom_bgp_address_primary = local.azure_apipa_instance_0[0] # active on instance 0
custom_bgp_address_secondary = local.azure_apipa_instance_1[0] # required, unused
}
# ... repeat for i0_t2, i1_t1, i1_t2
The Azure gateway’s two public IPs flow into AWS as two Customer Gateways. AWS creates two VPN Connections (4 tunnel endpoints) and outputs their details. Those details flow back into four Azure connections. Everything is a single terraform apply.
Custom BGP Addresses
Each Azure connection specifies custom_bgp_addresses with primary (instance 0’s APIPA) and secondary (instance 1’s APIPA). For connections targeting instance 0, the primary address is the one actually used for BGP peering. For connections targeting instance 1, the secondary is used. The other is required by Azure but not active for that connection.
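Concretely, a connection targeting instance 1 passes the same two fields with the roles effectively swapped (module input names as used earlier in this post):

```hcl
module "vpn_aws_i1_t1" {
  source = "../../modules/azure/vpn/connection"
  # ... remote peer details from the AWS module's i1 tunnel 1 outputs ...

  custom_bgp_address_primary   = local.azure_apipa_instance_0[0] # required, unused
  custom_bgp_address_secondary = local.azure_apipa_instance_1[0] # active on instance 1
}
```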
The Pre-Shared Key Problem
The connection module needs to support two scenarios: generating a key (for office connections where you control both sides) and accepting a key (for AWS connections where AWS generates them). The naive approach with a conditional count on random_password breaks at plan time because the AWS key isn’t known until apply.
The fix is to always generate a key, but only use it as a fallback:
resource "random_password" "shared_key" {
length = 64
special = true
}
locals {
shared_key = var.shared_key != null ? var.shared_key : random_password.shared_key.result
}
AWS connections pass in their key, so the generated one is ignored. Office connections omit the variable, so the generated key is used.
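The corresponding variable just needs a null default so callers can omit it; `sensitive = true` keeps the key out of plan output. The exact declaration shape here is an assumption:

```hcl
variable "shared_key" {
  description = "Pre-shared key from the remote side; leave null to generate one"
  type        = string
  default     = null
  sensitive   = true
}
```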
Verifying the Connection
After terraform apply completes (about 50 minutes, mostly the Azure gateway), you can verify from both sides.
AWS side — check tunnel status for both VPN connections:
aws ec2 describe-vpn-connections \
--filters "Name=tag:Name,Values=azure-vpn-i0,azure-vpn-i1" \
--query 'VpnConnections[*].{Name:Tags[?Key==`Name`].Value|[0],Telemetry:VgwTelemetry[*].{IP:OutsideIpAddress,Status:Status,Routes:AcceptedRouteCount}}' \
--output json
All four tunnels should show UP with at least 1 accepted BGP route (the Azure VNet CIDR).
Azure side — check learned routes:
az network vnet-gateway list-learned-routes \
--resource-group <resource-group> \
--name <gateway-name> \
--output table
You should see the AWS VPC CIDR learned via EBgp from four peers (the four tunnel APIPA addresses). You’ll also see IBgp routes between the two Azure instances — this is the two gateway instances sharing routes internally, confirming that active-active is working.
One More Thing: Route Propagation on AWS
Azure automatically makes BGP-learned routes available within the VNet. AWS does not — you need to enable route propagation on each VPC route table where resources need to reach Azure.
Without it, your EC2 instances learn nothing about 10.224.0.0/12 even though the VGW has the route via BGP. Enable propagation in the AWS console under VPC > Route Tables > Route Propagation, or via:
aws ec2 enable-vgw-route-propagation \
--route-table-id rtb-xxxxxxxx \
--gateway-id vgw-xxxxxxxx
This is a per-route-table setting, so if you have separate route tables for public and private subnets, enable it on each one that needs Azure connectivity.
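In Terraform, the same setting is the aws_vpn_gateway_route_propagation resource — one per route table that needs Azure connectivity (the route table variable is a placeholder):

```hcl
resource "aws_vpn_gateway_route_propagation" "private" {
  vpn_gateway_id = data.aws_vpn_gateway.this.id
  route_table_id = var.private_route_table_id
}
```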
A Note on Teardown
If you need to destroy and recreate the VPN (e.g. switching from active-standby to active-active), be aware of a dependency ordering issue: the AWS Customer Gateway cannot be deleted while a VPN Connection still references it. Terraform may try to delete the CGW first and fail. The workaround is to delete the AWS VPN Connection before the CGW, either by running a targeted destroy or by removing the VPN connection manually first. The Azure gateway itself takes ~10 minutes to tear down.
Summary
Connecting AWS and Azure via a site-to-site VPN is fundamentally the same pattern as connecting on-prem to a cloud: IPsec for encryption, BGP for routing, link-local addresses for peering inside the tunnel. The main complexity comes from the asymmetry: AWS creates two tunnel endpoints per VPN connection while Azure creates one connection per tunnel, and Azure requires active-active mode to support multiple APIPA-based BGP sessions.
With active-active, the setup work doubles: two Azure public IPs, two AWS Customer Gateways, two AWS VPN Connections (4 tunnel endpoints), and four Azure connection resources. But by splitting the Terraform into a gateway module (created once) and a reusable connection module (created per tunnel), all four connections use the same code with different inputs. The AWS module outputs wire directly into the Azure connection modules, so a single terraform apply provisions both sides without any manual value copying.
If you’d like help configuring VPNs between AWS, Azure, or on-prem servers, contact us at Trailhead. We can help you do it with Terraform in a way that is automatic and repeatable.