A Virtual Network NAT (NAT Gateway) is the recommended method for outbound connectivity when using Azure Virtual Networks (vNets). We can also use a NAT Gateway to route egress traffic from Azure Kubernetes Service (AKS), if the AKS cluster is deployed to a subnet associated with a NAT Gateway.

In this article, we’ll look at what NAT Gateway is, which benefits you get from using a NAT Gateway for outbound connectivity, and how we can provision both AKS and NAT Gateway using Terraform.



What is Azure NAT Gateway

Azure NAT Gateway is a fully managed and highly resilient Network Address Translation (NAT) service. Once a NAT Gateway is associated with an Azure Subnet, all resources deployed to that particular subnet will use the NAT Gateway and its associated public IP addresses for outbound connectivity.

We can assign a single NAT Gateway to multiple subnets of a single vNET. However, we can’t use a single NAT Gateway for multiple vNETs. We can assign up to 16 IP addresses to a single NAT Gateway for outbound connectivity in any combination of IP addresses and IP address prefixes. Azure NAT Gateway supports outbound TCP and UDP protocols only.

Benefits of using NAT Gateway

Although we can also use Azure Load Balancer for outbound connectivity, Azure NAT Gateway provides some benefits:

  • Security: Resources (e.g., a Virtual Machine) deployed to subnets with an associated NAT Gateway don’t need an individual public IP address, yet they can still reach external destinations through the NAT Gateway. Firewall rules at the destination can then be scoped to the IP addresses associated with the NAT Gateway.
  • Resiliency: Azure NAT Gateway is a fully managed service. There is neither a VM nor a physical gateway device. NAT Gateway is a software-defined component, which makes it highly resilient.
  • Scalability: NAT Gateway scales automatically according to our needs, so we don’t have to configure any scaling rules. A NAT Gateway also spans multiple fault domains and can therefore sustain failures without a service outage. When using IP address prefixes, NAT Gateway automatically scales to the number of IP addresses (up to 16) needed for outbound connectivity.
  • Performance: Each NAT Gateway can provide up to 50 Gbps of throughput. Every public IP address provides 64,512 SNAT ports for outbound connections, which allows for up to 50,000 concurrent connections per public IP. Because NAT Gateway is a software-defined service, it won’t affect the network bandwidth of our compute resources deployed to a specific subnet.
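
To put the numbers from the list above into perspective, here is a quick back-of-the-envelope calculation in plain shell arithmetic. Treat it as a sketch: the helper name snat_ports is made up for illustration, and the result is only a theoretical upper bound derived from the 64,512 ports per public IP mentioned above.

```shell
# Hypothetical helper: total SNAT ports for a given number of public IPs,
# based on the 64512 ports per public IP address quoted above.
snat_ports() {
  echo $(( $1 * 64512 ))
}

snat_ports 1    # prints 64512
snat_ports 8    # prints 516096  (eight IP addresses)
snat_ports 16   # prints 1032192 (the maximum of 16 IP addresses)
```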

Provision AKS and NAT Gateway with Terraform

Now that we know what a NAT Gateway is and which benefits we get from using it for outbound connectivity, we can move on and provision a new AKS cluster with a NAT Gateway. For the sake of this article, we’ll associate an IP address prefix with the NAT Gateway, which lets our NAT Gateway scale automatically from one (1) to eight (8) IP addresses for outbound connectivity.
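
If you wonder where the eight comes from: an IPv4 prefix of length n contains 2^(32-n) addresses. The following minimal sketch does the math in the shell; prefix_size is a hypothetical helper name and not part of any CLI.

```shell
# Hypothetical helper: number of IPv4 addresses in a prefix of the given length.
prefix_size() {
  echo $(( 1 << (32 - $1) ))
}

prefix_size 29   # prints 8  (the prefix length used in this article)
prefix_size 28   # prints 16 (matches the NAT Gateway limit of 16 IP addresses)
```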

Prerequisites

All you need to follow the samples shown below is:

  • Access to an Azure Subscription: The designated user (I highly recommend using a Service Principal) requires the Contributor role on the subscription to create/mutate/delete resource groups. Additionally, the user (read: Service Principal) requires permissions on Microsoft.Authorization (read/write/delete) to manage role assignments.
  • Terraform CLI is installed on your machine (I’m using Terraform 1.2.8, the most recent version available while writing this article).
  • Azure CLI 2.0 is installed on your machine. Verify that you’re authenticated and have selected the desired Azure subscription.
  • A text editor
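
A quick way to check the tooling from the list above is a small shell helper like the following. It is just a sketch: missing_tools is a made-up function name, and kubectl is included because we use it later in this article to verify outbound connectivity.

```shell
# Hypothetical helper: print every given command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools terraform az kubectl)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi

# Afterwards, confirm that the Azure CLI points at the desired subscription:
#   az account show --output table
#   az account set --subscription "<SUBSCRIPTION_ID>"
```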

Terraform Project Fundamentals

I’ve written about how to set up Terraform projects in many articles on my blog, so let’s keep it short this time. Use the following snippet to create everything we need for this article:

# create and move to the project folder
mkdir aks-with-nat-gateway
cd aks-with-nat-gateway

# create files that we fill with content within the upcoming paragraphs
touch vnet.tf
touch nat_gateway.tf
touch aks.tf

# create the more general files with their corresponding content

# create meta.tf
cat << EOF > meta.tf
terraform {
 required_version = "~> 1.2.8"
 required_providers {
  azurerm = {
   source = "hashicorp/azurerm"
   version = "~> 3.21.0"
  }
 }
}

provider "azurerm" {
 features {}
}
EOF

# create main.tf
cat << EOF > main.tf
resource "azurerm_resource_group" "main" {
  name     = "my-aks-with-nat"
  location = var.location
}
EOF

# create variables.tf
cat << EOF > variables.tf
variable "location" {
  type        = string
  default     = "germanywestcentral"
  description = "The region where the resources will be created."
}

variable "vnet" {
  type = object({
    cird            = string
    sn_cluster_cird = string
  })
  default = {
    cird            = "10.240.0.0/16"
    sn_cluster_cird = "10.240.0.0/22"
  }
  description = "The VNET and subnet configuration."
}
EOF

Specify the vNet infrastructure in Terraform

We’ll also keep the vNet infrastructure fairly simple for this example. Let’s define a vNet with a single subnet. We will associate both the AKS cluster and the NAT Gateway with that subnet.

# vnet.tf
resource "azurerm_virtual_network" "vnet" {
  name                = "vnet-aks-with-nat-gateway"
  address_space       = [var.vnet.cird]
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}

resource "azurerm_subnet" "cluster" {
  name                 = "cluster"
  virtual_network_name = azurerm_virtual_network.vnet.name
  resource_group_name  = azurerm_resource_group.main.name
  address_prefixes     = [var.vnet.sn_cluster_cird]
}

Create a NAT Gateway with Terraform

Our NAT Gateway should scale from one (1) to eight (8) IP addresses for outbound connectivity. We can achieve this by using an IP address prefix (azurerm_public_ip_prefix) with a prefix_length of 29, because a /29 prefix contains eight addresses. In Terraform, we can associate the IP address prefix with the NAT Gateway using the azurerm_nat_gateway_public_ip_prefix_association resource. Alternatively, you can use azurerm_nat_gateway_public_ip_association when working with individual public IP addresses instead of IP address prefixes.

resource "azurerm_public_ip_prefix" "nat_prefix" {
  name                = "pipp-nat-gateway"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  ip_version          = "IPv4"
  prefix_length       = 29
  sku                 = "Standard"
  zones               = ["1"]
}

resource "azurerm_nat_gateway" "gw_aks" {
  name                    = "natgw-aks"
  resource_group_name     = azurerm_resource_group.main.name
  location                = azurerm_resource_group.main.location
  sku_name                = "Standard"
  idle_timeout_in_minutes = 10
  zones                   = ["1"]
}

resource "azurerm_nat_gateway_public_ip_prefix_association" "nat_ips" {
  nat_gateway_id      = azurerm_nat_gateway.gw_aks.id
  public_ip_prefix_id = azurerm_public_ip_prefix.nat_prefix.id
}

resource "azurerm_subnet_nat_gateway_association" "sn_cluster_nat_gw" {
  subnet_id      = azurerm_subnet.cluster.id
  nat_gateway_id = azurerm_nat_gateway.gw_aks.id
}

output "gateway_ips" {
  value = azurerm_public_ip_prefix.nat_prefix.ip_prefix
}

The output at the end of the snippet makes Terraform print the CIDR reserved for outbound connectivity whenever we provision or change the infrastructure.

Create an AKS cluster with NAT Gateway in Terraform

Although we can provision full-fledged AKS clusters in Terraform, we will keep the AKS cluster as simple as possible. We can use the azurerm_kubernetes_service_versions data source to identify the most recent Kubernetes version available in the desired Azure Region. Take a look at the network_profile block: we set outbound_type to userAssignedNATGateway to specify that the cluster uses our own (“user-assigned”) NAT Gateway. This requires load_balancer_sku to be set to standard.

data "azurerm_kubernetes_service_versions" "aks_version" {
  location        = azurerm_resource_group.main.location
  include_preview = false
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-with-nat-gateway"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  node_resource_group = "${azurerm_resource_group.main.name}-aks"
  sku_tier            = "Free"
  kubernetes_version  = data.azurerm_kubernetes_service_versions.aks_version.latest_version

  dns_prefix = "aks-for-blog"

  default_node_pool {
    name                = "default"
    vm_size             = "Standard_D4s_v4"
    zones               = ["1", "2", "3"]
    enable_auto_scaling = true
    min_count           = 1
    max_count           = 3
    os_disk_type        = "Managed"
    os_disk_size_gb     = 32
    type                = "VirtualMachineScaleSets"
    vnet_subnet_id      = azurerm_subnet.cluster.id
  }

  network_profile {
    network_plugin     = "azure"
    network_policy     = "azure"
    dns_service_ip     = "172.16.0.10"
    docker_bridge_cidr = "172.18.0.1/16"
    service_cidr       = "172.16.0.0/16"
    load_balancer_sku  = "standard"
    outbound_type      = "userAssignedNATGateway"
    nat_gateway_profile {
      idle_timeout_in_minutes = 4
    }
  }

  identity {
    type = "SystemAssigned"
  }

  lifecycle {
    ignore_changes = [
      network_profile[0].nat_gateway_profile
    ]
  }
}

Also, note the lifecycle block. We have to explicitly ignore changes made to network_profile[0].nat_gateway_profile (for now). If we don’t ignore the nat_gateway_profile, Terraform will replace the entire AKS cluster every time we apply the project.

Provisioning the infrastructure with Terraform

Now that we’ve specified all necessary resources in our Terraform project, we can finally move on and deploy the entire infrastructure. Run terraform init once to download the azurerm provider, then run terraform apply. Terraform will present a detailed execution plan that outlines what will happen once you confirm it. Review the plan and confirm to start infrastructure provisioning.
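
If you prefer to review a saved plan before applying it, the workflow can be sketched as follows. The run and deploy helpers, the DRY_RUN switch, and the plan file name tfplan are my own assumptions for illustration. Only terraform init, terraform plan -out, and terraform apply with a plan file are actual Terraform commands.

```shell
# Hypothetical helper: print a command, and execute it unless DRY_RUN=1 is set.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Initialize the project, store the plan in a file, and apply exactly that plan.
deploy() {
  run terraform init &&
  run terraform plan -out=tfplan &&
  run terraform apply tfplan
}

# Dry run: only prints the three commands without touching any infrastructure.
DRY_RUN=1 deploy
```

Calling deploy without DRY_RUN=1 executes the commands for real. Applying a saved plan file ensures that Terraform performs exactly the changes you reviewed.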

Once provisioning has finished, you should see the IP address range (CIDR) associated with the NAT Gateway:

gateway_ips = "20.79.170.48/29"

Verify that outbound connectivity is routed through NAT Gateway

To test outbound connectivity, we can quickly run a small Ubuntu container in the Kubernetes cluster for verification. First, let’s download the necessary credentials to interact with the Kubernetes cluster:

# get credentials for Kubernetes
az aks get-credentials -n aks-with-nat-gateway \
 -g my-aks-with-nat
# Merged "aks-with-nat-gateway" as current context in /Users/<YOUR_USER_ID>/.kube/config

Let’s now spin up the Ubuntu container to verify outbound connectivity:

kubectl run --rm -it ubuntu \
 -n default \
 --image ubuntu:latest -- /bin/bash

# Within a few seconds, you should see a Linux prompt
root@ubuntu:/# apt update && apt install curl --yes

# removed logs
# removed logs

root@ubuntu:/# curl http://ipv4.icanhazip.com
# 20.79.170.49 

# exit from the container using [CTRL]+D

As you can see, the request to http://ipv4.icanhazip.com originated from the IP 20.79.170.49, which falls within the range of 20.79.170.48/29.
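
If you don’t want to do the range check in your head, a few lines of shell arithmetic can do it for you. This is a minimal sketch for IPv4 only. ip_to_int and ip_in_cidr are hypothetical helper names, and the addresses are the ones from the sample output above.

```shell
# Hypothetical helper: convert a dotted IPv4 address into a single integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Hypothetical helper: succeed if the IPv4 address ($1) lies within the CIDR ($2).
ip_in_cidr() (
  net=${2%/*}
  len=${2#*/}
  # Build the network mask, e.g. /29 keeps the upper 29 bits.
  mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

if ip_in_cidr 20.79.170.49 20.79.170.48/29; then
  echo "egress IP belongs to the NAT Gateway prefix"
else
  echo "egress IP is outside the NAT Gateway prefix"
fi
```

In a real run, you could read the CIDR with terraform output -raw gateway_ips instead of hard-coding it.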

Destroy the infrastructure

You can quickly destroy the entire infrastructure using terraform destroy -auto-approve.

What we’ve covered in this article

Again, we covered a bunch of stuff in this article, including:

  • 💡 Understood what NAT Gateway is
  • 🙌🏼 Learned the benefits of using a NAT Gateway for outbound connectivity compared to other options
  • 👷 Provisioned AKS and NAT Gateway using Terraform
  • 🔒 Authenticated with a Kubernetes cluster in Azure
  • 🏃🏻‍♂️ Ran a Linux container in AKS straight from the terminal
  • 🔎 Verified that outbound connectivity is routed through Azure NAT Gateway

Conclusion

With Azure NAT Gateway, we now have a robust and scalable software-defined gateway for outbound connectivity. All workloads deployed to AKS reach services that are not connected via Private Endpoint through the NAT Gateway, using the IP addresses we specified.

Find the code shown in this article on GitHub at https://github.com/ThorstenHans/tf-aks-with-nat-gateway.