Civica Training - Terraform on Azure

AKS with Terraform

From VNet to running Kubernetes cluster

Module 3 of 5 | Intermediate

The Journey So Far

"We've built the foundation - resource groups, networks, security groups, Key Vault. Now it's time for the main event: provisioning an Azure Kubernetes Service cluster with Terraform."

AKS Architecture at a Glance

Control Plane (Azure-managed)

  • API Server
  • etcd (cluster state)
  • Scheduler
  • Controller Manager
  • Free tier: no charge for the control plane (the Standard tier adds a paid, SLA-backed control plane)

Data Plane (You manage)

  • Node pools (VM Scale Sets)
  • Kubelet + kube-proxy on each node
  • Container runtime
  • Your workloads (pods)
  • You pay for node VMs

What We'll Provision

Log Analytics Workspace

Required for AKS monitoring (Container Insights). Create it before the cluster.

resource "azurerm_log_analytics_workspace" "main" {
  name                = "log-aks-platform"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  sku                 = "PerGB2018"
  retention_in_days   = 30
  tags                = azurerm_resource_group.aks.tags
}

azurerm_kubernetes_cluster

Part 1: Core configuration

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-platform-uksouth"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "aks-platform"
  kubernetes_version  = "1.28"
  sku_tier            = "Standard" # "Free" or "Standard" (SLA-backed)
  tags                = azurerm_resource_group.aks.tags
  # (block continues in Parts 2 and 3)

Default Node Pool

Part 2: The required default_node_pool block

  default_node_pool {
    name                         = "system"
    vm_size                      = "Standard_D4s_v5"
    min_count                    = 2
    max_count                    = 5
    enable_auto_scaling          = true
    os_disk_size_gb              = 128
    os_disk_type                 = "Managed"
    vnet_subnet_id               = azurerm_subnet.aks_nodes.id
    zones                        = ["1", "2", "3"]
    only_critical_addons_enabled = true # system pods only
    node_labels = {
      "role" = "system"
    }
  }

Node Pool VM Sizes

VM Size           vCPU  Memory  Use Case
Standard_D2s_v5   2     8 GB    Dev/test, small workloads
Standard_D4s_v5   4     16 GB   System pools, general purpose
Standard_D8s_v5   8     32 GB   Application workloads
Standard_E4s_v5   4     32 GB   Memory-intensive (caching, databases)
Standard_F4s_v2   4     8 GB    CPU-intensive (batch processing)
The default node pool VM size cannot be changed after cluster creation. Choose carefully, or plan to use additional node pools for workloads.
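If you do need a different size later, recent azurerm 3.x provider releases can rotate the default pool in place. A hedged sketch (the `temporary_name_for_rotation` attribute depends on your provider version; verify before relying on it):

```hcl
# Hedged sketch: resizing the default node pool without recreating the
# cluster. The provider stands up a temporary pool under the given name,
# replaces the default pool with the new VM size, then removes the
# temporary pool.
default_node_pool {
  name                        = "system"
  vm_size                     = "Standard_D8s_v5" # the new size
  temporary_name_for_rotation = "systemtmp"       # must follow node pool naming rules
  min_count                   = 2
  max_count                   = 5
  enable_auto_scaling         = true
}
```

Without this, changing `vm_size` on the default pool forces Terraform to destroy and recreate the entire cluster.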

Knowledge Check 1

1. What does sku_tier = "Standard" provide for AKS?

2. What does only_critical_addons_enabled = true do on a node pool?

3. Can you change the default node pool VM size after cluster creation?

AKS Identity Configuration

Part 3: How the cluster authenticates to Azure

System-Assigned (Simple)

identity {
  type = "SystemAssigned"
}
  • Azure creates and manages the identity
  • Lifecycle tied to the cluster
  • Simplest approach

User-Assigned (Recommended)

resource "azurerm_user_assigned_identity" "aks" {
  name                = "id-aks-platform"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
}

# In the cluster:
identity {
  type         = "UserAssigned"
  identity_ids = [azurerm_user_assigned_identity.aks.id]
}

Why User-Assigned Identity?

A user-assigned identity exists independently of the cluster, so you can grant it role assignments before the cluster is created (avoiding a circular dependency) and keep them if the cluster is ever rebuilt.

# Grant AKS identity Network Contributor on the VNet subnet
resource "azurerm_role_assignment" "aks_network" {
  scope                = azurerm_subnet.aks_nodes.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_user_assigned_identity.aks.principal_id
}

AKS Networking: Azure CNI vs kubenet

Feature            Azure CNI                        kubenet
Pod IP assignment  VNet IPs (routable)              Private IPs (NAT to node)
IP consumption     High (every pod = 1 VNet IP)     Low (only nodes use VNet IPs)
Network policies   Azure + Calico                   Calico only
Performance        Better (no extra hop)            Slightly higher latency
Subnet sizing      Needs large subnets              Smaller subnets OK
Windows nodes      Supported                        Not supported
Recommended for    Production, advanced networking  Dev/test, small clusters
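The "needs large subnets" row is worth quantifying. A sizing sketch for traditional Azure CNI, using the commonly cited defaults (each node pre-reserves `max_pods + 1` VNet IPs, with `max_pods` defaulting to 30; verify against current Azure documentation, and note the subnet prefix below is illustrative):

```hcl
# Hedged sizing sketch for traditional Azure CNI.
# Each node reserves (max_pods + 1) = 31 VNet IPs up front.
# A /24 has 251 usable IPs (Azure reserves 5), so it fits only ~8 nodes:
#   251 / 31 ~= 8
# A cluster scaling to 50 nodes needs roughly 50 * 31 = 1550 IPs,
# so at least a /21 (2043 usable):
resource "azurerm_subnet" "aks_nodes" {
  name                 = "snet-aks-nodes"
  resource_group_name  = azurerm_resource_group.aks.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.0.8.0/21"]
}
```

This up-front reservation is exactly the IP exhaustion problem that kubenet and Azure CNI Overlay avoid.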

Azure CNI Configuration

# Inside azurerm_kubernetes_cluster
network_profile {
  network_plugin    = "azure"
  network_policy    = "calico"
  load_balancer_sku = "standard"
  service_cidr      = "172.16.0.0/16"
  dns_service_ip    = "172.16.0.10"
}

kubenet Configuration

network_profile {
  network_plugin    = "kubenet"
  network_policy    = "calico"
  load_balancer_sku = "standard"
  pod_cidr          = "10.244.0.0/16"
  service_cidr      = "172.16.0.0/16"
  dns_service_ip    = "172.16.0.10"
}

Azure CNI Overlay (Best of Both)

Introduced to solve the IP exhaustion problem of traditional Azure CNI.

network_profile {
  network_plugin      = "azure"
  network_plugin_mode = "overlay"
  network_policy      = "calico"
  pod_cidr            = "192.168.0.0/16"
  service_cidr        = "172.16.0.0/16"
  dns_service_ip      = "172.16.0.10"
}

Knowledge Check 2

1. What is the key difference between Azure CNI and kubenet?

2. What constraint applies to the service_cidr in the network profile?

3. What advantage does Azure CNI Overlay have over traditional Azure CNI?

Azure Monitor / Container Insights

# Inside azurerm_kubernetes_cluster
oms_agent {
  log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
}

// Sample KQL query for container logs
ContainerLog
| where LogEntry contains "error"
| summarize count() by ContainerName
| order by count_ desc

Additional Node Pools

Separate system and application workloads

resource "azurerm_kubernetes_cluster_node_pool" "user" {
  name                  = "user"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D8s_v5"
  min_count             = 2
  max_count             = 10
  enable_auto_scaling   = true
  os_disk_size_gb       = 256
  vnet_subnet_id        = azurerm_subnet.aks_nodes.id
  zones                 = ["1", "2", "3"]
  mode                  = "User"
  node_labels = {
    "role"     = "application"
    "workload" = "general"
  }
  tags = azurerm_resource_group.aks.tags
}

System vs User Node Pools

Aspect     System Pool                            User Pool
Purpose    CoreDNS, konnectivity, metrics-server  Application workloads
Mode       "System"                               "User"
Min nodes  At least 1 (recommended 2+)            Can scale to 0
Taint      CriticalAddonsOnly (optional)          Custom taints
VM size    Smaller (D4s_v5)                       Sized for workloads
Scaling    Conservative                           Aggressive autoscaling
Every AKS cluster must have at least one System node pool. User pools can scale to zero.

Specialized Node Pools

# Spot instances for batch/non-critical workloads (up to 90% savings)
resource "azurerm_kubernetes_cluster_node_pool" "spot" {
  name                  = "spot"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_D8s_v5"
  priority              = "Spot"
  eviction_policy       = "Delete"
  spot_max_price        = -1 # pay up to on-demand price
  min_count             = 0
  max_count             = 10
  enable_auto_scaling   = true
  mode                  = "User"
  node_labels = {
    "kubernetes.azure.com/scalesetpriority" = "spot"
  }
  node_taints = [
    "kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
  ]
}
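Because the spot pool carries a NoSchedule taint, only workloads that explicitly tolerate it will land there. A hedged sketch using the hashicorp/kubernetes provider (an assumption on our part; the same spec applies in raw YAML), with an illustrative deployment name and image:

```hcl
# Hypothetical example: a Deployment that opts in to the tainted spot
# pool via a toleration, and pins itself there with a node selector.
resource "kubernetes_deployment" "batch" {
  metadata {
    name = "batch-worker"
  }
  spec {
    replicas = 3
    selector {
      match_labels = { app = "batch-worker" }
    }
    template {
      metadata {
        labels = { app = "batch-worker" }
      }
      spec {
        # Only schedule on nodes labelled as spot
        node_selector = {
          "kubernetes.azure.com/scalesetpriority" = "spot"
        }
        # Tolerate the taint the node pool applies
        toleration {
          key      = "kubernetes.azure.com/scalesetpriority"
          operator = "Equal"
          value    = "spot"
          effect   = "NoSchedule"
        }
        container {
          name  = "worker"
          image = "myregistry.azurecr.io/batch-worker:latest"
        }
      }
    }
  }
}
```

Workloads without the toleration stay on the system and user pools, which is exactly the isolation the taint is for.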

Azure Container Registry

resource "azurerm_container_registry" "main" {
  name                = "craksplatform001"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
  sku                 = "Standard"
  admin_enabled       = false # Use managed identity instead
  tags                = azurerm_resource_group.aks.tags
}
SKU       Storage  Features
Basic     10 GB    Development only
Standard  100 GB   Most production workloads
Premium   500 GB   Geo-replication, private link, content trust
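Geo-replication from the Premium row is declared inline on the registry resource. A hedged sketch (registry name and replica region are illustrative; check the current azurerm schema for the `georeplications` block before use):

```hcl
# Hypothetical Premium registry with a replica in a second region, so
# clusters there pull images from a local endpoint instead of crossing
# regions.
resource "azurerm_container_registry" "geo" {
  name                = "craksplatformgeo"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
  sku                 = "Premium" # geo-replication requires Premium

  georeplications {
    location = "ukwest"
  }
}
```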

ACR + AKS Integration

Grant AKS the AcrPull role so it can pull images without imagePullSecrets.

resource "azurerm_role_assignment" "aks_acr" {
  scope                = azurerm_container_registry.main.id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_kubernetes_cluster.main.kubelet_identity[0].object_id
}

Knowledge Check 3

1. How does AKS pull images from ACR without imagePullSecrets?

2. What is the minimum number of nodes a System node pool can have?

3. What does setting spot_max_price = -1 mean for spot node pools?

Complete AKS Cluster Configuration

Putting all the pieces together

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-platform-uksouth"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  dns_prefix          = "aks-platform"
  kubernetes_version  = var.kubernetes_version
  sku_tier            = "Standard"

  default_node_pool {
    name                         = "system"
    vm_size                      = "Standard_D4s_v5"
    min_count                    = 2
    max_count                    = 5
    enable_auto_scaling          = true
    vnet_subnet_id               = azurerm_subnet.aks_nodes.id
    zones                        = ["1", "2", "3"]
    only_critical_addons_enabled = true
  }

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.aks.id]
  }

  network_profile {
    network_plugin      = "azure"
    network_plugin_mode = "overlay"
    network_policy      = "calico"
    load_balancer_sku   = "standard"
    service_cidr        = "172.16.0.0/16"
    dns_service_ip      = "172.16.0.10"
  }

  oms_agent {
    log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
  }

  tags = azurerm_resource_group.aks.tags
}

AKS Auto-Upgrade & Maintenance

# Inside azurerm_kubernetes_cluster
automatic_channel_upgrade = "patch"

maintenance_window {
  allowed {
    day   = "Sunday"
    hours = [0, 1, 2, 3]
  }
}
Channel     Behavior
none        No auto-upgrade (manual only)
patch       Auto-apply patch versions (1.28.x)
stable      Upgrade to latest stable minor-1
rapid       Upgrade to latest supported version
node-image  Auto-update node OS images only

Azure AD Integration + RBAC

# Inside azurerm_kubernetes_cluster
azure_active_directory_role_based_access_control {
  managed                = true
  azure_rbac_enabled     = true
  admin_group_object_ids = [var.aks_admin_group_id]
}

role_based_access_control_enabled = true

Kubeconfig Output & Cluster Access

output "kube_config_raw" {
  value     = azurerm_kubernetes_cluster.main.kube_config_raw
  sensitive = true
}

output "cluster_fqdn" {
  value = azurerm_kubernetes_cluster.main.fqdn
}

output "cluster_id" {
  value = azurerm_kubernetes_cluster.main.id
}
# Access the cluster after apply
$ az aks get-credentials \
    --resource-group rg-aks-platform-uksouth \
    --name aks-platform-uksouth

$ kubectl get nodes
NAME                             STATUS   ROLES   AGE   VERSION
aks-system-12345678-vmss000000   Ready    agent   10m   v1.28.3
aks-system-12345678-vmss000001   Ready    agent   10m   v1.28.3

Useful AKS Outputs

output "kubelet_identity_object_id" {
  description = "Object ID of kubelet managed identity"
  value       = azurerm_kubernetes_cluster.main.kubelet_identity[0].object_id
}

output "node_resource_group" {
  description = "Auto-generated RG containing AKS node resources"
  value       = azurerm_kubernetes_cluster.main.node_resource_group
}

output "oidc_issuer_url" {
  description = "OIDC issuer URL for workload identity"
  value       = azurerm_kubernetes_cluster.main.oidc_issuer_url
}

AKS Workload Identity

Let pods authenticate to Azure services without secrets.

# Enable on the cluster
oidc_issuer_enabled       = true
workload_identity_enabled = true
# Create a federated credential
resource "azurerm_federated_identity_credential" "app" {
  name                = "fed-cred-myapp"
  resource_group_name = azurerm_resource_group.aks.name
  parent_id           = azurerm_user_assigned_identity.app.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.main.oidc_issuer_url
  subject             = "system:serviceaccount:myapp:myapp-sa"
}
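The federation also needs a matching Kubernetes service account. A hedged sketch using the hashicorp/kubernetes provider (an assumption; the same manifest works in YAML). The name and namespace must line up with the `subject` on the federated credential, and note that under current Workload Identity versions the pods themselves also need the `azure.workload.identity/use: "true"` label:

```hcl
# Hypothetical counterpart to the federated credential: the service
# account whose tokens get exchanged for Azure AD tokens.
resource "kubernetes_service_account" "myapp" {
  metadata {
    name      = "myapp-sa" # must match the "subject" above
    namespace = "myapp"
    annotations = {
      # Ties the SA to the user-assigned identity's client ID
      "azure.workload.identity/client-id" = azurerm_user_assigned_identity.app.client_id
    }
  }
}
```

Pods running under this service account can then use Azure SDKs' workload identity credential flow with no stored secrets.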

Knowledge Check 4

1. What is the node_resource_group (MC_*) in AKS?

2. What does Workload Identity replace in AKS?

3. Which automatic_channel_upgrade only applies patch versions (e.g., 1.28.3 to 1.28.5)?

AKS Lifecycle Best Practices

resource "azurerm_kubernetes_cluster" "main" {
  # ... configuration ...

  lifecycle {
    ignore_changes = [
      default_node_pool[0].node_count, # autoscaler manages this
    ]
    prevent_destroy = true # safety for production
  }
}

resource "azurerm_kubernetes_cluster_node_pool" "user" {
  # ... configuration ...

  lifecycle {
    ignore_changes = [node_count] # autoscaler manages this
  }
}

Complete AKS Dependency Graph

azurerm_resource_group.aks
├── azurerm_virtual_network.main
│   └── azurerm_subnet.aks_nodes
│       └── azurerm_role_assignment.aks_network
├── azurerm_user_assigned_identity.aks
│   ├── azurerm_role_assignment.aks_network
│   └── azurerm_kubernetes_cluster.main
│       ├── azurerm_kubernetes_cluster_node_pool.user
│       ├── azurerm_kubernetes_cluster_node_pool.spot
│       ├── azurerm_role_assignment.aks_acr
│       └── azurerm_federated_identity_credential.app
├── azurerm_log_analytics_workspace.main
│   └── azurerm_kubernetes_cluster.main
└── azurerm_container_registry.main
    └── azurerm_role_assignment.aks_acr

Module 3 Summary

What We Built

  • Full AKS cluster with Azure CNI Overlay
  • System + User + Spot node pools
  • User-Assigned managed identity
  • ACR integration with AcrPull role
  • Container Insights monitoring
  • Workload Identity for pod auth

Next: Module 4

  • Variables, locals, and outputs in depth
  • Creating reusable modules
  • Remote state with Azure Blob
  • State locking and workspaces
  • Multi-environment management
Module 3 Complete