Civica Training - Terraform on Azure
AKS with Terraform
From VNet to running Kubernetes cluster
Module 3 of 5
Intermediate
The Journey So Far
"We've built the foundation - resource groups, networks, security groups, Key Vault. Now it's time for the main event: provisioning an Azure Kubernetes Service cluster with Terraform."
AKS Architecture at a Glance
Control Plane (Azure-managed)
API Server
etcd (cluster state)
Scheduler
Controller Manager
Free - no charge for control plane
Data Plane (You manage)
Node pools (VM Scale Sets)
Kubelet + kube-proxy on each node
Container runtime
Your workloads (pods)
You pay for node VMs
What We'll Provision
Resource Group - container for all resources
VNet + Subnets - network foundation (from Module 2)
Log Analytics Workspace - monitoring and diagnostics
AKS Cluster - the Kubernetes cluster itself
Default Node Pool - system workloads
User Node Pools - application workloads
Azure Container Registry - store container images
ACR-AKS Integration - pull images without secrets
Log Analytics Workspace
Required for AKS monitoring (Container Insights). Create it before the cluster.
resource "azurerm_log_analytics_workspace" "main" {
name = "log-aks-platform"
location = azurerm_resource_group.aks.location
resource_group_name = azurerm_resource_group.aks.name
sku = "PerGB2018"
retention_in_days = 30
tags = azurerm_resource_group.aks.tags
}
PerGB2018 is the standard pricing tier (pay per GB ingested)
30-day retention is free; longer retention costs extra
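If different environments need different retention periods, the value can be lifted into a variable with a guard. A minimal sketch; the variable name is illustrative, not part of the module:

```hcl
# Hypothetical variable to parameterize retention per environment
variable "log_retention_days" {
  description = "Log Analytics retention in days (30 is included; longer costs extra)"
  type        = number
  default     = 30

  validation {
    condition     = var.log_retention_days >= 30 && var.log_retention_days <= 730
    error_message = "Log Analytics retention must be between 30 and 730 days."
  }
}
```

Then set retention_in_days = var.log_retention_days so an out-of-range value fails at plan time rather than at apply.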
azurerm_kubernetes_cluster
Part 1: Core configuration
resource "azurerm_kubernetes_cluster" "main" {
name = "aks-platform-uksouth"
location = azurerm_resource_group.aks.location
resource_group_name = azurerm_resource_group.aks.name
dns_prefix = "aks-platform"
kubernetes_version = "1.28"
sku_tier = "Standard" # "Free" or "Standard" (SLA-backed)
tags = azurerm_resource_group.aks.tags
dns_prefix creates the FQDN: aks-platform-xxxxx.hcp.uksouth.azmk8s.io
sku_tier = "Standard" gives a financially backed SLA (99.95% for multi-AZ)
kubernetes_version - pin this to control upgrades
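The pin is easier to manage through a variable with a format check. A sketch, consistent with the var.kubernetes_version used later in the complete configuration:

```hcl
variable "kubernetes_version" {
  description = "Pinned AKS version; bump deliberately to trigger a controlled upgrade"
  type        = string
  default     = "1.28"

  validation {
    condition     = can(regex("^\\d+\\.\\d+", var.kubernetes_version))
    error_message = "kubernetes_version must look like \"1.28\" or \"1.28.3\"."
  }
}
```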
Default Node Pool
Part 2: The required default_node_pool block
default_node_pool {
name = "system"
vm_size = "Standard_D4s_v5"
min_count = 2
max_count = 5
enable_auto_scaling = true
os_disk_size_gb = 128
os_disk_type = "Managed"
vnet_subnet_id = azurerm_subnet.aks_nodes.id
zones = ["1", "2", "3"]
only_critical_addons_enabled = true # system pods only
node_labels = {
"role" = "system"
}
}
Node Pool VM Sizes
VM Size vCPU Memory Use Case
Standard_D2s_v5 2 8 GB Dev/test, small workloads
Standard_D4s_v5 4 16 GB System pools, general purpose
Standard_D8s_v5 8 32 GB Application workloads
Standard_E4s_v5 4 32 GB Memory-intensive (caching, databases)
Standard_F4s_v2 4 8 GB CPU-intensive (batch processing)
The default node pool VM size cannot be changed after cluster creation. Choose carefully, or plan to use additional node pools for workloads.
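Newer azurerm 3.x provider releases soften this constraint: setting temporary_name_for_rotation on the default pool lets the provider create a temporary pool, replace the original in place, then remove the temporary one. A sketch, assuming a provider version that supports the argument:

```hcl
default_node_pool {
  name                        = "system"
  vm_size                     = "Standard_D4s_v5"
  # Hypothetical temporary pool name, used only during rotation
  temporary_name_for_rotation = "systemtmp"
  # ... remaining pool settings unchanged ...
}
```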
AKS Identity Configuration
Part 3: How the cluster authenticates to Azure
System-Assigned (Simple)
identity {
type = "SystemAssigned"
}
Azure creates and manages the identity
Lifecycle tied to the cluster
Simplest approach
User-Assigned (Recommended)
resource "azurerm_user_assigned_identity" "aks" {
name = "id-aks-platform"
resource_group_name = azurerm_resource_group.aks.name
location = azurerm_resource_group.aks.location
}
# In the cluster:
identity {
type = "UserAssigned"
identity_ids = [
azurerm_user_assigned_identity.aks.id
]
}
Why User-Assigned Identity?
Pre-assign permissions before creating the cluster (e.g., Network Contributor on the subnet)
Survives cluster recreation - role assignments stay intact
Shared across resources - same identity for multiple clusters
Visible in IAM - easier to audit and manage
# Grant AKS identity Network Contributor on the VNet subnet
resource "azurerm_role_assignment" "aks_network" {
scope = azurerm_subnet.aks_nodes.id
role_definition_name = "Network Contributor"
principal_id = azurerm_user_assigned_identity.aks.principal_id
}
AKS Networking: Azure CNI vs kubenet
Feature Azure CNI kubenet
Pod IP assignment VNet IPs (routable) Private IPs (NAT to node)
IP consumption High (every pod = 1 VNet IP) Low (only nodes use VNet IPs)
Network policies Azure + Calico Calico only
Performance Better (no extra hop) Slightly higher latency
Subnet sizing Needs large subnets Smaller subnets OK
Windows nodes Supported Not supported
Recommended for Production, advanced networking Dev/test, small clusters
Azure CNI Configuration
# Inside azurerm_kubernetes_cluster
network_profile {
network_plugin = "azure"
network_policy = "calico"
load_balancer_sku = "standard"
service_cidr = "172.16.0.0/16"
dns_service_ip = "172.16.0.10"
}
network_plugin = "azure" enables Azure CNI
service_cidr - CIDR for Kubernetes Services (must NOT overlap with VNet)
dns_service_ip - must be within service_cidr, typically the .10 address
network_policy = "calico" enables Kubernetes NetworkPolicy enforcement
load_balancer_sku = "standard" required for production (supports AZs)
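Rather than hard-coding the .10 address, dns_service_ip can be derived from service_cidr with Terraform's cidrhost function, so the two values can never drift apart. A minimal sketch:

```hcl
locals {
  service_cidr   = "172.16.0.0/16"
  dns_service_ip = cidrhost(local.service_cidr, 10) # "172.16.0.10"
}

# Then, inside network_profile:
#   service_cidr   = local.service_cidr
#   dns_service_ip = local.dns_service_ip
```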
kubenet Configuration
network_profile {
network_plugin = "kubenet"
network_policy = "calico"
load_balancer_sku = "standard"
pod_cidr = "10.244.0.0/16"
service_cidr = "172.16.0.0/16"
dns_service_ip = "172.16.0.10"
}
pod_cidr - separate CIDR for pod networking (not part of VNet)
Pods communicate via NAT through the node's VNet IP
Consumes fewer VNet IPs, but adds network complexity
Route tables are automatically managed by AKS
Azure CNI Overlay (Best of Both)
Introduced to solve the IP exhaustion problem of traditional Azure CNI.
network_profile {
network_plugin = "azure"
network_plugin_mode = "overlay"
network_policy = "calico"
pod_cidr = "192.168.0.0/16"
service_cidr = "172.16.0.0/16"
dns_service_ip = "172.16.0.10"
}
Pods get IPs from a private CIDR (not VNet IPs)
Only nodes consume VNet IPs - like kubenet
Full Azure CNI feature set including Windows containers
Recommended for new clusters that don't need pods routable on the VNet
Azure Monitor / Container Insights
# Inside azurerm_kubernetes_cluster
oms_agent {
log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
}
Deploys the Azure Monitor agent as a DaemonSet on each node
Collects container logs, metrics, and Kubernetes events
Powers the Container Insights dashboards in the Azure Portal
Enables log queries via KQL in Log Analytics
// Sample KQL query for container logs
ContainerLog
| where LogEntry contains "error"
| summarize count() by ContainerName
| order by count_ desc
Additional Node Pools
Separate system and application workloads
resource "azurerm_kubernetes_cluster_node_pool" "user" {
name = "user"
kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
vm_size = "Standard_D8s_v5"
min_count = 2
max_count = 10
enable_auto_scaling = true
os_disk_size_gb = 256
vnet_subnet_id = azurerm_subnet.aks_nodes.id
zones = ["1", "2", "3"]
mode = "User"
node_labels = {
"role" = "application"
"workload" = "general"
}
tags = azurerm_resource_group.aks.tags
}
System vs User Node Pools
Aspect System Pool User Pool
Purpose CoreDNS, konnectivity, metrics-server Application workloads
Mode "System" "User"
Min nodes At least 1 (recommended 2+) Can scale to 0
Taint CriticalAddonsOnly (optional) Custom taints
VM size Smaller (D4s_v5) Sized for workloads
Scaling Conservative Aggressive autoscaling
Every AKS cluster must have at least one System node pool. User pools can scale to zero.
Specialized Node Pools
# Spot instances for batch/non-critical workloads (up to 90% savings)
resource "azurerm_kubernetes_cluster_node_pool" "spot" {
name = "spot"
kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
vm_size = "Standard_D8s_v5"
priority = "Spot"
eviction_policy = "Delete"
spot_max_price = -1 # pay up to on-demand price
min_count = 0
max_count = 10
enable_auto_scaling = true
mode = "User"
node_labels = {
"kubernetes.azure.com/scalesetpriority" = "spot"
}
node_taints = [
"kubernetes.azure.com/scalesetpriority=spot:NoSchedule"
]
}
Azure Container Registry
resource "azurerm_container_registry" "main" {
name = "craksplatform001"
resource_group_name = azurerm_resource_group.aks.name
location = azurerm_resource_group.aks.location
sku = "Standard"
admin_enabled = false # Use managed identity instead
tags = azurerm_resource_group.aks.tags
}
SKU Storage Features
Basic 10 GB Development only
Standard 100 GB Most production workloads
Premium 500 GB Geo-replication, private link, content trust
ACR + AKS Integration
Grant AKS the AcrPull role so it can pull images without imagePullSecrets.
resource "azurerm_role_assignment" "aks_acr" {
scope = azurerm_container_registry.main.id
role_definition_name = "AcrPull"
principal_id = azurerm_kubernetes_cluster.main.kubelet_identity[0].object_id
}
kubelet_identity is the identity the kubelet uses to pull images
AcrPull grants read-only access to the registry
No need for imagePullSecrets in your Kubernetes manifests
Alternative: use az aks update --attach-acr (imperative approach)
Complete AKS Cluster Configuration
Putting all the pieces together
resource "azurerm_kubernetes_cluster" "main" {
name = "aks-platform-uksouth"
location = azurerm_resource_group.aks.location
resource_group_name = azurerm_resource_group.aks.name
dns_prefix = "aks-platform"
kubernetes_version = var.kubernetes_version
sku_tier = "Standard"
default_node_pool {
name = "system"
vm_size = "Standard_D4s_v5"
min_count = 2
max_count = 5
enable_auto_scaling = true
vnet_subnet_id = azurerm_subnet.aks_nodes.id
zones = ["1", "2", "3"]
only_critical_addons_enabled = true
}
identity {
type = "UserAssigned"
identity_ids = [azurerm_user_assigned_identity.aks.id]
}
network_profile {
network_plugin = "azure"
network_plugin_mode = "overlay"
network_policy = "calico"
load_balancer_sku = "standard"
service_cidr = "172.16.0.0/16"
dns_service_ip = "172.16.0.10"
}
oms_agent {
log_analytics_workspace_id = azurerm_log_analytics_workspace.main.id
}
tags = azurerm_resource_group.aks.tags
}
AKS Auto-Upgrade & Maintenance
# Inside azurerm_kubernetes_cluster
automatic_channel_upgrade = "patch"
maintenance_window {
allowed {
day = "Sunday"
hours = [0, 1, 2, 3]
}
}
Channel Behavior
none - no auto-upgrade (manual only)
patch - auto-apply patch versions (1.28.x)
stable - upgrade to the latest supported patch of minor version N-1
rapid - upgrade to the latest supported version
node-image - auto-update node OS images only
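Newer azurerm releases also expose a separate channel for node OS image updates, so Kubernetes patching and OS patching can run on different cadences. A sketch, assuming a provider version that supports node_os_channel_upgrade:

```hcl
# Inside azurerm_kubernetes_cluster
automatic_channel_upgrade = "patch"      # Kubernetes patch versions
node_os_channel_upgrade   = "NodeImage"  # node OS image updates, managed separately
```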
Azure AD Integration + RBAC
# Inside azurerm_kubernetes_cluster
azure_active_directory_role_based_access_control {
managed = true
azure_rbac_enabled = true
admin_group_object_ids = [var.aks_admin_group_id]
}
role_based_access_control_enabled = true
managed = true - Azure manages the AAD integration (no app registrations needed)
azure_rbac_enabled = true - use Azure roles for Kubernetes authorization
admin_group_object_ids - AAD groups with cluster-admin access
Users authenticate via az aks get-credentials using their AAD identity
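With azure_rbac_enabled, non-admin access is granted through built-in Azure roles rather than Kubernetes RoleBindings. A sketch granting read-only cluster access to a developers group; var.aks_dev_group_id is a hypothetical AAD group object ID, not from the module:

```hcl
resource "azurerm_role_assignment" "aks_dev_readers" {
  scope                = azurerm_kubernetes_cluster.main.id
  role_definition_name = "Azure Kubernetes Service RBAC Reader"
  principal_id         = var.aks_dev_group_id # hypothetical AAD group object ID
}
```

Scoping the assignment to the cluster ID grants access cluster-wide; a namespace-scoped ID can be used to restrict it further.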
Kubeconfig Output & Cluster Access
output "kube_config_raw" {
value = azurerm_kubernetes_cluster.main.kube_config_raw
sensitive = true
}
output "cluster_fqdn" {
value = azurerm_kubernetes_cluster.main.fqdn
}
output "cluster_id" {
value = azurerm_kubernetes_cluster.main.id
}
# Access the cluster after apply
$ az aks get-credentials \
--resource-group rg-aks-platform-uksouth \
--name aks-platform-uksouth
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-system-12345678-vmss000000 Ready agent 10m v1.28.3
aks-system-12345678-vmss000001 Ready agent 10m v1.28.3
Useful AKS Outputs
output "kubelet_identity_object_id" {
description = "Object ID of kubelet managed identity"
value = azurerm_kubernetes_cluster.main.kubelet_identity[0].object_id
}
output "node_resource_group" {
description = "Auto-generated RG containing AKS node resources"
value = azurerm_kubernetes_cluster.main.node_resource_group
}
output "oidc_issuer_url" {
description = "OIDC issuer URL for workload identity"
value = azurerm_kubernetes_cluster.main.oidc_issuer_url
}
kubelet_identity - needed for ACR integration and other role assignments
node_resource_group - Azure auto-creates this (MC_*) for node VMs, load balancers, etc.
oidc_issuer_url - required for Workload Identity federation
AKS Workload Identity
Let pods authenticate to Azure services without secrets.
# Enable on the cluster
oidc_issuer_enabled = true
workload_identity_enabled = true
# Create a federated credential
resource "azurerm_federated_identity_credential" "app" {
name = "fed-cred-myapp"
resource_group_name = azurerm_resource_group.aks.name
parent_id = azurerm_user_assigned_identity.app.id
audience = ["api://AzureADTokenExchange"]
issuer = azurerm_kubernetes_cluster.main.oidc_issuer_url
subject = "system:serviceaccount:myapp:myapp-sa"
}
AKS Lifecycle Best Practices
resource "azurerm_kubernetes_cluster" "main" {
# ... configuration ...
lifecycle {
ignore_changes = [
default_node_pool[0].node_count, # autoscaler manages this
]
prevent_destroy = true # safety for production
}
}
resource "azurerm_kubernetes_cluster_node_pool" "user" {
# ... configuration ...
lifecycle {
ignore_changes = [node_count] # autoscaler manages this
}
}
Always ignore_changes on node_count when autoscaling is enabled
Use prevent_destroy on production clusters
Complete AKS Dependency Graph
azurerm_resource_group.aks
├── azurerm_virtual_network.main
│ └── azurerm_subnet.aks_nodes
│ └── azurerm_role_assignment.aks_network
├── azurerm_user_assigned_identity.aks
│ ├── azurerm_role_assignment.aks_network
│ └── azurerm_kubernetes_cluster.main
│ ├── azurerm_kubernetes_cluster_node_pool.user
│ ├── azurerm_kubernetes_cluster_node_pool.spot
│ ├── azurerm_role_assignment.aks_acr
│ └── azurerm_federated_identity_credential.app
├── azurerm_log_analytics_workspace.main
│ └── azurerm_kubernetes_cluster.main
└── azurerm_container_registry.main
└── azurerm_role_assignment.aks_acr
Module 3 Summary
What We Built
Full AKS cluster with Azure CNI Overlay
System + User + Spot node pools
User-Assigned managed identity
ACR integration with AcrPull role
Container Insights monitoring
Workload Identity for pod auth
Next: Module 4
Variables, locals, and outputs in depth
Creating reusable modules
Remote state with Azure Blob
State locking and workspaces
Multi-environment management
Module 3 Complete