Cluster Architecture, Installation & Configuration
Understanding every component, from control plane to network fabric
Presentation 2 of 8 | CKA Domain: 25% of Exam
Your team lead walks over to your desk: "We need a production Kubernetes cluster for the new payments platform. Three environments — dev, staging, production. It needs RBAC, proper networking, and we need to be able to upgrade it without downtime. The deadline is next month."
You nod confidently. But inside, questions are racing: How does the control plane actually work? What happens if etcd goes down? How do pods on different nodes talk to each other? How do you control who can do what?
This presentation answers every one of those questions. By the end, you'll understand the architecture deeply enough to build, secure, and maintain a production cluster.
We'll explore the cluster layer by layer, from the control plane brain to the network fabric that connects everything.
To understand cluster architecture, imagine a well-run city. Each component has a role, and the city only works when they all collaborate.
API Server is reception — every request goes through it. etcd is the filing room — every record is stored here. Scheduler is the traffic controller — directs new workloads to the right district. Controllers are building inspectors — constantly checking that reality matches the blueprints.
kubelet is the site manager — takes orders from City Hall and ensures buildings (pods) are constructed correctly. kube-proxy is the postal service — ensures messages (network traffic) reach the right address. Container Runtime is the factory floor where actual construction happens.
Every single interaction with the cluster — from kubectl to the kubelet — flows through the API Server. Let's trace a request.
When you run kubectl apply -f deployment.yaml, here's what happens behind the scenes: the API Server authenticates the request (who are you?), authorizes it via RBAC (are you allowed to do this?), runs it through admission controllers (should it be mutated or rejected?), validates the object, and finally persists it to etcd.
# Check API Server health
kubectl get --raw /healthz
kubectl get --raw /readyz
If the API Server is the front door, etcd is the vault behind it. Lose etcd, lose everything.
Imagine your company's filing cabinet — every employee record, every contract, every policy document. Now imagine that filing cabinet catches fire and there's no backup. That's what losing etcd without a backup feels like. Every pod definition, every secret, every config map — gone.
# Snapshot backup
ETCDCTL_API=3 etcdctl snapshot save \
  /tmp/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status \
  /tmp/etcd-backup.db --write-out=table
Let's test our understanding of the control plane before moving on to the worker side.
Now that we understand API Server and etcd, let's meet the matchmaker that pairs pods with nodes.
Think of the Scheduler as a wedding planner for pods and nodes. A new pod arrives saying "I need 2 CPUs, 4GB RAM, and I'd prefer to be in zone-a." The Scheduler scans all available nodes, scores them on compatibility, and makes the match. No pod gets placed without the Scheduler's approval.
# nodeSelector — simple label matching
spec:
nodeSelector:
disktype: ssd
# nodeAffinity — expressive rules
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: topology.kubernetes.io/zone
operator: In
values:
- uksouth-1
# Tolerations — allow scheduling on tainted nodes
spec:
tolerations:
- key: "dedicated"
operator: "Equal"
value: "gpu"
effect: "NoSchedule"
The Scheduler places pods. But who ensures the cluster stays in its desired state? Meet the Controller Manager.
Imagine a building inspector who never sleeps. Every second, they walk through the city checking: "Are there really 3 replicas of the payment service running? Is that node still healthy? Does this service still have endpoints?" If anything is out of alignment, they immediately take corrective action.
This pattern — observe, compare, act — is the heart of Kubernetes. Every controller follows it.
Running Kubernetes on Azure (or any cloud) requires a bridge between K8s abstractions and cloud-specific resources. That's the Cloud Controller Manager.
When you create a Kubernetes Service of type LoadBalancer, K8s doesn't know how to create an Azure Load Balancer. The Cloud Controller Manager does. It translates K8s intentions into cloud API calls.
In AKS, the Cloud Controller Manager runs as part of the managed control plane — you don't manage it directly, but understanding it helps you debug networking and storage issues.
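To make this concrete, here is a minimal sketch of a LoadBalancer Service. The name, namespace, and port numbers are illustrative, not from any real cluster. Applied on AKS, the Cloud Controller Manager sees type: LoadBalancer and provisions an Azure Load Balancer pointing at the pods behind the selector.

```yaml
# Illustrative Service; names and ports are assumptions
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: production
spec:
  type: LoadBalancer        # triggers the Cloud Controller Manager
  selector:
    app: payment-api        # traffic targets pods with this label
  ports:
  - port: 80                # external port on the load balancer
    targetPort: 8080        # container port on the pods
```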
We've covered the brain. Now let's look at the hands and feet — starting with the kubelet, which runs on every worker node.
The kubelet is like a site manager on a construction project. City Hall (control plane) sends down blueprints (PodSpecs), and the kubelet makes sure the buildings (containers) get built, stay standing, and reports any problems back to HQ.
Key files: /var/lib/kubelet/config.yaml (kubelet configuration) and /var/lib/kubelet/pki/ (kubelet certificates).

# Check kubelet status
systemctl status kubelet

# View kubelet logs
journalctl -u kubelet -f
Pods need to talk to each other and to the outside world. kube-proxy makes that possible.
When you create a Kubernetes Service, you get a stable virtual IP (ClusterIP). But pods behind that service come and go. How does traffic find the right pod? kube-proxy programs network rules on each node that redirect traffic from the Service IP to one of the healthy pods behind it.
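As a sketch, here is a minimal ClusterIP Service (the name and ports are illustrative). kube-proxy watches Services and their endpoints like this one and programs iptables or IPVS rules on every node, so traffic to the Service IP is redirected to a healthy backing pod.

```yaml
# Illustrative ClusterIP Service; names and ports are assumptions
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: production
spec:
  selector:
    app: payment-api   # pods with this label become the endpoints
  ports:
  - protocol: TCP
    port: 80           # the stable ClusterIP port clients use
    targetPort: 8080   # the container port traffic is redirected to
```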
The kubelet knows what containers to run. The container runtime actually runs them.
Kubernetes uses the Container Runtime Interface (CRI) — a plugin API that lets kubelet work with any compliant runtime. This decoupling was a game-changer.
| Runtime | Notes |
|---|---|
| containerd | Default in K8s 1.24+, used by AKS, GKE. Lightweight, production-proven. |
| CRI-O | Built specifically for K8s by Red Hat. Used by OpenShift. |
| Docker (dockershim) | Removed in K8s 1.24. Docker itself now uses containerd under the hood. |
A common misconception: "Kubernetes removed Docker support!" Not exactly. Docker images still work fine. What was removed was the dockershim — a compatibility layer. Docker itself uses containerd internally, so K8s now talks to containerd directly, cutting out the middleman.
Simpler, faster, and fewer moving parts. Your Docker images are 100% compatible.
We've covered both sides of the cluster — control plane and worker nodes. Time for a check-in.
Now that we know every component, let's see how they all come together during cluster creation.
kubeadm is the official tool for creating a K8s cluster from scratch. Think of it as the city planner who lays out roads, builds City Hall, and prepares the districts before anyone moves in.
# On the control plane node:
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.10

# Set up kubeconfig
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf \
  $HOME/.kube/config

# Install a CNI plugin (e.g., Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

# On each worker node:
sudo kubeadm join 192.168.1.10:6443 \
  --token abc123.xyz789 \
  --discovery-token-ca-cert-hash \
  sha256:<hash>
CKA Exam: You may be asked to bootstrap a cluster or add nodes. Know these commands.
With components in place, let's tackle one of the trickiest topics in K8s — networking. How does a pod on Node A talk to a pod on Node B?
Kubernetes makes a bold promise: every pod gets its own IP address, and every pod can communicate with every other pod without NAT. This is the "flat network" model. No matter which node a pod is on, it can reach any other pod by IP.
CoreDNS runs as a cluster addon and provides DNS resolution:
# Service DNS format:
<service>.<namespace>.svc.cluster.local

# Examples:
payment-api.production.svc.cluster.local
redis.cache.svc.cluster.local

# Pod DNS (if enabled):
10-244-1-5.default.pod.cluster.local
Pods in the same namespace can use short names: just payment-api instead of the full FQDN.
K8s defines the networking rules, but a CNI plugin actually implements them. Think of it as the difference between traffic laws and the actual roads.
| Plugin | Approach |
|---|---|
| Azure CNI | Assigns Azure VNet IPs directly to pods. First-class Azure citizen. |
| Calico | BGP-based routing + network policies. Very popular for on-prem. |
| Flannel | Simple VXLAN overlay. Easy to set up, limited features. |
| Cilium | eBPF-based. High performance, advanced observability. |
| Weave Net | Mesh overlay. Easy setup, encrypts by default. |
With Azure CNI, pods get real Azure VNet IP addresses. This means pods can communicate directly with other Azure resources (VMs, databases) without any translation. The pod is a first-class network citizen in your Azure VNet.
A production cluster without RBAC is like an office building where every employee has the master key. Let's fix that.
Imagine three teams sharing a cluster: Payments, Inventory, and Analytics. Without RBAC, a junior developer on the Analytics team could accidentally kubectl delete namespace payments and take down the entire payment system. RBAC prevents this by defining exactly who can do what, and where.
The pattern: Define WHAT actions are allowed (Role), then WHO gets those permissions (Binding).
| Element | Description |
|---|---|
| Subjects | Users, Groups, ServiceAccounts |
| Verbs | get, list, watch, create, update, patch, delete |
| Resources | pods, services, deployments, secrets, etc. |
| Namespaces | Scope of the permission |
RBAC is additive only — there are no "deny" rules. If no rule grants permission, it's denied by default.
Let's make RBAC concrete with real YAML that you'll write in production and on the CKA exam.
# Role: can read pods in "dev" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
# Bind the role to user "jane"
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods-binding
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
# ClusterRole: can read nodes (cluster-wide)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "watch", "list"]
---
# Bind to group "ops-team" cluster-wide
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes-global
subjects:
- kind: Group
  name: ops-team
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
kubectl auth can-i list pods --namespace dev --as jane     # yes
kubectl auth can-i delete pods --namespace dev --as jane   # no
RBAC controls who can do what. But pods need identities too — that's where Service Accounts come in.
Think about it: when your application pod needs to list pods in its namespace (for service discovery) or read a secret, it needs to authenticate to the API Server. But pods aren't humans — they can't log in. Service Accounts give pods an identity that RBAC can authorize.
Every namespace has a default service account, and pods run under the default SA unless you specify otherwise.
# Create a service account
kubectl create serviceaccount app-sa -n dev
# Use it in a pod
apiVersion: v1
kind: Pod
metadata:
name: my-app
namespace: dev
spec:
serviceAccountName: app-sa
automountServiceAccountToken: true
containers:
- name: my-app
image: my-app:1.0
# Bind a role to the service account
kubectl create rolebinding app-sa-binding \
  --role=pod-reader \
  --serviceaccount=dev:app-sa \
  -n dev
We've covered networking and security. These are crucial for the CKA exam — let's make sure they're solid.
You've set up RBAC in the cluster. But how does kubectl know which cluster to talk to and how to authenticate? That's the kubeconfig file.
# Kubeconfig structure
apiVersion: v1
kind: Config
clusters:
- cluster:
server: https://myaks.uksouth.azmk8s.io:443
certificate-authority-data: LS0t...
name: myAKSCluster
users:
- name: clusterAdmin
user:
client-certificate-data: LS0t...
client-key-data: LS0t...
contexts:
- context:
cluster: myAKSCluster
user: clusterAdmin
namespace: production
name: aks-prod
current-context: aks-prod
# View current config
kubectl config view

# List contexts
kubectl config get-contexts

# Switch context
kubectl config use-context aks-prod

# Set default namespace for context
kubectl config set-context --current \
  --namespace=production

# Merge kubeconfig files
export KUBECONFIG=~/.kube/config:~/.kube/aks-config
kubectl config view --flatten > merged-config
Default location: ~/.kube/config. Override with KUBECONFIG env var or --kubeconfig flag.
Kubernetes releases a new minor version roughly every 4 months. Upgrading a production cluster without downtime requires a methodical approach.
Think of upgrading a cluster like renovating a hotel while guests are still staying in it. You can't shut everything down at once. You upgrade one floor at a time, moving guests around as needed. The control plane goes first, then worker nodes one by one.
# On control plane:
sudo apt-get update
sudo apt-get install -y kubeadm=1.29.0-*
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.29.0

# Upgrade kubelet on control plane:
sudo apt-get install -y \
  kubelet=1.29.0-* kubectl=1.29.0-*
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# On each worker node:
kubectl drain node-1 --ignore-daemonsets
# (SSH to node-1, upgrade kubeadm, kubelet)
sudo kubeadm upgrade node
kubectl uncordon node-1
Rule: You can only upgrade one minor version at a time (1.28 -> 1.29, not 1.27 -> 1.29).
Now that we understand the manual upgrade process, let's see how AKS simplifies it dramatically.
With AKS, Microsoft handles the control plane upgrade for you. For worker nodes, AKS uses a surge upgrade strategy: it adds extra nodes to the pool, cordons and drains old nodes, and rolls forward. You get the upgrade with minimal disruption and much less manual work.
# Check available versions
az aks get-upgrades \
  --resource-group myRG \
  --name myCluster \
  --output table

# Upgrade cluster
az aks upgrade \
  --resource-group myRG \
  --name myCluster \
  --kubernetes-version 1.29.0

# Enable auto-upgrade
az aks update \
  --resource-group myRG \
  --name myCluster \
  --auto-upgrade-channel stable

# Upgrade just node images (OS patches)
az aks nodepool upgrade \
  --resource-group myRG \
  --cluster-name myCluster \
  --name nodepool1 \
  --node-image-only
Remember our city analogy? etcd is the filing room. Let's talk about fire drills and disaster recovery.
It's 2 AM. An admin accidentally deletes the production namespace with all its deployments, services, and secrets. Without an etcd backup, the only recovery is to recreate everything from scratch — if you even remember what was running. With a recent backup, you restore the cluster to its pre-disaster state in minutes.
# Create a snapshot
ETCDCTL_API=3 etcdctl snapshot save \
  /backup/etcd-$(date +%Y%m%d).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status \
  /backup/etcd-20240115.db \
  --write-out=table
Schedule regular backups with a cron job. In AKS, etcd backups are managed by Azure.
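One way to automate this is a Kubernetes CronJob that snapshots etcd nightly. This is a sketch, not a drop-in manifest: the schedule, node name, image tag, and hostPath locations are all assumptions you would adapt to your cluster, and it must land on a control plane node that holds the etcd certificates.

```yaml
# Sketch only: nodeName, image tag, schedule, and paths are assumptions
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 2 * * *"               # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          nodeName: control-plane-1   # assumption: pin to a control plane node
          hostNetwork: true           # reach etcd on 127.0.0.1:2379
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: registry.k8s.io/etcd:3.5.12-0   # assumption: match your etcd version
            command:
            - /bin/sh
            - -c
            - >
              ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d).db
              --endpoints=https://127.0.0.1:2379
              --cacert=/etc/kubernetes/pki/etcd/ca.crt
              --cert=/etc/kubernetes/pki/etcd/server.crt
              --key=/etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup
            hostPath:
              path: /var/backups/etcd
```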
# Stop the API Server (if using static pods)
# Move the manifest temporarily
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/

# Restore the snapshot
ETCDCTL_API=3 etcdctl snapshot restore \
  /backup/etcd-20240115.db \
  --data-dir=/var/lib/etcd-restored

# Update etcd config to use new data dir
# Edit: /etc/kubernetes/manifests/etcd.yaml
# Change --data-dir=/var/lib/etcd-restored

# Move API Server manifest back
sudo mv /tmp/kube-apiserver.yaml \
  /etc/kubernetes/manifests/
A single control plane node is a single point of failure. For production, you need high availability.
etcd runs on the same nodes as the control plane components. Simpler to set up, but a node failure loses both a control plane member and an etcd member.
Used by: AKS, most managed services
etcd runs on its own dedicated nodes, separate from the control plane. More resilient (etcd and control plane failures are independent) but more complex and costly.
Used by: Large enterprise, high-compliance environments
AKS handles all of this for you — the managed control plane is highly available, with a 99.95% uptime SLA when the uptime SLA tier is enabled.
Sometimes you need to take a node out of service — for upgrades, maintenance, or decommissioning. Here's how to do it gracefully.
Mark a node as unschedulable. Existing pods keep running, but no new pods will be placed here.
kubectl cordon node-2

# Status shows:
# SchedulingDisabled
Like putting a "No Vacancy" sign on a hotel floor.
Evict all pods from a node. Pods are rescheduled to other nodes. The node is also cordoned.
kubectl drain node-2 \
  --ignore-daemonsets \
  --delete-emptydir-data

# Pods gracefully terminated
# and rescheduled elsewhere
Like evacuating a hotel floor for renovation.
Mark the node as schedulable again. New pods can be placed here once more.
kubectl uncordon node-2

# Status: Ready
# New pods can now be
# scheduled here
Like reopening the hotel floor after renovation.
CKA Exam: Drain and uncordon is a common task during upgrade questions. Always use --ignore-daemonsets to avoid errors from system pods.
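Related: drain honors PodDisruptionBudgets, which cap how many pods of an application can be evicted at once. A sketch of a PDB (the name and labels are illustrative) that keeps at least two replicas running while nodes are drained:

```yaml
# Illustrative PDB; name and labels are assumptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-api-pdb
  namespace: production
spec:
  minAvailable: 2          # evictions will not drop the app below this
  selector:
    matchLabels:
      app: payment-api
```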
Without resource controls, one runaway pod could consume all the CPU on a node and starve every other workload. Let's prevent that.
spec:
containers:
- name: my-app
resources:
requests:
memory: "256Mi"
cpu: "250m" # 0.25 CPU core
limits:
memory: "512Mi"
cpu: "500m" # 0.5 CPU core
Limits total resources a namespace can consume: max CPU, memory, number of pods, etc.
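A minimal sketch of a ResourceQuota for the dev namespace (the values are illustrative, not recommendations):

```yaml
# Illustrative quota; values are assumptions
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"        # sum of all CPU requests in the namespace
    requests.memory: 8Gi
    limits.cpu: "8"          # sum of all CPU limits
    limits.memory: 16Gi
    pods: "20"               # max number of pods
```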
Sets default requests/limits for pods that don't specify them, and enforces min/max per container.
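And a matching LimitRange sketch (values again illustrative), which fills in requests/limits for containers that omit them and caps what any single container may ask for:

```yaml
# Illustrative LimitRange; values are assumptions
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    default:             # applied as limits when a container omits them
      cpu: 500m
      memory: 512Mi
    defaultRequest:      # applied as requests when omitted
      cpu: 250m
      memory: 256Mi
    max:                 # hard ceiling per container
      cpu: "1"
      memory: 1Gi
```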
Final quiz! Let's test our knowledge of day-2 operations — upgrades, backups, and node management.
Q: What does kubectl drain node-1 do?
A: kubectl drain both cordons the node (marks it unschedulable) AND evicts all pods. The pods are gracefully terminated, and their controllers (e.g., ReplicaSet) reschedule them on other available nodes. Use --ignore-daemonsets since DaemonSet pods can't be evicted.

Remember the API Server pipeline? Admission Controllers are the last checkpoint before a request is persisted. They can validate or even modify requests.
Checks if a request meets certain criteria and rejects it if not. Example: "Reject pods without resource limits."
Modifies the request before it's persisted. Example: "Automatically inject a sidecar container into every pod."
Mutating runs first, then validating. This ensures modifications are also validated.
Tools like OPA Gatekeeper and Kyverno use webhook-based admission controllers to enforce custom policies.
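As a sketch of what such a policy can look like, here is a Kyverno ClusterPolicy that enforces the earlier example ("reject pods without resource limits"). Kyverno's schema evolves between versions, so treat this as illustrative rather than copy-paste ready:

```yaml
# Illustrative Kyverno policy; verify against your Kyverno version
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce   # reject non-compliant pods
  rules:
  - name: require-resource-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                cpu: "?*"            # any non-empty value
                memory: "?*"
```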
By default, all pods can talk to all other pods. In production, that's a security risk. Network Policies let you control traffic flow.
Imagine your cluster runs a web frontend, an API, and a database. Should the frontend be able to connect directly to the database? No! Only the API should talk to the database. Network Policies enforce these boundaries.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: db-allow-api-only
namespace: production
spec:
podSelector:
matchLabels:
app: database
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: api
ports:
- protocol: TCP
port: 5432
This policy says: "For pods labeled app=database, only allow incoming TCP traffic on port 5432 from pods labeled app=api." All other incoming traffic to the database is denied.
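A common companion is a default-deny policy for the namespace, so traffic is blocked unless another policy explicitly allows it. A minimal sketch:

```yaml
# Illustrative default-deny; the namespace is an assumption
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}        # empty selector = every pod in the namespace
  policyTypes:
  - Ingress              # no ingress rules listed, so all ingress is denied
```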
This domain is worth a quarter of your exam score. Here's what to focus on and how to practice.
Use kubectl create with --dry-run=client -o yaml to generate YAML quickly.
Set up the alias k=kubectl to save keystrokes.
Use kubectl explain <resource> instead of the docs when possible.

Let's return to our mission. Your team lead asked you to set up a production cluster. Now you know how.
Next session: Workloads and Scheduling — Deployments, StatefulSets, Jobs, CronJobs, and the art of keeping your applications running exactly how you want them.
We've covered cluster architecture from top to bottom. What would you like to explore further?
Control plane, worker nodes, networking, CNI
RBAC, service accounts, admission controllers, network policies
Upgrades, etcd backup, node management, resource limits
Presentation 2 of 8 | CKA Domain: Cluster Architecture (25%) | Next: Workloads & Scheduling