Module 04

Workload Controllers

ReplicaSets, Deployments, StatefulSets, DaemonSets, Jobs & CronJobs

CKA CKAD ~47 slides


What happens when a Pod dies?

Sarah's API Pod is running in production. At 3 AM, the node it's running on crashes. The Pod is gone. No one is paged. No replacement is created. Users wake up to a broken service.

"But I thought Kubernetes was supposed to be self-healing?" she asks her lead.

"It is," her lead replies, "but only if you use controllers. A bare Pod is like hiring one employee with no backup plan."

In this module, we will learn about the controllers that keep your applications running, scale them up and down, and update them safely. These are the autopilots that make Kubernetes truly powerful.

The Controller Pattern

Every controller in Kubernetes follows the same simple but powerful pattern: a reconciliation loop.

The autopilot that keeps your apps running

Think of a thermostat. You set the desired temperature (desired state). The thermostat constantly measures the actual temperature (current state) and turns the heater on or off to match. Kubernetes controllers work exactly the same way -- forever.

Observe
current state
-->
Compare
desired vs actual
-->
Act
reconcile the diff
-->
Repeat
continuously
Key insight: You declare WHAT you want (3 replicas of my app). The controller figures out HOW to make it happen and keeps it that way. This is the essence of the declarative model.

The Controller Family

Kubernetes has different controllers for different workload types:

Controller | Purpose | Key Feature
ReplicaSet | Maintain N identical Pod replicas | Self-healing, scaling
Deployment | Manage ReplicaSets with rollout strategy | Rolling updates, rollbacks
StatefulSet | Stateful apps with stable identity | Ordered, sticky storage
DaemonSet | One Pod per node | Cluster-wide agents
Job | Run to completion | Batch processing
CronJob | Schedule recurring Jobs | Cron-based scheduling

Let's explore each one, starting from the foundation and building up.

ReplicaSets: Your First Safety Net

Sarah never wants to be caught with a single dead Pod again. A ReplicaSet ensures she always has the right number of copies running.

Ensuring you always have the right number of copies

Imagine a restaurant that always needs 3 waiters on the floor. If one calls in sick, the manager immediately calls a replacement. If an extra shows up, they're sent home. A ReplicaSet is that manager for your Pods.

What a ReplicaSet guarantees:

  • A specified number of Pod replicas are running at all times
  • If a Pod fails or is deleted, the ReplicaSet creates a new one
  • If too many Pods exist (e.g., after scaling down), it terminates the excess
  • Uses label selectors to identify which Pods it manages

ReplicaSet Manifest

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: api-rs
  labels:
    app: my-api
spec:
  replicas: 3
  selector:                   # How the RS finds its Pods
    matchLabels:
      app: my-api
  template:                   # Pod template -- creates Pods from this
    metadata:
      labels:
        app: my-api           # MUST match selector above
    spec:
      containers:
      - name: api
        image: my-api:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "128Mi"
Critical rule: The Pod template's labels must match the ReplicaSet's selector.matchLabels. If they don't, the API server will reject the manifest. This is how the ReplicaSet knows which Pods belong to it.

How ReplicaSets Work

Self-Healing in Action

# Create the ReplicaSet
kubectl apply -f replicaset.yaml

# See 3 Pods running
kubectl get pods
# api-rs-abc12   Running
# api-rs-def34   Running
# api-rs-ghi56   Running

# Delete a Pod manually
kubectl delete pod api-rs-abc12

# ReplicaSet immediately creates a new one!
kubectl get pods
# api-rs-def34   Running
# api-rs-ghi56   Running
# api-rs-xyz99   Running  (new!)

Scaling

# Scale imperatively
kubectl scale rs api-rs --replicas=5

# Or edit the YAML and apply
spec:
  replicas: 5

kubectl apply -f replicaset.yaml

Ownership

Each Pod created by a ReplicaSet has an ownerReferences field pointing back to the RS. Delete the RS, and all its Pods are garbage collected.

# Delete RS but keep Pods
kubectl delete rs api-rs --cascade=orphan

Why Not Use ReplicaSets Directly?

Sarah has self-healing now. But what happens when she needs to update her app to version 2.0?

The problem: ReplicaSets do NOT support rolling updates. If you change the image in a ReplicaSet's template, existing Pods are not affected. Only newly created Pods (after a scale event or Pod failure) use the new template.

To update all Pods, Sarah would have to:

  1. Create a new ReplicaSet with the new image
  2. Scale up the new ReplicaSet
  3. Scale down the old ReplicaSet
  4. Manage the transition carefully to avoid downtime

That is tedious and error-prone. Fortunately, there is a controller that automates this entire process...

Rule of thumb: Never create ReplicaSets directly. Use Deployments instead -- they manage ReplicaSets for you and add rolling update capability.

Knowledge Check: ReplicaSets

Let's make sure the foundation is solid

Q1: If a ReplicaSet has replicas: 3 and you manually delete one Pod, what happens?

A) The ReplicaSet is marked as degraded
B) The ReplicaSet immediately creates a new Pod to maintain 3 replicas
C) The replicas field is automatically decreased to 2
D) Nothing -- the Pod deletion is blocked
Correct: B) The ReplicaSet controller continuously reconciles the desired state (3 replicas) with the current state. When it detects only 2 running Pods, it immediately creates a replacement to bring the count back to 3.

Q2: What connects a ReplicaSet to its Pods?

A) The Pod name must contain the ReplicaSet name
B) Label selectors -- the RS selector must match the Pod template labels
C) A direct reference in the Pod spec
D) The namespace they share
Correct: B) The ReplicaSet uses selector.matchLabels to find Pods that belong to it. The Pod template labels must match this selector. This loose coupling via labels is fundamental to how Kubernetes objects relate to each other.

Q3: Why should you NOT create ReplicaSets directly in production?

A) They don't support more than 10 replicas
B) They are deprecated
C) They don't support rolling updates -- Deployments manage ReplicaSets and add that capability
D) They can't use resource limits
Correct: C) ReplicaSets maintain Pod count but don't handle updates. When you change the Pod template, existing Pods are unaffected. Deployments wrap ReplicaSets and add rolling update/rollback capability, making them the standard choice for stateless workloads.

Deployments: The Safe Way to Update

Sarah is ready to upgrade her API from v1 to v2. She needs zero downtime. Enter the Deployment -- the most commonly used controller in Kubernetes.

The safe way to update your application (and roll back when things go wrong)

A Deployment is like a release manager who carefully rolls out a new version, watches for problems, and can instantly revert to the previous version if something goes wrong. It does this by managing multiple ReplicaSets behind the scenes.

What Deployments add on top of ReplicaSets:

  • Rolling updates -- gradually replace old Pods with new ones
  • Rollback -- instantly revert to a previous version
  • Pause/Resume -- halt a rollout mid-way for canary testing
  • Revision history -- track every version change

Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  labels:
    app: my-api
spec:
  replicas: 3
  revisionHistoryLimit: 10        # Keep 10 old ReplicaSets for rollback
  selector:
    matchLabels:
      app: my-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1            # At most 1 Pod down during update
      maxSurge: 1                  # At most 1 extra Pod during update
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: api
        image: my-api:1.0
        ports:
        - containerPort: 8080
        resources:
          requests: { cpu: "250m", memory: "128Mi" }
          limits:   { cpu: "500m", memory: "256Mi" }

Notice: the structure is nearly identical to a ReplicaSet, with the addition of strategy and revisionHistoryLimit.

Rolling Update: Step by Step

Sarah changes her image from my-api:1.0 to my-api:2.0 and applies. Here's what happens behind the scenes:

Deployment
detects template change
-->
Creates new RS
with v2 template
-->
Scales up new RS
creates v2 Pods
-->
Scales down old RS
removes v1 Pods

During the rollout

# Watch the rollout
kubectl rollout status deployment/my-api

# See both ReplicaSets
kubectl get rs
# my-api-5d8c9   2   2   2   (old, scaling down)
# my-api-7f4e2   2   2   2   (new, scaling up)

maxUnavailable & maxSurge

With 3 replicas, maxUnavailable: 1, maxSurge: 1:

  • Minimum running: 3 - 1 = 2
  • Maximum running: 3 + 1 = 4

Can be absolute numbers or percentages (e.g., "25%").
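As a worked example, here is how percentages would resolve for a hypothetical 4-replica Deployment (Kubernetes rounds maxSurge up and maxUnavailable down):

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%          # ceil(4 * 0.25) = 1 extra Pod allowed -> max 5 running
      maxUnavailable: 25%    # floor(4 * 0.25) = 1 Pod may be down -> min 3 running
```

The asymmetric rounding means that with percentages, a rollout can never compute both "zero surge" and "zero unavailable" at the same time, which would deadlock the update.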

Triggering a Deployment Update

A rollout is triggered when you change the Pod template (.spec.template). Changes to replicas, labels on the Deployment itself, etc. do NOT trigger a rollout.

Method 1: Edit YAML & Apply

# Change image in YAML file
# image: my-api:2.0

kubectl apply -f deployment.yaml

Best for production -- version controlled.

Method 2: kubectl set image

kubectl set image deployment/my-api \
  api=my-api:2.0

Quick for one-off changes.

Record the change cause for better rollback history:
kubectl annotate deployment/my-api \
  kubernetes.io/change-cause="Update to v2.0 with new auth module"

Rollbacks: When Things Go Wrong

Sarah deployed v2.0 but error rates are spiking. She needs to go back to v1.0 immediately.

# Check rollout history
kubectl rollout history deployment/my-api
# REVISION  CHANGE-CAUSE
# 1         Initial deploy v1.0
# 2         Update to v2.0 with new auth module

# Roll back to previous version
kubectl rollout undo deployment/my-api

# Roll back to a specific revision
kubectl rollout undo deployment/my-api --to-revision=1

# Check rollout status
kubectl rollout status deployment/my-api
What actually happens: Kubernetes doesn't delete old ReplicaSets. It keeps them (scaled to 0) as rollback targets. When you undo, it scales up the old RS and scales down the current one. The number of old RSes kept is controlled by revisionHistoryLimit (default: 10).

Deployment Strategies

RollingUpdate (default)

Gradually replaces old Pods with new ones. Zero downtime if configured properly.

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%
    maxSurge: 25%
  • Best for most workloads
  • Both versions run simultaneously during rollout
  • Your app must handle both versions being active

Recreate

Kill ALL old Pods first, then create all new Pods. Causes downtime.

strategy:
  type: Recreate
  • Use when versions can't coexist (schema changes, shared volume locks)
  • Simple but causes brief downtime
  • Good for dev/test environments
Advanced patterns (not built-in, but common): Blue/Green -- run two full environments, switch traffic at once. Canary -- route a small % of traffic to the new version first. Both can be achieved with Services, Ingress controllers, or service meshes (Istio, Linkerd).

Pause & Resume: Manual Canary Testing

Sarah wants to update to v3.0 but test it with a subset of traffic first. She can pause the rollout mid-way.

# Start the rollout
kubectl set image deployment/my-api api=my-api:3.0

# Pause after some new Pods are created
kubectl rollout pause deployment/my-api

# Now both v2.0 and v3.0 Pods are running
# Test, monitor metrics, check error rates...

# Happy with v3.0? Resume the rollout
kubectl rollout resume deployment/my-api

# Not happy? Undo instead
kubectl rollout undo deployment/my-api
Note: While paused, you can make multiple changes to the Deployment spec (image, resources, env vars) and they will all be applied as a single rollout when you resume.

Essential Deployment Commands

# Create deployment (imperative, for quick generation)
kubectl create deployment my-api --image=my-api:1.0 --replicas=3

# Generate YAML without creating
kubectl create deployment my-api --image=my-api:1.0 \
  --dry-run=client -o yaml > deployment.yaml

# Scale
kubectl scale deployment my-api --replicas=5

# Autoscale (HPA)
kubectl autoscale deployment my-api --min=3 --max=10 --cpu-percent=70

# View rollout history with details
kubectl rollout history deployment/my-api --revision=2

# Restart all Pods (rolling restart)
kubectl rollout restart deployment/my-api
CKA/CKAD Tip: kubectl rollout restart is the clean way to restart all Pods in a Deployment. It triggers a new rollout by updating an annotation, replacing Pods gradually. No downtime.

Knowledge Check: Deployments

Test your understanding of the most important controller

Q1: What object does a Deployment create and manage to maintain Pods?

A) Pods directly
B) ReplicaSets
C) StatefulSets
D) DaemonSets
Correct: B) ReplicaSets. A Deployment manages ReplicaSets, which in turn manage Pods. During a rolling update, the Deployment creates a new ReplicaSet and gradually scales it up while scaling the old one down. The hierarchy is: Deployment -> ReplicaSet -> Pods.

Q2: With replicas=4, maxSurge=1, maxUnavailable=0, how many Pods can exist during a rolling update?

A) Minimum 3, maximum 4
B) Minimum 4, maximum 5
C) Minimum 3, maximum 5
D) Always exactly 4
Correct: B) With maxUnavailable=0, all 4 existing Pods must stay running (minimum 4). With maxSurge=1, at most 1 extra Pod can be created (maximum 5). So K8s first creates 1 new Pod (total 5), then terminates 1 old Pod (back to 4), and repeats.

Q3: How does kubectl rollout undo work internally?

A) It deletes the current Pods and redeploys from a backup
B) It edits the current ReplicaSet's Pod template
C) It scales up the previous ReplicaSet and scales down the current one
D) It restores the YAML file from git
Correct: C) Old ReplicaSets are kept (scaled to 0) as rollback targets. An undo scales up the target revision's ReplicaSet and scales down the current one, following the same rolling update process. This is why revisionHistoryLimit matters -- it controls how many old RSes are retained.

StatefulSets: For Apps That Need Identity

Sarah needs to run a 3-node PostgreSQL cluster. Each instance needs its own persistent storage and a stable hostname. Deployments can't do this -- every Pod they create is interchangeable. She needs StatefulSets.

For apps that need to remember who they are

If Deployments are like a team of interchangeable workers (any cashier can serve any customer), StatefulSets are like assigned seating at a restaurant -- each seat has a number, its own place setting, and if someone leaves, their exact seat is preserved for the replacement.

What makes StatefulSets special:

  • Stable network identity -- Pods get predictable names: web-0, web-1, web-2
  • Stable persistent storage -- each Pod gets its own PVC that persists across rescheduling
  • Ordered deployment & scaling -- Pods are created/deleted in order (0, 1, 2... / ...2, 1, 0)
  • Ordered rolling updates -- updated in reverse order by default

StatefulSet Manifest

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless    # Required: headless Service for DNS
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:             # Each Pod gets its own PVC
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Requires: A headless Service (clusterIP: None) for stable DNS entries.

Stable Network Identity

Each StatefulSet Pod gets a DNS name that follows a predictable pattern:

# Headless Service (required)
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None                   # Makes it headless
  selector:
    app: postgres               # Note: Services use flat selectors, not matchLabels
  ports:
  - port: 5432
DNS pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local
  • postgres-0.postgres-headless.default.svc.cluster.local
  • postgres-1.postgres-headless.default.svc.cluster.local
  • postgres-2.postgres-headless.default.svc.cluster.local

These DNS names are stable -- even if a Pod is deleted and recreated, it gets the same name and DNS entry.
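A client can therefore address one specific replica by name. This hypothetical client Pod (the DATABASE_HOST variable name is illustrative) always reaches postgres-0, regardless of rescheduling:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-client
spec:
  containers:
  - name: app
    image: my-api:1.0
    env:
    - name: DATABASE_HOST       # always resolves to the same Pod: postgres-0
      value: postgres-0.postgres-headless.default.svc.cluster.local
```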

Persistent Storage in StatefulSets

volumeClaimTemplates

Each Pod gets its own PersistentVolumeClaim, named like:
data-postgres-0
data-postgres-1
data-postgres-2

If a Pod is deleted and recreated, it reattaches to the same PVC. Data is preserved.

Key Behaviors

  • PVCs are NOT deleted when you delete the StatefulSet or scale down
  • This prevents accidental data loss
  • You must manually delete PVCs to reclaim storage
  • When scaling back up, the Pod reuses its existing PVC
Important: Scaling down a StatefulSet from 3 to 1 removes Pods postgres-2 and postgres-1 (reverse order), but their PVCs (data-postgres-2, data-postgres-1) remain. Scaling back up reattaches them.

StatefulSet vs Deployment

Feature | Deployment | StatefulSet
Pod names | Random hash (api-7f4e2-abc12) | Ordered index (postgres-0)
Storage | Shared or none | Per-Pod PVC (persistent)
Scaling order | Parallel (any order) | Sequential (0, 1, 2...)
Update order | Any order | Reverse (2, 1, 0)
DNS | Via Service (round-robin) | Per-Pod stable DNS
Use for | Stateless (web, API) | Stateful (DB, cache, queue)
When to use StatefulSets: Databases (PostgreSQL, MySQL, MongoDB), message queues (Kafka, RabbitMQ), distributed caches (Redis Cluster), any app where each instance has unique data or needs to be addressed individually.

DaemonSets: One Pod on Every Node

Sarah's security team needs a log collector running on every single node in the cluster. Not 3 copies, not 10 -- exactly one per node, including new nodes that join later.

Like security cameras -- one on every floor

A DaemonSet ensures that a copy of a Pod runs on every (or selected) node. When a new node is added, the DaemonSet automatically deploys a Pod there. When a node is removed, the Pod is garbage collected.

Common DaemonSet workloads:

Logging

Fluentd, Fluent Bit, Filebeat -- collect logs from every node

Monitoring

Prometheus node-exporter, Datadog agent -- collect metrics from every node

Networking

kube-proxy, Calico, Cilium -- cluster networking on every node

DaemonSet Manifest

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  labels:
    app: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      tolerations:                    # Run on ALL nodes, including control plane
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluentd:v1.16
        resources:
          requests: { cpu: "100m", memory: "200Mi" }
          limits:   { cpu: "200m", memory: "400Mi" }
        volumeMounts:
        - name: varlog
          mountPath: /var/log
      volumes:
      - name: varlog
        hostPath:
          path: /var/log

Note: No replicas field -- the number of Pods is determined by the number of matching nodes.

DaemonSet Node Targeting

You can restrict a DaemonSet to specific nodes using nodeSelector or node affinity:

All Nodes (default)

Without any selector, the DaemonSet runs on every node (except those with taints the Pod doesn't tolerate).

Selected Nodes

spec:
  template:
    spec:
      nodeSelector:
        role: gpu-worker

Only runs on nodes labeled role=gpu-worker.

DaemonSet updates: DaemonSets support rolling updates just like Deployments. The default strategy is RollingUpdate with maxUnavailable: 1. You can also use OnDelete -- new template only applies when old Pods are manually deleted.
updateStrategy:
  type: RollingUpdate     # or OnDelete
  rollingUpdate:
    maxUnavailable: 1     # max Pods updated at once

Jobs: Run Once and Done

Sarah needs to run a database migration that should execute once, succeed, and stop. She doesn't want it running forever like a Deployment.

Run a task to completion, then stop

A Deployment is like a full-time employee -- always working. A Job is like a contractor -- hired for a specific task, does the work, and leaves when it's done.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  backoffLimit: 4             # Retry up to 4 times on failure
  activeDeadlineSeconds: 300  # Kill if not done in 5 minutes
  template:
    spec:
      restartPolicy: Never    # Must be Never or OnFailure
      containers:
      - name: migrate
        image: my-api:2.0
        command: ["python", "manage.py", "migrate"]
Key difference from Deployments: Jobs use restartPolicy: Never or OnFailure. The Always policy (default for Deployments) is not allowed.

Job Patterns: Parallelism & Completions

Jobs can run in different patterns depending on your needs:

Single Job

spec:
  completions: 1     # default
  parallelism: 1     # default

Run one Pod to completion. Most common pattern (migrations, backups).

Fixed Completions

spec:
  completions: 5
  parallelism: 2

Run 5 Pods to successful completion, 2 at a time. Good for fixed-size batch processing.

Work Queue

spec:
  parallelism: 3

Run 3 Pods in parallel, each pulling from a shared queue. No fixed completions.

Field | Description
completions | Total number of Pods that must succeed
parallelism | How many Pods run concurrently
backoffLimit | Number of retries before marking the Job as failed (default: 6)
activeDeadlineSeconds | Maximum time the Job can run before being terminated
ttlSecondsAfterFinished | Auto-delete the Job N seconds after completion
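The fields above can be combined in a single manifest. This sketch (the image-processing workload is hypothetical) runs 5 Pods total, 2 at a time, and cleans itself up an hour after finishing:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-resize               # hypothetical batch workload
spec:
  completions: 5                   # 5 Pods must succeed in total
  parallelism: 2                   # run at most 2 concurrently
  backoffLimit: 3                  # give up after 3 failed retries
  activeDeadlineSeconds: 600       # hard stop after 10 minutes
  ttlSecondsAfterFinished: 3600    # garbage-collect the Job 1 hour after completion
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: my-batch-worker:1.0
```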

CronJobs: Scheduled Jobs

Sarah needs a nightly database backup at 2 AM. She could set an alarm and run it manually... or use a CronJob.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"              # 2:00 AM every day
  concurrencyPolicy: Forbid          # Don't run if previous is still running
  successfulJobsHistoryLimit: 3      # Keep last 3 successful Jobs
  failedJobsHistoryLimit: 3          # Keep last 3 failed Jobs
  startingDeadlineSeconds: 600       # If missed by 10min, skip this run
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:15
            command: ["pg_dump", "-h", "postgres-0.postgres-headless", "mydb"]

Cron syntax: minute (0-59), hour (0-23), day of month (1-31), month (1-12), day of week (0-6, Sunday=0)
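A few illustrative schedule values (shown as comments, since a CronJob takes exactly one):

```yaml
# schedule: "0 2 * * *"       # 02:00 every day
# schedule: "*/15 * * * *"    # every 15 minutes
# schedule: "0 9 * * 1-5"     # 09:00 Monday through Friday
# schedule: "30 3 1 * *"      # 03:30 on the 1st of every month
```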

CronJob Concurrency Policies

What happens when a new CronJob trigger fires while the previous Job is still running?

Policy | Behavior | Use Case
Allow (default) | Multiple Jobs can run concurrently | Independent tasks (sending emails)
Forbid | Skip the new run if the previous is still active | Tasks that can't overlap (DB backups)
Replace | Cancel the current Job and start a new one | When only the latest run matters
Gotcha: CronJobs may occasionally create two Jobs for a single schedule slot, or miss a run entirely. Your Jobs should be idempotent (safe to run more than once). If more than 100 missed schedules accumulate, the CronJob stops scheduling and requires manual intervention.

Useful commands:

# List CronJobs and their last schedule
kubectl get cronjobs

# Manually trigger a CronJob
kubectl create job manual-backup --from=cronjob/nightly-backup

# Suspend a CronJob
kubectl patch cronjob nightly-backup -p '{"spec":{"suspend":true}}'

Knowledge Check: DaemonSets, Jobs & CronJobs

Test your knowledge of specialized controllers

Q1: How does a DaemonSet determine how many Pods to run?

A) By the replicas field in the spec
B) Automatically -- one Pod per matching node in the cluster
C) By the number of available CPU cores
D) It always runs exactly 3 Pods
Correct: B) DaemonSets don't have a replicas field. They ensure exactly one Pod runs on every node that matches the DaemonSet's node selector and tolerations. Add a new node to the cluster, and the DaemonSet automatically schedules a Pod on it.

Q2: What restartPolicy values are allowed for a Job's Pod template?

A) Always (default)
B) Never or OnFailure
C) Any value is allowed
D) OnSuccess only
Correct: B) Jobs must use Never or OnFailure. The Always policy would conflict with the Job's purpose of running to completion. With Never, failed Pods are left for debugging and new Pods are created. With OnFailure, the same Pod is restarted in place.

Q3: What does concurrencyPolicy: Forbid do on a CronJob?

A) Prevents the CronJob from being created
B) Skips the new scheduled run if the previous Job is still active
C) Cancels the running Job and starts a new one
D) Prevents all Jobs from running in the namespace
Correct: B) With Forbid, if the previous Job from the CronJob is still running when the next schedule fires, that run is simply skipped. This prevents overlapping executions -- critical for tasks like database backups that can't run concurrently.

Horizontal Pod Autoscaler (HPA)

It's Black Friday. Sarah's API is getting 10x normal traffic. She can't manually scale fast enough. She needs autoscaling.

Automatic scaling based on demand

HPA is like a store manager who opens more checkout lanes when the queues get long and closes them when traffic dies down. It watches metrics and adjusts replica count automatically.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Prerequisite: The Metrics Server must be installed in the cluster for HPA to work. Pods must have resource requests defined so HPA can calculate utilization percentages.
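HPA can also track more than one metric; it then scales to the highest replica count proposed by any of them. For example, a memory target (the 80% threshold here is illustrative) can sit alongside the CPU target in the metrics list:

```yaml
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory              # memory-based scaling alongside CPU
      target:
        type: Utilization
        averageUtilization: 80  # hypothetical threshold
```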

HPA in Practice

Quick Setup (imperative)

# Create HPA
kubectl autoscale deployment my-api \
  --min=3 --max=20 --cpu-percent=70

# Check HPA status
kubectl get hpa
# NAME     REFERENCE         TARGETS   MIN  MAX  REPLICAS
# api-hpa  Deployment/my-api 45%/70%   3    20   3

# Detailed view
kubectl describe hpa api-hpa

Scaling Behavior

behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    - type: Percent
      value: 100
      periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60

Scale up aggressively (double every 60s), scale down slowly (1 Pod per minute after 5min cooldown).

Warning: Do NOT set replicas in your Deployment manifest when using HPA -- they will fight over the count. Let HPA control the replica count entirely.

Beyond HPA: VPA & KEDA

Vertical Pod Autoscaler (VPA)

Adjusts CPU and memory requests/limits per Pod rather than Pod count.

  • Analyzes actual usage over time
  • Recommends or auto-applies right-sized resources
  • Requires Pod restart to apply changes
  • Cannot be used with HPA on the same resource metric

Best for: right-sizing resources, finding optimal requests/limits.

KEDA (Event-Driven Autoscaler)

Scales based on external event sources -- Kafka lag, queue depth, Prometheus queries, etc.

  • Extends HPA with custom scalers
  • Can scale to zero (HPA can't)
  • 50+ built-in scalers
  • Great for event-driven architectures

Best for: message consumers, event processors, batch workers.

Knowledge Check: Strategies & Scaling

Deployment strategies and autoscaling

Q1: When should you use the "Recreate" Deployment strategy instead of "RollingUpdate"?

A) When you need zero downtime
B) When the old and new versions cannot coexist (e.g., incompatible database schemas)
C) When you have more than 10 replicas
D) When using StatefulSets
Correct: B) Recreate terminates all old Pods before creating new ones, causing downtime but ensuring versions never coexist. This is necessary when two versions can't run simultaneously -- for example, if they use incompatible database schemas or compete for a shared resource lock.

Q2: What is required in the cluster for HPA to function?

A) VPA must also be installed
B) Metrics Server must be installed and Pods must have resource requests defined
C) Only the Deployment must have a replicas field set
D) KEDA must be installed as a backend
Correct: B) HPA needs the Metrics Server to read CPU/memory utilization data. Additionally, Pods must have resource requests defined so HPA can calculate utilization as a percentage. Without requests, there's no baseline to measure against.

Q3: What makes StatefulSet Pod names special compared to Deployment Pod names?

A) They include the namespace in the name
B) They are randomly generated but longer
C) They have a stable, predictable ordinal index (e.g., web-0, web-1, web-2)
D) They are always prefixed with "sts-"
Correct: C) StatefulSet Pods get predictable, stable names with an ordinal index: <statefulset-name>-<ordinal>. This provides a stable network identity and ensures that when a Pod is rescheduled, it retains the same name and can reattach to the same PersistentVolumeClaim.

Choosing the Right Controller

This is the decision framework Sarah's team uses when deploying a new workload:

Question | Answer | Use
Stateless web app/API? | Yes | Deployment
Needs stable storage & identity? | Yes | StatefulSet
Run on every node? | Yes | DaemonSet
Run to completion, then stop? | Yes | Job
Run on a schedule? | Yes | CronJob
Just need replicas, no updates? | Rare | ReplicaSet (usually via Deployment)
90% of the time, you will use a Deployment. The other controllers are for specialized use cases. When in doubt, start with a Deployment.

How Controllers Find Their Pods

Understanding the ownership chain is crucial for debugging:

Deployment
-->
ReplicaSet
-->
Pod

Label Selectors (Discovery)

Controllers use label selectors to find which Pods they should manage. This is how a ReplicaSet knows which Pods count toward its replica count.

selector:
  matchLabels:
    app: my-api

ownerReferences (Ownership)

Each managed Pod has an ownerReferences field pointing to its controller. This enables garbage collection -- delete a Deployment, and its ReplicaSets and Pods are automatically cleaned up.

kubectl get pod my-api-7f4e2-abc12 -o yaml
# ownerReferences:
# - apiVersion: apps/v1
#   kind: ReplicaSet
#   name: my-api-7f4e2

Rolling Update: Tuning for Zero Downtime

Sarah's first rolling update caused brief 503 errors. Here's how she fixed it.

The Checklist

  1. Readiness probes -- new Pods must pass before receiving traffic
  2. Graceful shutdown -- preStop hook + SIGTERM handling
  3. minReadySeconds -- wait N seconds after ready before proceeding
  4. PodDisruptionBudget -- guarantee minimum available Pods
  5. Proper maxUnavailable/maxSurge -- balance speed vs. safety
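Items 1 and 2 of the checklist might look like this in the Pod template (the /healthz path and the 10-second drain are illustrative values, not fixed requirements):

```yaml
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: api
        image: my-api:2.0
        readinessProbe:                 # 1. no traffic until the app answers
          httpGet:
            path: /healthz              # hypothetical health endpoint
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
        lifecycle:
          preStop:                      # 2. give in-flight requests time to drain
            exec:
              command: ["sleep", "10"]
```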

minReadySeconds

spec:
  minReadySeconds: 30
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1

Pod must be Ready for 30 seconds before the rollout continues. Catches pods that pass readiness initially but fail under load.

maxUnavailable: 0 is the safest setting -- it guarantees all desired replicas are available at all times. But it's slower and requires maxSurge >= 1 (you need somewhere to put the new Pods).

StatefulSet Update Strategies

RollingUpdate (default)

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 0

Updates Pods in reverse ordinal order (2, 1, 0). Each Pod must be Running and Ready before the next is updated.

Partition: Only Pods with ordinal >= partition are updated. Set partition=2 to update only Pod-2, leaving Pod-0 and Pod-1 on the old version. Great for canary testing.
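A canary with partition might look like this on a 3-replica StatefulSet: only postgres-2 gets the new template, while Pods 0 and 1 stay on the old version.

```yaml
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2    # only ordinals >= 2 are updated
```

Once the canary checks out, lower the partition to 0 (for example with kubectl patch statefulset postgres -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":0}}}}') and the remaining Pods roll out in reverse ordinal order.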

OnDelete

updateStrategy:
  type: OnDelete

Pods are NOT automatically updated. When you manually delete a Pod, the new one is created with the updated template.

Use this for databases where you need to control the exact update order and verify data integrity at each step.

StatefulSet gotcha: Unlike Deployments, StatefulSets do NOT create ReplicaSets during updates. They update Pods in-place (delete and recreate) and record history as ControllerRevisions. kubectl rollout undo does work for StatefulSets, but a Pod stuck crashing on a bad template can block the rollout -- after reverting the template, you may need to delete the broken Pod manually so it is recreated from the restored revision.

Real-World Architecture Example

Here's how Sarah's production stack uses all the controllers together:

| Component | Controller | Why |
|---|---|---|
| Web frontend | Deployment (5 replicas) + HPA | Stateless, needs auto-scaling |
| REST API | Deployment (3 replicas) + HPA | Stateless, rolling updates |
| PostgreSQL cluster | StatefulSet (3 replicas) | Needs stable identity and persistent storage |
| Redis cache | StatefulSet (3 replicas) | Cluster mode needs stable hostnames |
| Kafka consumers | Deployment + KEDA | Scale based on consumer lag |
| Log collector | DaemonSet | One per node |
| DB backup | CronJob (daily) | Scheduled task |
| DB migration | Job (on deploy) | Run once, then done |

Common Mistakes & How to Avoid Them

Mistakes

  • Creating bare Pods in production (no self-healing)
  • Using ReplicaSets directly instead of Deployments
  • No readiness probe -- rolling update sends traffic to unready Pods
  • Setting replicas in Deployment YAML when using HPA
  • Using Deployments for stateful workloads (data loss on rescheduling)
  • Ignoring revisionHistoryLimit -- the default keeps 10 old ReplicaSets per Deployment
  • CronJobs without concurrencyPolicy -- overlapping runs
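The last mistake on the list is avoided with an explicit concurrencyPolicy on the CronJob (a sketch -- name, schedule, and command are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report          # illustrative name
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid     # skip a run if the previous one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox
              command: ["sh", "-c", "echo generating report"]
```

The other options are Allow (the default, which permits overlap) and Replace (kill the running Job and start the new one).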

Best Practices

  • Always use Deployments for stateless workloads
  • Always set readiness probes
  • Use maxUnavailable: 0 for zero-downtime updates
  • Set minReadySeconds to catch early failures
  • Use PodDisruptionBudgets for critical services
  • Make Jobs and CronJobs idempotent
  • Set ttlSecondsAfterFinished on Jobs to auto-cleanup
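Auto-cleanup from the last bullet is a single field on the Job spec (a sketch; the migration image and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                 # illustrative name
spec:
  ttlSecondsAfterFinished: 600     # delete Job and its Pods 10 min after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: my-migrator:1.0   # placeholder image
          command: ["./migrate"]   # assumed migration entrypoint
```

Without a TTL, finished Jobs and their Pods linger until someone deletes them -- harmless individually, but they accumulate fast under a frequent CronJob.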

Exam Quick Reference: Imperative Commands

# Deployments
kubectl create deployment nginx --image=nginx:1.25 --replicas=3
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > deploy.yaml
kubectl set image deployment/nginx nginx=nginx:1.26
kubectl scale deployment nginx --replicas=5
kubectl rollout status deployment/nginx
kubectl rollout history deployment/nginx
kubectl rollout undo deployment/nginx
kubectl rollout undo deployment/nginx --to-revision=1
kubectl rollout restart deployment/nginx

# Jobs
kubectl create job my-job --image=busybox -- echo "Hello"
kubectl create job manual-run --from=cronjob/my-cronjob

# CronJobs
kubectl create cronjob my-cron --image=busybox \
  --schedule="0 2 * * *" -- echo "Nightly task"

# Scaling
kubectl autoscale deployment nginx --min=3 --max=10 --cpu-percent=70

# Debugging
kubectl describe deployment nginx
kubectl get rs -l app=nginx
kubectl rollout history deployment/nginx --revision=2

Sarah's Story Continues...

Sarah now has a resilient, self-healing architecture. Her Pods are managed by Deployments, her database runs in a StatefulSet, and logs are collected by a DaemonSet. But she has new questions...

What's next?

"My Pods are running, but how do users reach them? How do Pods talk to each other? How do I expose my API to the internet?"

The answer lies in Services and Networking -- the topic of our next module.

Coming up in Module 05:

  • ClusterIP, NodePort, LoadBalancer Services
  • Ingress Controllers and routing
  • DNS and service discovery
  • Network Policies

Final Knowledge Check

Comprehensive review of all workload controllers

Q1: A Deployment has revisionHistoryLimit: 5. After 8 rolling updates, how many ReplicaSets exist?

A) 8 (one per revision)
B) 6 (1 active + 5 old kept for rollback)
C) 5 (only the limit count)
D) 1 (only the current one)
Correct: B) The current active ReplicaSet plus 5 old ones (scaled to 0) for rollback history. The 3 oldest ReplicaSets beyond the limit are garbage collected. This is why revisionHistoryLimit matters -- even the default keeps 10 old ReplicaSets per Deployment.

Q2: You need to run a PostgreSQL cluster with 3 replicas where each instance has its own 50Gi volume. Which controller and why?

A) Deployment with PVC -- it handles scaling and updates
B) StatefulSet -- it provides stable identity, ordered operations, and per-Pod persistent storage
C) DaemonSet -- it ensures one on each node
D) ReplicaSet with volumeClaimTemplates
Correct: B) StatefulSet is designed exactly for this. Each Pod gets a stable hostname (postgres-0, postgres-1, postgres-2), its own PVC via volumeClaimTemplates, and ordered scaling/updates. Deployments share PVCs and have random Pod names, making them unsuitable for databases. ReplicaSets don't support volumeClaimTemplates.

Q3: What command shows the detailed history of a specific Deployment revision?

A) kubectl describe deployment my-api
B) kubectl get rs --show-labels
C) kubectl rollout history deployment/my-api --revision=2
D) kubectl logs deployment/my-api
Correct: C) kubectl rollout history deployment/my-api --revision=2 shows the full Pod template for that specific revision, including the image, environment variables, and any other changes. Without --revision, it shows a summary of all revisions.
Module Complete

Key Takeaways

  • Controllers implement the reconciliation loop: observe, compare, act, repeat
  • Deployments are the go-to for stateless workloads -- rolling updates, rollbacks, scaling
  • ReplicaSets maintain Pod count but are managed by Deployments (don't create directly)
  • StatefulSets provide stable identity, ordered operations, and per-Pod persistent storage
  • DaemonSets run one Pod per node -- essential for cluster-wide agents
  • Jobs run to completion; CronJobs schedule them on a cron basis
  • HPA autoscales replicas based on metrics; requires Metrics Server and resource requests
  • Rolling updates need readiness probes, graceful shutdown, and proper maxUnavailable/maxSurge
  • Always make Jobs idempotent -- they may run more than once

Next: Module 05 -- Services & Networking

End of Module 04

Workload Controllers

Practice makes perfect. Try deploying each controller type in a lab environment.
