ReplicaSets, Deployments, StatefulSets, DaemonSets, Jobs & CronJobs
Sarah's API Pod is running in production. At 3 AM, the node it's running on crashes. The Pod is gone. No one is paged. No replacement is created. Users wake up to a broken service.
"But I thought Kubernetes was supposed to be self-healing?" she asks her lead.
"It is," her lead replies, "but only if you use controllers. A bare Pod is like hiring one employee with no backup plan."
In this module, we will learn about the controllers that keep your applications running, scale them up and down, and update them safely. These are the autopilots that make Kubernetes truly powerful.
Every controller in Kubernetes follows the same simple but powerful pattern: a reconciliation loop.
Think of a thermostat. You set the desired temperature (desired state). The thermostat constantly measures the actual temperature (current state) and turns the heater on or off to match. Kubernetes controllers work exactly the same way -- forever.
Kubernetes has different controllers for different workload types:
| Controller | Purpose | Key Feature |
|---|---|---|
| ReplicaSet | Maintain N identical Pod replicas | Self-healing, scaling |
| Deployment | Manage ReplicaSets with rollout strategy | Rolling updates, rollbacks |
| StatefulSet | Stateful apps with stable identity | Ordered, sticky storage |
| DaemonSet | One Pod per node | Cluster-wide agents |
| Job | Run to completion | Batch processing |
| CronJob | Schedule recurring Jobs | Cron-based scheduling |
Let's explore each one, starting from the foundation and building up.
Sarah never wants to be caught with a single dead Pod again. A ReplicaSet ensures she always has the right number of copies running.
Imagine a restaurant that always needs 3 waiters on the floor. If one calls in sick, the manager immediately calls a replacement. If an extra shows up, they're sent home. A ReplicaSet is that manager for your Pods.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
name: api-rs
labels:
app: my-api
spec:
replicas: 3
selector: # How the RS finds its Pods
matchLabels:
app: my-api
template: # Pod template -- creates Pods from this
metadata:
labels:
app: my-api # MUST match selector above
spec:
containers:
- name: api
image: my-api:1.0
ports:
- containerPort: 8080
resources:
requests:
cpu: "250m"
memory: "128Mi"
The Pod template's labels must match selector.matchLabels. If they don't, the API server will reject the manifest. This is how the ReplicaSet knows which Pods belong to it.
# Create the ReplicaSet
kubectl apply -f replicaset.yaml
# See 3 Pods running
kubectl get pods
# api-rs-abc12 Running
# api-rs-def34 Running
# api-rs-ghi56 Running
# Delete a Pod manually
kubectl delete pod api-rs-abc12
# ReplicaSet immediately creates a new one!
kubectl get pods
# api-rs-def34 Running
# api-rs-ghi56 Running
# api-rs-xyz99 Running (new!)
# Scale imperatively
kubectl scale rs api-rs --replicas=5
# Or edit the YAML and apply
spec:
replicas: 5
kubectl apply -f replicaset.yaml
Each Pod created by a ReplicaSet has an ownerReferences field pointing back to the RS. Delete the RS, and all its Pods are garbage collected.
# Delete RS but keep Pods
kubectl delete rs api-rs --cascade=orphan
Sarah has self-healing now. But what happens when she needs to update her app to version 2.0?
To update all Pods, Sarah would have to edit the Pod template with the new image and then manually delete each Pod so the ReplicaSet recreates it -- a ReplicaSet never touches running Pods when its template changes.
That is tedious and error-prone. Fortunately, there is a controller that automates this entire process...
Let's make sure the foundation is solid
A ReplicaSet uses selector.matchLabels to find Pods that belong to it. The Pod template labels must match this selector. This loose coupling via labels is fundamental to how Kubernetes objects relate to each other.
Sarah is ready to upgrade her API from v1 to v2. She needs zero downtime. Enter the Deployment -- the most commonly used controller in Kubernetes.
A Deployment is like a release manager who carefully rolls out a new version, watches for problems, and can instantly revert to the previous version if something goes wrong. It does this by managing multiple ReplicaSets behind the scenes.
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-api
labels:
app: my-api
spec:
replicas: 3
revisionHistoryLimit: 10 # Keep 10 old ReplicaSets for rollback
selector:
matchLabels:
app: my-api
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1 # At most 1 Pod down during update
maxSurge: 1 # At most 1 extra Pod during update
template:
metadata:
labels:
app: my-api
spec:
containers:
- name: api
image: my-api:1.0
ports:
- containerPort: 8080
resources:
requests: { cpu: "250m", memory: "128Mi" }
limits: { cpu: "500m", memory: "256Mi" }
Notice: the structure is nearly identical to a ReplicaSet, with the addition of strategy and revisionHistoryLimit.
Sarah changes her image from my-api:1.0 to my-api:2.0 and applies. Here's what happens behind the scenes:
# Watch the rollout
kubectl rollout status deployment/my-api
# See both ReplicaSets
kubectl get rs
# my-api-5d8c9 3 3 3 (old, scaling down)
# my-api-7f4e2 2 2 2 (new, scaling up)
With 3 replicas, maxUnavailable: 1, maxSurge: 1: at most 4 Pods exist at any moment, and at least 2 remain available throughout the update.
Both maxUnavailable and maxSurge can be absolute numbers or percentages (e.g., "25%").
A rollout is triggered when you change the Pod template (.spec.template). Changes to replicas, labels on the Deployment itself, etc. do NOT trigger a rollout.
# Change image in YAML file
# image: my-api:2.0
kubectl apply -f deployment.yaml
Best for production -- version controlled.
kubectl set image deployment/my-api \
api=my-api:2.0
Quick for one-off changes.
kubectl annotate deployment/my-api \
kubernetes.io/change-cause="Update to v2.0 with new auth module"
Sarah deployed v2.0 but error rates are spiking. She needs to go back to v1.0 immediately.
# Check rollout history
kubectl rollout history deployment/my-api
# REVISION CHANGE-CAUSE
# 1 Initial deploy v1.0
# 2 Update to v2.0 with new auth module
# Roll back to previous version
kubectl rollout undo deployment/my-api
# Roll back to a specific revision
kubectl rollout undo deployment/my-api --to-revision=1
# Check rollout status
kubectl rollout status deployment/my-api
Each revision corresponds to an old ReplicaSet, retained for rollback up to revisionHistoryLimit (default: 10).
Gradually replaces old Pods with new ones. Zero downtime if configured properly.
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
Kill ALL old Pods first, then create all new Pods. Causes downtime.
strategy:
type: Recreate
Sarah wants to update to v3.0 but test it with a subset of traffic first. She can pause the rollout mid-way.
# Start the rollout
kubectl set image deployment/my-api api=my-api:3.0
# Pause after some new Pods are created
kubectl rollout pause deployment/my-api
# Now both v2.0 and v3.0 Pods are running
# Test, monitor metrics, check error rates...
# Happy with v3.0? Resume the rollout
kubectl rollout resume deployment/my-api
# Not happy? Undo instead
kubectl rollout undo deployment/my-api
# Create deployment (imperative, for quick generation)
kubectl create deployment my-api --image=my-api:1.0 --replicas=3
# Generate YAML without creating
kubectl create deployment my-api --image=my-api:1.0 \
--dry-run=client -o yaml > deployment.yaml
# Scale
kubectl scale deployment my-api --replicas=5
# Autoscale (HPA)
kubectl autoscale deployment my-api --min=3 --max=10 --cpu-percent=70
# View rollout history with details
kubectl rollout history deployment/my-api --revision=2
# Restart all Pods (rolling restart)
kubectl rollout restart deployment/my-api
kubectl rollout restart is the clean way to restart all Pods in a Deployment. It triggers a new rollout by updating an annotation, replacing Pods gradually. No downtime.
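Under the hood, the restart is just a patch to the Pod template's annotations, which counts as a template change and so triggers a normal rolling update. A sketch of the resulting change (the timestamp is an illustrative value):

```yaml
# What `kubectl rollout restart` effectively adds to the Deployment
spec:
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-01-15T03:00:00Z"  # example timestamp
```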
Test your understanding of the most important controller
How does kubectl rollout undo work internally? It scales the previous revision's ReplicaSet back up and the current one down. This is also why revisionHistoryLimit matters -- it controls how many old ReplicaSets are retained for rollback.
Sarah needs to run a 3-node PostgreSQL cluster. Each instance needs its own persistent storage and a stable hostname. Deployments can't do this -- every Pod they create is interchangeable. She needs a StatefulSet.
If Deployments are like a team of interchangeable workers (any cashier can serve any customer), StatefulSets are like assigned seating at a restaurant -- each seat has a number, its own place setting, and if someone leaves, their exact seat is preserved for the replacement.
StatefulSet Pods get stable, ordered names: web-0, web-1, web-2.
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres-headless # Required: headless Service for DNS
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:15
ports:
- containerPort: 5432
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates: # Each Pod gets its own PVC
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
Requires: A headless Service (clusterIP: None) for stable DNS entries.
Each StatefulSet Pod gets a DNS name that follows a predictable pattern:
# Headless Service (required)
apiVersion: v1
kind: Service
metadata:
name: postgres-headless
spec:
clusterIP: None # Makes it headless
selector:
matchLabels:
app: postgres
ports:
- port: 5432
<pod-name>.<service-name>.<namespace>.svc.cluster.local
postgres-0.postgres-headless.default.svc.cluster.local
postgres-1.postgres-headless.default.svc.cluster.local
postgres-2.postgres-headless.default.svc.cluster.local
These DNS names are stable -- even if a Pod is deleted and recreated, it gets the same name and DNS entry.
Each Pod gets its own PersistentVolumeClaim, named like:
data-postgres-0
data-postgres-1
data-postgres-2
If a Pod is deleted and recreated, it reattaches to the same PVC. Data is preserved.
Scaling down from 3 replicas to 1 deletes postgres-2 and postgres-1 (in reverse order), but their PVCs (data-postgres-2, data-postgres-1) remain. Scaling back up reattaches them.
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random hash (api-7f4e2-abc12) | Ordered index (postgres-0) |
| Storage | Shared or none | Per-Pod PVC (persistent) |
| Scaling order | Parallel (any order) | Sequential (0, 1, 2...) |
| Update order | Any order | Reverse (2, 1, 0) |
| DNS | Via Service (round-robin) | Per-Pod stable DNS |
| Use for | Stateless (web, API) | Stateful (DB, cache, queue) |
Sarah's security team needs a log collector running on every single node in the cluster. Not 3 copies, not 10 -- exactly one per node, including new nodes that join later.
A DaemonSet ensures that a copy of a Pod runs on every (or selected) node. When a new node is added, the DaemonSet automatically deploys a Pod there. When a node is removed, the Pod is garbage collected.
Fluentd, Fluent Bit, Filebeat -- collect logs from every node
Prometheus node-exporter, Datadog agent -- collect metrics from every node
kube-proxy, Calico, Cilium -- cluster networking on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: log-collector
labels:
app: log-collector
spec:
selector:
matchLabels:
app: log-collector
template:
metadata:
labels:
app: log-collector
spec:
tolerations: # Run on ALL nodes, including control plane
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
containers:
- name: fluentd
image: fluentd:v1.16
resources:
requests: { cpu: "100m", memory: "200Mi" }
limits: { cpu: "200m", memory: "400Mi" }
volumeMounts:
- name: varlog
mountPath: /var/log
volumes:
- name: varlog
hostPath:
path: /var/log
Note: No replicas field -- the number of Pods is determined by the number of matching nodes.
You can restrict a DaemonSet to specific nodes using nodeSelector or node affinity:
Without any selector, the DaemonSet runs on every node (except those with taints the Pod doesn't tolerate).
spec:
template:
spec:
nodeSelector:
role: gpu-worker
Only runs on nodes labeled role=gpu-worker.
The default update strategy is RollingUpdate with maxUnavailable: 1. You can also use OnDelete -- the new template only applies when old Pods are manually deleted.
updateStrategy:
type: RollingUpdate # or OnDelete
rollingUpdate:
maxUnavailable: 1 # max Pods updated at once
Sarah needs to run a database migration that should execute once, succeed, and stop. She doesn't want it running forever like a Deployment.
A Deployment is like a full-time employee -- always working. A Job is like a contractor -- hired for a specific task, does the work, and leaves when it's done.
apiVersion: batch/v1
kind: Job
metadata:
name: db-migration
spec:
backoffLimit: 4 # Retry up to 4 times on failure
activeDeadlineSeconds: 300 # Kill if not done in 5 minutes
template:
spec:
restartPolicy: Never # Must be Never or OnFailure
containers:
- name: migrate
image: my-api:2.0
command: ["python", "manage.py", "migrate"]
A Job's Pod template must set restartPolicy: Never or OnFailure. The Always policy (the default for Deployment Pods) is not allowed.
Jobs can run in different patterns depending on your needs:
spec:
completions: 1 # default
parallelism: 1 # default
Run one Pod to completion. Most common pattern (migrations, backups).
spec:
completions: 5
parallelism: 2
Run 5 total Pods, 2 at a time. Good for batch work queues.
spec:
parallelism: 3
Run 3 Pods in parallel, each pulling from a shared queue. No fixed completions.
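A fuller sketch of the worker-queue pattern, assuming a hypothetical worker image whose process exits 0 once the shared queue is drained (name and image are illustrative):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker            # hypothetical name
spec:
  parallelism: 3                # 3 workers pull from the queue concurrently
  # no `completions`: the Job completes once a Pod succeeds and the rest terminate
  backoffLimit: 4
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: my-worker:1.0    # hypothetical image
        command: ["python", "worker.py"]   # exits 0 when the queue is empty
```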
| Field | Description |
|---|---|
| completions | Total number of Pods that must succeed |
| parallelism | How many Pods run concurrently |
| backoffLimit | Number of retries before marking the Job as failed (default: 6) |
| activeDeadlineSeconds | Maximum time the Job can run before being terminated |
| ttlSecondsAfterFinished | Auto-delete the Job N seconds after completion |
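For example, ttlSecondsAfterFinished keeps finished Jobs from piling up in the cluster (name and image here are hypothetical):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-job              # hypothetical name
spec:
  ttlSecondsAfterFinished: 600  # delete the Job and its Pods 10 min after it finishes
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: report
        image: my-report:1.0    # hypothetical image
        command: ["python", "generate_report.py"]
```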
Sarah needs a nightly database backup at 2 AM. She could set an alarm and run it manually... or use a CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
name: nightly-backup
spec:
schedule: "0 2 * * *" # 2:00 AM every day
concurrencyPolicy: Forbid # Don't run if previous is still running
successfulJobsHistoryLimit: 3 # Keep last 3 successful Jobs
failedJobsHistoryLimit: 3 # Keep last 3 failed Jobs
startingDeadlineSeconds: 600 # If missed by 10min, skip this run
jobTemplate:
spec:
backoffLimit: 2
template:
spec:
restartPolicy: OnFailure
containers:
- name: backup
image: postgres:15
command: ["pg_dump", "-h", "postgres-0.postgres-headless", "mydb"]
Cron syntax: minute (0-59), hour (0-23), day of month (1-31), month (1-12), day of week (0-6, Sunday=0)
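A few common schedules for reference:

```yaml
# minute hour day-of-month month day-of-week
# "*/15 * * * *"   -> every 15 minutes
# "0 */6 * * *"    -> every 6 hours, on the hour
# "0 9 * * 1-5"    -> 9:00 AM, Monday through Friday
# "0 0 1 * *"      -> midnight on the 1st of every month
```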
What happens when a new CronJob trigger fires while the previous Job is still running?
| Policy | Behavior | Use Case |
|---|---|---|
| Allow (default) | Multiple Jobs can run concurrently | Independent tasks (sending emails) |
| Forbid | Skip the new run if the previous is still active | Tasks that can't overlap (DB backups) |
| Replace | Cancel the current Job and start a new one | When only the latest run matters |
# List CronJobs and their last schedule
kubectl get cronjobs
# Manually trigger a CronJob
kubectl create job manual-backup --from=cronjob/nightly-backup
# Suspend a CronJob
kubectl patch cronjob nightly-backup -p '{"spec":{"suspend":true}}'
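The declarative equivalent of the patch above is to set spec.suspend in the manifest and re-apply; while it is true, no new Jobs are created (already-running Jobs are unaffected):

```yaml
spec:
  suspend: true   # pause scheduling; set back to false to resume
```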
Test your knowledge of specialized controllers
Why must a Job's restartPolicy be Never or OnFailure? The Always policy would conflict with the Job's purpose of running to completion. With Never, failed Pods are left for debugging and new Pods are created. With OnFailure, the same Pod is restarted in place.
What does concurrencyPolicy: Forbid do on a CronJob? With Forbid, if the previous Job from the CronJob is still running when the next schedule fires, that run is simply skipped. This prevents overlapping executions -- critical for tasks like database backups that can't run concurrently.
It's Black Friday. Sarah's API is getting 10x normal traffic. She can't manually scale fast enough. She needs autoscaling.
HPA is like a store manager who opens more checkout lanes when the queues get long and closes them when traffic dies down. It watches metrics and adjusts replica count automatically.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-api
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
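HPA v2 also accepts multiple metrics; the controller computes a desired replica count for each and uses the highest. A sketch adding a memory target alongside the CPU one above (the 80% threshold is an example value):

```yaml
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80   # example threshold
```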
# Create HPA
kubectl autoscale deployment my-api \
--min=3 --max=20 --cpu-percent=70
# Check HPA status
kubectl get hpa
# NAME REFERENCE TARGETS MIN MAX REPLICAS
# api-hpa Deployment/my-api 45%/70% 3 20 3
# Detailed view
kubectl describe hpa api-hpa
behavior:
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 60
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Pods
value: 1
periodSeconds: 60
Scale up aggressively (double every 60s), scale down slowly (1 Pod per minute after 5min cooldown).
Warning: don't set replicas in your Deployment manifest when using HPA -- they will fight over the count. Let HPA control the replica count entirely.
VPA (Vertical Pod Autoscaler) adjusts CPU and memory requests/limits per Pod rather than Pod count.
Best for: right-sizing resources, finding optimal requests/limits.
KEDA (Kubernetes Event-driven Autoscaling) scales based on external event sources -- Kafka lag, queue depth, Prometheus queries, etc.
Best for: message consumers, event processors, batch workers.
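As a sketch, a KEDA ScaledObject targeting the Deployment above might look like this (the trigger metadata values are illustrative; check the KEDA docs for your scaler's exact fields):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaler            # hypothetical name
spec:
  scaleTargetRef:
    name: my-api                   # Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 20
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka:9092   # illustrative
      consumerGroup: my-group        # illustrative
      topic: orders                  # illustrative
      lagThreshold: "50"             # scale up as lag per replica exceeds this
```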
Deployment strategies and autoscaling
Each StatefulSet Pod is named <statefulset-name>-<ordinal>. This provides a stable network identity and ensures that when a Pod is rescheduled, it retains the same name and can reattach to the same PersistentVolumeClaim.
This is the decision framework Sarah's team uses when deploying a new workload:
| Question | Answer | Use |
|---|---|---|
| Stateless web app/API? | Yes | Deployment |
| Needs stable storage & identity? | Yes | StatefulSet |
| Run on every node? | Yes | DaemonSet |
| Run to completion, then stop? | Yes | Job |
| Run on a schedule? | Yes | CronJob |
| Just need replicas, no updates? | Rare | ReplicaSet (usually via Deployment) |
Understanding the ownership chain is crucial for debugging:
Controllers use label selectors to find which Pods they should manage. This is how a ReplicaSet knows which Pods count toward its replica count.
selector:
matchLabels:
app: my-api
Each managed Pod has an ownerReferences field pointing to its controller. This enables garbage collection -- delete a Deployment, and its ReplicaSets and Pods are automatically cleaned up.
kubectl get pod my-api-7f4e2 -o yaml
# ownerReferences:
# - apiVersion: apps/v1
# kind: ReplicaSet
# name: my-api-7f4e2
Sarah's first rolling update caused brief 503 errors. Here's how she fixed it.
spec:
minReadySeconds: 30
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 0
maxSurge: 1
Pod must be Ready for 30 seconds before the rollout continues. Catches pods that pass readiness initially but fail under load.
With maxUnavailable: 0, you must set maxSurge >= 1 (you need somewhere to put the new Pods).
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
Updates Pods in reverse ordinal order (2, 1, 0). Each Pod must be Running and Ready before the next is updated.
Partition: Only Pods with ordinal >= partition are updated. Set partition=2 to update only Pod-2, leaving Pod-0 and Pod-1 on the old version. Great for canary testing.
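For example, to canary only the highest-ordinal Pod of the 3-replica postgres StatefulSet, set the partition to 2 before changing the image:

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only postgres-2 receives the new template
```

Once postgres-2 looks healthy on the new version, lower the partition to 0 to roll out the rest.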
updateStrategy:
type: OnDelete
Pods are NOT automatically updated. When you manually delete a Pod, the new one is created with the updated template.
Use this for databases where you need to control the exact update order and verify data integrity at each step.
Here's how Sarah's production stack uses all the controllers together:
| Component | Controller | Why |
|---|---|---|
| Web frontend | Deployment (5 replicas) + HPA | Stateless, needs auto-scaling |
| REST API | Deployment (3 replicas) + HPA | Stateless, rolling updates |
| PostgreSQL cluster | StatefulSet (3 replicas) | Needs stable identity and persistent storage |
| Redis cache | StatefulSet (3 replicas) | Cluster mode needs stable hostnames |
| Kafka consumers | Deployment + KEDA | Scale based on consumer lag |
| Log collector | DaemonSet | One per node |
| DB backup | CronJob (daily) | Scheduled task |
| DB migration | Job (on deploy) | Run once, then done |
Common pitfalls:
- Setting replicas in Deployment YAML when using HPA
- Forgetting revisionHistoryLimit -- old ReplicaSets accumulate forever
- Forgetting concurrencyPolicy -- overlapping runs
Best practices:
- maxUnavailable: 0 for zero-downtime updates
- minReadySeconds to catch early failures
- ttlSecondsAfterFinished on Jobs to auto-cleanup
# Deployments
kubectl create deployment nginx --image=nginx:1.25 --replicas=3
kubectl create deployment nginx --image=nginx --dry-run=client -o yaml > deploy.yaml
kubectl set image deployment/nginx nginx=nginx:1.26
kubectl scale deployment nginx --replicas=5
kubectl rollout status deployment/nginx
kubectl rollout history deployment/nginx
kubectl rollout undo deployment/nginx
kubectl rollout undo deployment/nginx --to-revision=1
kubectl rollout restart deployment/nginx
# Jobs
kubectl create job my-job --image=busybox -- echo "Hello"
kubectl create job my-job --from=cronjob/my-cronjob
# CronJobs
kubectl create cronjob my-cron --image=busybox \
--schedule="0 2 * * *" -- echo "Nightly task"
# Scaling
kubectl autoscale deployment nginx --min=3 --max=10 --cpu-percent=70
# Debugging
kubectl describe deployment nginx
kubectl get rs -l app=nginx
kubectl rollout history deployment/nginx --revision=2
Sarah now has a resilient, self-healing architecture. Her Pods are managed by Deployments, her database runs in a StatefulSet, and logs are collected by a DaemonSet. But she has new questions...
"My Pods are running, but how do users reach them? How do Pods talk to each other? How do I expose my API to the internet?"
The answer lies in Services and Networking -- the topic of our next module.
Coming up in Module 05:
Comprehensive review of all workload controllers
Remember why revisionHistoryLimit matters -- without it, old ReplicaSets accumulate. kubectl rollout history deployment/my-api --revision=2 shows the full Pod template for that specific revision, including the image, environment variables, and any other changes. Without --revision, it shows a summary of all revisions.
Next: Module 05 -- Services & Networking
Thank you! Questions?
Workload Controllers
Practice makes perfect. Try deploying each controller type in a lab environment.