The building blocks of every Kubernetes application
Use arrow keys or click anywhere to navigate
Sarah is a developer at a fast-growing startup. She has been building containerized apps with Docker for a year. Today, her team is moving to Kubernetes, and her manager just asked her to deploy their flagship API to the cluster.
"Where do I even start?" she wonders, staring at the kubectl command line.
The answer: Pods. Everything in Kubernetes begins with a Pod.
Let's follow Sarah's journey from her first Pod to mastering scheduling, probes, and resource management. By the end, you will be able to do everything she learns -- and pass the CKA/CKAD exam questions about it.
Sarah's Docker experience taught her to think in containers. But Kubernetes wraps containers in something bigger...
Containers in the same Pod share the same network (localhost), storage volumes, and lifecycle. They are always scheduled together on the same node -- just like roommates share an address, a kitchen, and a lease.
Sarah opens the documentation and sees this structure. Let's break it down piece by piece.
Sarah writes her very first Pod manifest:
apiVersion: v1
kind: Pod
metadata:
  name: my-api
  labels:
    app: my-api
    tier: backend
spec:
  containers:
  - name: api
    image: nginx:1.25
    ports:
    - containerPort: 80
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
Every Kubernetes object follows the same pattern: apiVersion, kind, metadata, spec.
Sarah has two ways to create her Pod. Understanding the difference is fundamental to Kubernetes thinking.
Imperative = walking into the kitchen and cooking the meal yourself, step by step.
Declarative = handing a menu order to the waiter and letting the kitchen figure out how to make it.
kubectl run my-api --image=nginx:1.25
kubectl expose pod my-api --port=80
Good for one-off debugging; not recommended for production.
kubectl apply -f pod.yaml
Version-controlled, repeatable, auditable. Always prefer this.
kubectl run my-api --image=nginx --dry-run=client -o yaml > pod.yaml
Sarah deploys her Pod and watches it go through several phases. Here's what happens behind the scenes...
| Phase | Meaning | What's happening |
|---|---|---|
| Pending | Accepted but not running | Scheduler is finding a node; images are being pulled |
| Running | At least one container running | Main container(s) executing normally |
| Succeeded | All containers exited 0 | Common for Jobs; Pod completed its task |
| Failed | All containers terminated; at least one exited non-zero | Something went wrong; check logs |
| Unknown | State cannot be determined | Usually a communication failure with the node |
Sarah's teammate asks: "Can I run a log shipper next to my app in the same Pod?" Absolutely -- that is the sidecar pattern.
Extends main container functionality. Examples: log collectors, proxies, sync agents.
The main container and the sidecar share volumes and network.
Proxies network connections from the main container to the outside world.
Example: a proxy that handles connection pooling to a database.
Transforms output from the main container into a standard format.
Example: converting logs to a common format before shipping.
spec:
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:            # app writes its logs here
    - name: logs
      mountPath: /var/log/app
  - name: log-shipper        # sidecar container
    image: fluentd:latest
    volumeMounts:            # sidecar reads the same logs
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}
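The ambassador pattern looks much the same in YAML. A sketch with hypothetical image names: the app connects to localhost, and the ambassador container forwards traffic to the real backend.

```yaml
spec:
  containers:
  - name: app
    image: my-app:1.0           # app talks to localhost:6379
  - name: redis-ambassador      # hypothetical proxy image
    image: my-redis-proxy:1.0   # listens on 6379, forwards to the remote Redis
```

Because both containers share the Pod's network namespace, the app needs no knowledge of where the real backend lives.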
Before Sarah's API can start, it needs to wait for the database to be ready. Init containers solve this perfectly.
Common use cases: waiting for a dependency to become reachable, pre-populating data or config files, running database migrations, registering with an external service.
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ['sh', '-c',
      'until nslookup postgres.default.svc.cluster.local; do
         echo "Waiting for DB..."; sleep 2;
       done']
  containers:
  - name: api
    image: my-api:1.0
Sarah's cluster is shared by multiple teams. How do they keep their resources from colliding?
Namespaces provide logical isolation within a cluster. Different teams, environments, or projects can each have their own namespace -- with their own resource quotas and access policies.
- default -- where resources go if no namespace is specified
- kube-system -- Kubernetes system components (DNS, scheduler, etc.)
- kube-public -- publicly readable, used for cluster info
- kube-node-lease -- node heartbeats for health detection

# Create a namespace
kubectl create namespace dev
# Deploy to a namespace
kubectl apply -f pod.yaml -n dev
# List pods in a namespace
kubectl get pods -n dev
# List across all namespaces
kubectl get pods -A
Note: Not all resources are namespaced. Nodes, PersistentVolumes, and ClusterRoles are cluster-scoped. Check with kubectl api-resources --namespaced=false.
Sarah has 50 Pods running. How does she find, group, and manage them? Labels are the answer.
Labels are key-value pairs you stick on any resource. Selectors are how you query them. Services use selectors to find Pods. Deployments use selectors to manage ReplicaSets. Everything connects through labels.
metadata:
  labels:
    app: my-api
    tier: backend
    env: production
    version: v2
# Equality-based
kubectl get pods -l app=my-api
# Set-based
kubectl get pods -l 'env in (prod,staging)'
# Multiple conditions (AND)
kubectl get pods -l app=my-api,tier=backend
Kubernetes recommends standard, tool-friendly label keys such as app.kubernetes.io/name, app.kubernetes.io/version, and app.kubernetes.io/component.
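A sketch of those recommended labels on a metadata block (the values here are illustrative, not from Sarah's app):

```yaml
metadata:
  labels:
    app.kubernetes.io/name: my-api            # application name
    app.kubernetes.io/version: "2.0.1"        # current version
    app.kubernetes.io/component: backend      # role within the architecture
    app.kubernetes.io/part-of: shop-platform  # umbrella application (hypothetical)
```

Dashboards, Helm, and other ecosystem tools understand these keys out of the box, which is why they beat ad-hoc names like `app` or `tier` in larger clusters.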
While labels are for identification and selection, annotations store non-identifying metadata.
| Labels | Annotations |
|---|---|
| Used for selection & grouping | Non-identifying metadata |
| Must be short (63 chars max) | Can be large (256KB max) |
| Used by K8s selectors | Used by tools & humans |
metadata:
  annotations:
    kubernetes.io/change-cause: "Update to v2"
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    description: "Main API service"
Let's see what you have learned so far
Sarah needs to pass database credentials and feature flags to her app. Kubernetes offers several ways to inject configuration.
env:
- name: DB_HOST
  value: "postgres.default.svc"
- name: LOG_LEVEL
  value: "info"

env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
- name: APP_CONFIG
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: config.json
Tip: never hardcode sensitive values in plain env entries -- inject them from a Secret with secretKeyRef.
ConfigMaps decouple configuration from container images. Change config without rebuilding.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_URL: "postgres://db.default.svc:5432/mydb"
  LOG_LEVEL: "info"
  config.yaml: |
    server:
      port: 8080
      timeout: 30s
# As environment variables (all keys at once)
envFrom:
- configMapRef:
    name: app-config

# As a mounted volume
volumes:
- name: cfg
  configMap:
    name: app-config

# As individual variables
env:
- name: LOG_LEVEL
  valueFrom:
    configMapKeyRef:
      name: app-config
      key: LOG_LEVEL
Sarah's app needs a database password. She knows better than to put it in plain YAML...
# Create a Secret imperatively
kubectl create secret generic db-creds \
  --from-literal=username=admin \
  --from-literal=password=S3cureP@ss!
# Or declaratively (values must be base64-encoded)
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
data:
  username: YWRtaW4=            # base64 of "admin"
  password: UzNjdXJlUEBzcyE=
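If hand-encoding base64 feels error-prone, Kubernetes also accepts a stringData field with plain-text values; the API server base64-encodes them into data on write. A sketch using the same credentials:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
type: Opaque
stringData:                 # plain text here; stored base64-encoded under .data
  username: admin
  password: S3cureP@ss!
```

Remember that base64 is encoding, not encryption -- anyone with read access to the Secret can decode it.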
Sarah's Pod is eating all the memory on a shared node, affecting other teams. Time to set boundaries.
The scheduler uses requests to decide where to place the Pod. The node must have at least this much available.
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"    # 0.25 CPU core
The kubelet enforces limits. Exceed memory limit = OOMKilled. Exceed CPU limit = throttled.
resources:
  limits:
    memory: "256Mi"
    cpu: "500m"    # 0.5 CPU core
Cluster admins use these to enforce resource governance across namespaces.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "256Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    type: Container
If a Pod doesn't specify resources, these defaults apply.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "4"
    requests.memory: "8Gi"
    limits.cpu: "8"
    limits.memory: "16Gi"
    pods: "20"
The namespace cannot exceed these totals across all Pods.
Sarah's Pod is "Running" but users are getting 503 errors. The container started, but the app inside hasn't finished initializing. Kubernetes needs a way to know when the app is truly ready.
"Is the app alive?"
If it fails, kubelet restarts the container. Use this to catch deadlocks and unrecoverable states.
"Is the app ready for traffic?"
If it fails, the Pod is removed from Service endpoints. No traffic is sent until it passes again.
"Has the app finished starting?"
Disables liveness/readiness checks until it succeeds. Perfect for slow-starting apps (Java, .NET).
Three mechanisms to check health:
Most common for web apps
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
For non-HTTP services (databases, caches)
readinessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 5
  periodSeconds: 10
Run a command inside the container
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5
initialDelaySeconds should be >= your app's startup time. Set failureThreshold high enough to avoid false positives but low enough to detect real failures. For slow-starting apps, use a startup probe instead of a long initial delay.
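A startup probe sketch for such a slow-starting app (endpoint and timings are illustrative): the kubelet allows up to failureThreshold x periodSeconds for the first success, and only then do the liveness and readiness probes take over.

```yaml
startupProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080
  failureThreshold: 30      # up to 30 attempts...
  periodSeconds: 10         # ...10 seconds apart = 5 minutes max startup
```

With this in place, the liveness probe can keep a tight failureThreshold without killing the container during its long boot.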
What happens when a container exits? The restartPolicy field determines the behavior.
| Policy | Behavior | Use Case |
|---|---|---|
| Always (default) | Always restart, regardless of exit code | Long-running services (web servers, APIs) |
| OnFailure | Restart only on non-zero exit code | Jobs that should retry on failure |
| Never | Never restart | One-shot tasks, debugging |
A Pod that keeps crashing shows a CrashLoopBackOff status while the kubelet backs off between restarts. Check logs with kubectl logs <pod> --previous.
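restartPolicy is set once at the Pod level and applies to every container in the Pod. A minimal sketch with a hypothetical one-shot workload:

```yaml
spec:
  restartPolicy: OnFailure   # retry only when a container exits non-zero
  containers:
  - name: batch-task         # hypothetical batch workload
    image: my-batch:1.0
```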
Test your understanding of environment, probes, and resources
Sarah has 10 nodes in her cluster. When she creates a Pod, how does Kubernetes decide which node it lands on? Enter the kube-scheduler -- the matchmaker of the cluster.
The scheduler looks at each Pod's requirements (CPU, memory, affinity rules) and each node's capacity, then finds the best match -- like a housing algorithm matching tenants to apartments.
All of this happens in milliseconds. You can influence it but rarely need to override it.
Want your Pod on a specific type of node? Use nodeSelector -- it matches Pod to node labels.
# First, label your node
kubectl label node worker-1 disktype=ssd

# Then reference it in your Pod spec
apiVersion: v1
kind: Pod
metadata:
  name: fast-io-app
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: app
    image: my-app:1.0
The Pod will only be scheduled on nodes labeled disktype=ssd. If no node matches, the Pod stays Pending.
For more complex placement logic, use node affinity (coming up next).
Node affinity is a more expressive version of nodeSelector. It supports soft preferences and complex expressions.
requiredDuringSchedulingIgnoredDuringExecution
Pod MUST be placed on a matching node. Like nodeSelector but with richer operators.
preferredDuringSchedulingIgnoredDuringExecution
Scheduler TRIES to match, but will place elsewhere if needed. Has a weight (1-100).
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a", "us-east-1b"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        preference:
          matchExpressions:
          - key: node-type
            operator: In
            values: ["high-memory"]
Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
Sarah wants her web Pods close to her cache Pods (for low latency) but spread across zones (for high availability). Pod affinity and anti-affinity solve both.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: redis
      topologyKey: kubernetes.io/hostname
Place this Pod on the same node as Pods labeled app=redis.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web
        topologyKey: topology.kubernetes.io/zone
Try to spread web Pods across different zones.
Common topologyKey values: kubernetes.io/hostname = per node, topology.kubernetes.io/zone = per zone, topology.kubernetes.io/region = per region.
Some nodes in Sarah's cluster are reserved for GPU workloads only. How does she keep regular Pods off those nodes?
Taints are like a "VIP Only" sign on a node -- regular Pods are repelled. Tolerations are like a VIP pass on a Pod -- allowing it through. Taints go on nodes; tolerations go on Pods.
# Apply a taint
kubectl taint nodes gpu-node-1 \
  gpu=true:NoSchedule

# Remove a taint (note the minus)
kubectl taint nodes gpu-node-1 \
  gpu=true:NoSchedule-
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
Three effects determine how strictly the taint is enforced:
| Effect | Behavior | Existing Pods? |
|---|---|---|
| NoSchedule | New Pods without a toleration will NOT be scheduled here | Not affected |
| PreferNoSchedule | Scheduler TRIES to avoid this node (soft version) | Not affected |
| NoExecute | New Pods rejected AND existing Pods without a toleration are evicted | Evicted! |
Built-in taints you will encounter:
- node-role.kubernetes.io/control-plane:NoSchedule -- keeps regular workloads off control-plane nodes
- node.kubernetes.io/unreachable:NoExecute -- applied automatically when a node becomes unreachable

Sarah's Pods keep landing on the same node. If that node goes down, all replicas are lost. She needs to spread them evenly.
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
whenUnsatisfiable is either DoNotSchedule (hard) or ScheduleAnyway (soft). Topology spread constraints give you finer control over how evenly Pods are distributed; anti-affinity is more binary -- "don't put me near X".
How well do you understand Pod placement?
In rare cases, you can bypass the scheduler entirely by setting nodeName directly.
apiVersion: v1
kind: Pod
metadata:
  name: manual-pod
spec:
  nodeName: worker-node-3   # Bypasses the scheduler entirely
  containers:
  - name: app
    image: nginx:1.25
Avoid nodeName in production. It bypasses all scheduling logic: no resource checks, no affinity rules, no taints. If the named node doesn't exist or is down, the Pod will never run. Use nodeSelector or affinity instead.
Sarah notices some Pods in kube-system that she can't delete through kubectl. These are static Pods -- managed directly by the kubelet on each node.
How static Pods work:
- Defined by manifest files in a directory on the node (typically /etc/kubernetes/manifests/)
- Managed directly by the kubelet, not the API server; they appear in kubectl get pods only as read-only mirror Pods
- The control-plane components themselves run this way: kube-apiserver, kube-controller-manager, kube-scheduler, etcd

This is how kubeadm bootstraps the control plane -- the kubelet starts these before the API server is even running.
The manifest directory is set with the --pod-manifest-path kubelet flag or staticPodPath in the kubelet config file. Check with ps aux | grep kubelet.
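To create a static Pod yourself, drop a manifest into that directory on the node; the kubelet picks it up automatically. A minimal sketch (file name, Pod name, and image are illustrative):

```yaml
# Saved on the node as a file under the static Pod directory,
# e.g. /etc/kubernetes/manifests/static-web.yaml (hypothetical name)
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx:1.25
```

The mirror Pod appears in kubectl get pods with the node name suffixed; deleting the mirror does nothing -- remove the file to stop the Pod.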
The cluster is full and Sarah's critical payment service can't get scheduled. But there are lower-priority batch jobs running. Priority classes to the rescue.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority
description: "For critical production workloads"

# Reference in Pod spec
spec:
  priorityClassName: high-priority
  containers:
  - name: payment-service
    image: payment:2.0
Kubernetes reserves two built-in classes for system workloads: system-cluster-critical (2000000000) and system-node-critical (2000001000).
The ops team is draining nodes for maintenance. Sarah's API has 3 replicas, and she wants to guarantee at least 2 are always available during the disruption.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2   # OR use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-api
minAvailable: the minimum number (or percentage) of Pods that must remain available during voluntary disruptions.
maxUnavailable: the maximum number (or percentage) that can be unavailable. Use one or the other, not both.
Applies to voluntary disruptions only: node drains, rolling updates, cluster autoscaler. Does NOT protect against involuntary disruptions like hardware failures.
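Both fields also accept percentages, which scale automatically with the replica count. A sketch assuming the same app=my-api labels:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb-percent   # hypothetical name
spec:
  maxUnavailable: 25%     # at most a quarter of matching Pods down at once
  selector:
    matchLabels:
      app: my-api
```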
Sarah's team has been bitten by poorly configured probes. Here are the lessons they learned.
Common mistakes: setting initialDelaySeconds too low for the app's actual start time, and pointing the liveness probe at an endpoint that checks external dependencies. Use /healthz for liveness (am I alive?) and /readyz for readiness (am I ready for traffic?). The liveness check is internal-only; the readiness check verifies dependencies.
Kubernetes lets you run code at two points in a container's lifecycle:
Runs immediately after the container is created (no guarantee it runs before the ENTRYPOINT).
Use case: register with a service, warm caches.
Runs before the container receives SIGTERM. Blocks the shutdown for its duration.
Use case: drain connections, deregister from load balancer.
lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "echo 'Started' >> /var/log/lifecycle.log"]
  preStop:
    httpGet:
      path: /shutdown
      port: 8080
On shutdown, the kubelet sends SIGTERM, waits terminationGracePeriodSeconds (default 30s), then sends SIGKILL. Configure your app to handle SIGTERM gracefully.
Almost there -- test your advanced knowledge
PDBs cover voluntary disruptions: node drains (kubectl drain), rolling updates, and cluster autoscaler scale-downs. They do not protect against involuntary disruptions like hardware failures or kernel panics.

The security team reviewed Sarah's deployment and flagged that her containers are running as root. Time to lock things down.
spec:
  securityContext:              # Pod-level settings
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
  - name: app
    image: my-app:1.0
    securityContext:            # Container-level settings
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
        add: ["NET_BIND_SERVICE"]
Pod-level fields: runAsUser, runAsGroup, fsGroup, runAsNonRoot, supplementalGroups
Container-level fields: allowPrivilegeEscalation, readOnlyRootFilesystem, capabilities, privileged, seccompProfile
Hardening baseline: runAsNonRoot: true, allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, drop ALL capabilities, add only what you need.
Pods use ServiceAccounts to authenticate with the Kubernetes API. Every Pod gets one.
# Create a service account
kubectl create serviceaccount my-app-sa

# Use it in a Pod
spec:
  serviceAccountName: my-app-sa
  automountServiceAccountToken: false   # Disable if not needed
  containers:
  - name: app
    image: my-app:1.0
Best practices: create a dedicated ServiceAccount per app (don't rely on the default one), and set automountServiceAccountToken: false unless the Pod needs API access.

Things will go wrong. Here is the debugging playbook Sarah keeps handy.
# Check Pod status and events
kubectl describe pod my-api
# View container logs (current and previous crash)
kubectl logs my-api
kubectl logs my-api --previous
kubectl logs my-api -c sidecar # specific container
# Exec into a running container
kubectl exec -it my-api -- /bin/sh
# Debug with ephemeral container (K8s 1.25+)
kubectl debug -it my-api --image=busybox --target=app
# View resource usage
kubectl top pod my-api
# Check events in namespace
kubectl get events --sort-by=.lastTimestamp
ImagePullBackOff = wrong image name or no pull secret. CrashLoopBackOff = container keeps crashing, check logs. Pending = no node available, check resources/taints/affinity. OOMKilled = increase memory limit.
# Generate YAML without creating
kubectl run nginx --image=nginx \
  --dry-run=client -o yaml
# Apply from file
kubectl apply -f pod.yaml
# Delete
kubectl delete pod my-api
kubectl delete -f pod.yaml
# Edit live resource
kubectl edit pod my-api
# List with extra info
kubectl get pods -o wide
# JSON output + jq
kubectl get pod my-api -o json | jq '.status'
# Watch for changes
kubectl get pods -w
# Sort by restart count
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
Pro tip: alias k=kubectl. Use kubectl explain pod.spec.containers to browse API docs directly from the command line. Tab completion saves time: source <(kubectl completion bash).
After everything she learned, here is Sarah's production-ready Pod manifest. Notice how it incorporates labels, probes, resources, security, and more.
apiVersion: v1
kind: Pod
metadata:
  name: my-api
  namespace: production
  labels:
    app: my-api
    tier: backend
    version: v2
  annotations:
    prometheus.io/scrape: "true"
spec:
  serviceAccountName: my-api-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: api
    image: my-api:2.0
    ports:
    - containerPort: 8080
    resources:
      requests: { cpu: "250m", memory: "128Mi" }
      limits: { cpu: "500m", memory: "256Mi" }
    livenessProbe:
      httpGet: { path: /healthz, port: 8080 }
      initialDelaySeconds: 10
      periodSeconds: 15
    readinessProbe:
      httpGet: { path: /readyz, port: 8080 }
      initialDelaySeconds: 5
      periodSeconds: 5
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: env
            operator: In
            values: ["production"]
Sarah's production Pod is running beautifully. But she realizes something troubling...
"What happens if my Pod crashes? Or if I need to run 5 copies? Or update to a new version without downtime?"
Bare Pods can't do any of that. She needs something smarter -- something that manages Pods for her.
Next up: Workload Controllers -- Deployments, ReplicaSets, StatefulSets, and more.
Comprehensive review of Pods, Containers, and Scheduling
Quiz: a Pod declares tolerations: [{key: "gpu", operator: "Equal", value: "true", effect: "NoSchedule"}]. What does this mean?

Next: Module 04 -- Workload Controllers