Kubernetes Training — Presentation 8 of 8

Observability, Troubleshooting
& Exam Prep

It's 3 AM. Your phone rings. The application is down. How do you debug it? This is the most important skill in Kubernetes — and 30% of the CKA exam.

CKA: Troubleshooting — 30%
CKAD: Application Observability — 15%
The 3 AM Scenario

When Things Go Wrong

"PagerDuty alert: 'Production API - 5xx error rate above 50%'. You open your laptop. The dashboard is red. Users are complaining on social media. Your manager is on Slack asking for an ETA. Where do you even start?"

Today we build the skills to answer that question confidently. By the end of this session, you will have a systematic approach to debugging any Kubernetes issue — and the exam skills to prove it.

Overview

Our Journey Today

Observability

  1. Logging — the black box recorder
  2. Monitoring — the dashboard
  3. Azure Monitor for AKS

Troubleshooting

  1. Debugging Pods — CrashLoopBackOff
  2. Debugging Services — connectivity
  3. Common failure scenarios
  4. Systematic debugging flow

Exam & Beyond

  1. Helm — package management
  2. CRDs & Operators
  3. CKA exam strategy
  4. CKAD exam strategy
  5. kubectl aliases & speed tricks
Logging

Logging: Reading the Black Box Recorder

In aviation, when something goes wrong, investigators look at the black box. In Kubernetes, that black box is your container logs. Everything your application writes to stdout/stderr is captured by the kubelet.

How K8s Logging Works

  • Containers write to stdout and stderr
  • Container runtime captures output
  • Stored as files on the node: /var/log/containers/
  • Rotated by kubelet (default 10Mi, 5 files)
  • Lost when pod is deleted (no persistence!)

Key Principle

Applications in Kubernetes should always log to stdout/stderr, never to files inside the container. This is one of the 12-Factor App principles.

If your legacy app writes to /var/log/app.log, use a sidecar container to stream that file to stdout.
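A minimal sketch of that sidecar pattern (the pod, image, and container names here are illustrative, not from a real chart): the app writes to a file on a shared emptyDir volume, and a busybox sidecar tails that file to its own stdout, where the kubelet captures it.

```shell
# Hypothetical example: a legacy app logs to /var/log/app.log;
# a sidecar streams the file to stdout so `kubectl logs` can see it.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: my-legacy-app:1.0        # assumed image that writes /var/log/app.log
    volumeMounts:
    - name: logs
      mountPath: /var/log
  - name: log-tailer                # sidecar: its stdout is captured by the kubelet
    image: busybox
    command: ["sh", "-c", "tail -n+1 -F /var/log/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /var/log
  volumes:
  - name: logs
    emptyDir: {}
EOF

# Read the app's file-based logs through the sidecar:
kubectl logs legacy-app -c log-tailer -f
```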

Logging

kubectl logs — Your First Debugging Tool

# Basic: logs from a single pod
kubectl logs my-pod

# Specific container in a multi-container pod
kubectl logs my-pod -c sidecar

# Follow logs in real-time (like tail -f)
kubectl logs my-pod -f

# Last 100 lines only
kubectl logs my-pod --tail=100

# Logs from the last hour
kubectl logs my-pod --since=1h

# Logs from a PREVIOUS crashed container (critical for CrashLoopBackOff!)
kubectl logs my-pod --previous

# Logs from all pods matching a label
kubectl logs -l app=api --all-containers

# Logs from a specific deployment's pods
kubectl logs deployment/api-server --tail=50
💡
Exam Must-Know: kubectl logs pod-name --previous shows logs from the last crashed container instance. This is essential for debugging CrashLoopBackOff.
Logging

Cluster-Level Log Aggregation

Node-level logs are ephemeral. For production, you need centralized log aggregation that survives pod restarts and node failures.

Node-Level Agent

DaemonSet on every node collects logs and forwards them.

  • Fluentd / Fluent Bit
  • Filebeat
  • Azure Monitor Agent

Most common pattern

Sidecar Container

Dedicated container in each pod streams logs.

  • For apps that write to files
  • Higher resource cost
  • More control per-app

Use when node agent isn't enough

Direct Push

Application pushes logs directly to backend.

  • Application Insights SDK
  • Custom logging libraries
  • No K8s involvement

Least common in K8s

Monitoring

Monitoring: The Dashboard That Tells You Everything

Logs tell you what happened. Monitoring tells you what's happening right now. It's the difference between reading a crash report and watching the speedometer.

Metrics Server (Built-in)

  • Lightweight, in-cluster metrics aggregator
  • Provides CPU and memory metrics
  • Powers kubectl top commands
  • Powers HorizontalPodAutoscaler
  • Does NOT store historical data
# Node resource usage
kubectl top nodes

# Pod resource usage
kubectl top pods -n production
kubectl top pods --sort-by=memory

Prometheus (Industry Standard)

  • Pull-based metrics collection
  • Time-series database
  • PromQL query language
  • Alerting via Alertmanager
  • Grafana for visualization

The Prometheus + Grafana stack is the de facto standard for Kubernetes monitoring.

Azure Integration

Azure Monitor for AKS

Azure provides managed observability for AKS clusters without the overhead of running your own Prometheus/Grafana stack.

Container Insights

  • Agent-based log and metric collection
  • Pre-built dashboards for clusters, nodes, pods
  • KQL (Kusto) queries for deep analysis
  • Integration with Azure Alerts
  • Live container logs in the portal

Azure Managed Prometheus + Grafana

  • Fully managed Prometheus-compatible metrics
  • Azure Managed Grafana for dashboards
  • PromQL support
  • No infrastructure to manage
  • Built-in recording rules and alerts
# Enable Container Insights on an existing AKS cluster
az aks enable-addons -a monitoring -n myCluster -g myResourceGroup

# Query container logs via KQL
ContainerLog | where LogEntry contains "error" | project TimeGenerated, LogEntry | top 50
Knowledge Check

Quiz: Observability Fundamentals

Q1: Which command shows logs from a previously crashed container instance?

kubectl logs pod-name --crashed
kubectl logs pod-name --previous
kubectl logs pod-name --last
kubectl describe pod pod-name
Correct: kubectl logs pod-name --previous (or -p). This retrieves logs from the previous container instance, which is critical for debugging CrashLoopBackOff.

Q2: What component must be installed for kubectl top to work?

Prometheus
Metrics Server
cAdvisor
Grafana
Correct: Metrics Server. It provides the metrics API that kubectl top reads from. cAdvisor provides container metrics to kubelet, but kubectl top needs Metrics Server to aggregate them.

Q3: Where should Kubernetes applications write their logs?

/var/log/application.log
stdout and stderr
A shared PersistentVolume
Directly to Elasticsearch
Correct: stdout and stderr. This follows the 12-Factor App methodology and allows the container runtime and kubectl logs to capture output. File-based logging requires sidecar containers to forward.
Troubleshooting

Debugging Pods: A Story-Driven Approach

"You deploy your application. Instead of Running, the pod shows CrashLoopBackOff. You wait. It restarts. Crashes again. Restarts. Crashes again. The back-off delay grows: 10s, 20s, 40s, 80s... up to 5 minutes. Users are waiting."

Let's walk through exactly how to investigate this, step by step.

Troubleshooting

Step 1: Get the Status

Always start with kubectl get to understand the current state. The STATUS column tells you what phase the pod is in.

# See all pods and their status
kubectl get pods -n production
NAME          READY   STATUS             RESTARTS   AGE
api-server    0/1     CrashLoopBackOff   5          3m
web-frontend  1/1     Running            0          1h
db-worker     0/1     Pending            0          10m
Status             Meaning                                   Next Step
CrashLoopBackOff   Container starts and crashes repeatedly   kubectl logs --previous
Pending            Can't be scheduled                        kubectl describe pod
ImagePullBackOff   Can't pull container image                Check image name, registry access
Init:0/1           Init container hasn't completed           kubectl logs pod -c init-container
Running            All containers started                    Check readiness probes, service
Troubleshooting

Step 2: Describe the Pod

kubectl describe is your detective report. It shows events, conditions, and the full pod specification. Focus on the Events section at the bottom.

kubectl describe pod api-server -n production

# Key sections to examine:

# 1. Status & Conditions
Status:       Running
Conditions:
  Ready:      False    # <-- readiness probe failing

# 2. Container State
State:        Waiting
  Reason:     CrashLoopBackOff
Last State:   Terminated
  Reason:     Error
  Exit Code:  1        # <-- non-zero = crash

# 3. Events (most important!)
Events:
  Warning  BackOff   kubelet  Back-off restarting failed container
  Normal   Pulled    kubelet  Successfully pulled image
  Warning  Unhealthy kubelet  Readiness probe failed: connection refused
💡
Exit codes: 0 = success, 1 = application error, 137 = OOMKilled (128 + 9/SIGKILL), 139 = segfault, 143 = SIGTERM (graceful shutdown).
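The 128 + signal arithmetic can be verified in any shell: a process killed by a signal exits with status 128 plus the signal number.

```shell
# A process killed by a signal exits with status 128 + the signal number.
bash -c 'kill -KILL $$' || echo "SIGKILL (9)  -> $?"   # 137, what OOMKilled reports
bash -c 'kill -SEGV $$' || echo "SIGSEGV (11) -> $?"   # 139, a segfault
bash -c 'kill -TERM $$' || echo "SIGTERM (15) -> $?"   # 143, graceful shutdown signal
```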
Troubleshooting

Step 3: Read the Logs

Events tell you what Kubernetes sees. Logs tell you what the application sees. For a crashing container, use --previous to see the last run's output.

# Logs from the crashed container
kubectl logs api-server -n production --previous

# Example output that tells us the problem:
Starting API server on port 8080...
Connecting to database at postgres.db.svc:5432...
ERROR: FATAL: password authentication failed for user "admin"
ERROR: Cannot connect to database. Exiting.

The answer is clear: the database password is wrong. The fix might be updating a Secret or ConfigMap.

⚠️
Common pitfall: If logs show nothing (container crashes instantly), the problem might be the command/entrypoint. Use kubectl describe to check the container's command, args, and image.
Troubleshooting

Step 4: Get Inside the Container

Sometimes you need to poke around inside a running container or test connectivity from within the cluster.

# Exec into a running container
kubectl exec -it api-server -n production -- /bin/sh

# Run a specific command without interactive shell
kubectl exec api-server -- cat /etc/config/settings.conf

# For crashed/minimal containers, use ephemeral debug containers (K8s 1.25+)
kubectl debug -it api-server --image=busybox --target=api-server

# Spin up a temporary debug pod in the same namespace
kubectl run debug --image=busybox --rm -it --restart=Never -- sh

# Test DNS resolution from inside the cluster
kubectl run dns-test --image=busybox --rm -it --restart=Never -- \
  nslookup api-service.production.svc.cluster.local

# Test connectivity to a service
kubectl run curl-test --image=curlimages/curl --rm -it --restart=Never -- \
  curl -v http://api-service.production:8080/health
Troubleshooting

Debugging Services: Where's the Break in the Chain?

Users can't reach your app. The chain is: User → Ingress → Service → Endpoints → Pod. Check each link.

# 1. Does the Service exist and have the right selector?
kubectl get svc api-service -n production -o wide

# 2. Does the Service have endpoints? (Most common issue!)
kubectl get endpoints api-service -n production
# If ENDPOINTS is <none>, the selector doesn't match any running pods

# 3. Check that pod labels match the service selector
kubectl get pods -n production --show-labels
kubectl get svc api-service -n production -o jsonpath='{.spec.selector}'

# 4. Is the pod actually listening on the right port?
kubectl exec api-server -- netstat -tlnp
# or
kubectl exec api-server -- ss -tlnp

# 5. Test from within the cluster
kubectl run test --rm -it --image=curlimages/curl --restart=Never -- \
  curl http://api-service.production:8080
💡
Top 3 service issues: (1) No endpoints — label mismatch, (2) Wrong port — targetPort doesn't match container port, (3) Pod not Ready — readiness probe failing.
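When the endpoint list is empty, the fix is to make the labels and selector agree. A sketch under assumed names (api-service, app=api-server): the durable fix is correcting the Deployment's pod template labels, but patching the Service selector repairs it in place.

```shell
# Hypothetical repair: the Service selects app=api but the pods carry app=api-server.
# Quick in-place fix: patch the Service selector to match the pod labels.
kubectl patch svc api-service -n production \
  -p '{"spec":{"selector":{"app":"api-server"}}}'

# Verify: endpoints should now list pod IPs instead of <none>
kubectl get endpoints api-service -n production
```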
Knowledge Check

Quiz: Debugging Fundamentals

Q1: A pod's exit code is 137. What does this mean?

Application error (uncaught exception)
OOMKilled (out of memory, killed by SIGKILL)
Image pull failure
Graceful shutdown via SIGTERM
Correct: OOMKilled. Exit code 137 = 128 + 9 (SIGKILL). The Linux kernel killed the process because it exceeded its memory limit. Fix by increasing the memory limit or fixing a memory leak.

Q2: A Service shows <none> for endpoints. What is the most likely cause?

The Service port is wrong
The Service selector doesn't match any running pod labels
The cluster DNS is down
Network policies are blocking traffic
Correct: The Service selector doesn't match any running pod labels. Endpoints are populated by the Endpoints controller which watches for pods matching the Service's selector. Check labels with kubectl get pods --show-labels.

Q3: Which command lets you debug a crashed container that has no shell binary?

kubectl exec -it pod-name -- /bin/sh
kubectl debug -it pod-name --image=busybox --target=container-name
kubectl attach pod-name
kubectl cp debug-tools pod-name:/usr/bin/
Correct: kubectl debug with an ephemeral container. This injects a debug container (like busybox) that shares the process namespace with the target container, letting you inspect it even if the original has no shell.
Failure Scenarios

Scenario 1: Pod Stuck in Pending

A Pending pod means the scheduler cannot find a suitable node. Here's the investigation flow:

kubectl describe pod stuck-pod -n production

# Common events you'll see:
Events:
  Warning  FailedScheduling  0/3 nodes are available:
    1 Insufficient cpu           # Node doesn't have enough CPU
    2 node(s) had taints that the pod didn't tolerate  # Missing toleration

Common Causes

  • Insufficient resources — no node has enough CPU/memory
  • Taints/Tolerations — all nodes tainted
  • Node selector/affinity — no matching nodes
  • PVC not bound — waiting for storage
  • ResourceQuota exceeded

Investigation Commands

# Check node capacity vs usage
kubectl top nodes
kubectl describe node node-1

# Check for taints
kubectl get nodes -o json | \
  jq '.items[].spec.taints'

# Check PVC status
kubectl get pvc -n production
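Once the cause is identified, the remedies are short. A sketch of common fixes (node name, taint key, and workload name are illustrative):

```shell
# Remove a taint (the trailing '-' deletes it) so untolerated pods can schedule
kubectl taint nodes node-1 dedicated=gpu:NoSchedule-

# Lower the requests so the pod fits on an existing node
# (the scheduler places pods by requests, not limits)
kubectl set resources deployment stuck-app -n production \
  --requests=cpu=100m,memory=128Mi

# Or add capacity (AKS example)
az aks scale -n myCluster -g myResourceGroup --node-count 4
```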
Failure Scenarios

Scenario 2: CrashLoopBackOff

The container starts, crashes, and Kubernetes keeps restarting it with exponential back-off (10s, 20s, 40s... up to 5 minutes).

Common Causes

  • Application error — missing config, bad credentials, unhandled exception
  • Missing dependencies — database not reachable, required service down
  • OOMKilled — memory limit too low (exit code 137)
  • Bad command/entrypoint — wrong command in pod spec
  • Missing volume mounts — required config file not present
  • Liveness probe kills healthy container — probe misconfigured

Debugging Flowchart

  1. kubectl describe pod → check exit code
  2. kubectl logs --previous → read crash output
  3. Exit code 137? → increase memory limits
  4. Exit code 1? → fix application config
  5. No logs? → check command/args in describe
  6. Liveness probe? → check probe config and timing
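The flowchart's fixes map to short commands. A sketch for the two most common endings (deployment name is illustrative):

```shell
# Exit code 137 (OOMKilled): raise the memory limit on the workload
kubectl set resources deployment api-server -n production \
  --limits=memory=512Mi

# Liveness probe killing a slow-starting app: inspect the probe, then edit it
kubectl get deploy api-server -n production \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'
kubectl edit deploy api-server -n production   # raise initialDelaySeconds / failureThreshold
```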
Failure Scenarios

Scenario 3: ImagePullBackOff

Kubernetes cannot pull the container image. This is often one of the simplest issues to fix, but it can be confusing.

Common Causes

  • Typo in image name — ngnix instead of nginx
  • Tag doesn't exist — v2.0 hasn't been pushed yet
  • Private registry — no imagePullSecret configured
  • Registry down — Docker Hub rate limiting, ACR outage
  • Network policy — blocking egress to registry

Fix Checklist

# 1. Check the exact image reference
kubectl describe pod my-pod | grep Image

# 2. Can you pull it manually?
docker pull myregistry.azurecr.io/app:v2

# 3. Is imagePullSecret configured?
kubectl get pod my-pod -o jsonpath=\
'{.spec.imagePullSecrets}'

# 4. Create/fix pull secret
kubectl create secret docker-registry \
  acr-creds \
  --docker-server=myregistry.azurecr.io \
  --docker-username=user \
  --docker-password=pass
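Creating the secret alone is not enough: the pod must reference it. A sketch of the two ways to wire it up (namespace and secret names as above):

```shell
# Attach the pull secret to the namespace's default ServiceAccount
# so every pod using that ServiceAccount inherits it:
kubectl patch serviceaccount default -n production \
  -p '{"imagePullSecrets":[{"name":"acr-creds"}]}'

# Or reference it directly in the pod spec:
#   spec:
#     imagePullSecrets:
#     - name: acr-creds

# Recreate the pod so the kubelet retries the pull with credentials
kubectl delete pod my-pod -n production
```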
Troubleshooting

The Systematic Debugging Flow

When something is broken, follow this order. It covers 95% of issues:

  1. kubectl get pods -n NAMESPACE — What's the status?
  2. kubectl describe pod POD -n NAMESPACE — Events + conditions?
  3. kubectl logs POD -n NAMESPACE [--previous] — What did the app say?
  4. kubectl get events -n NAMESPACE --sort-by=.lastTimestamp — Cluster events?
  5. kubectl get svc,ep -n NAMESPACE — Service has endpoints?
  6. kubectl exec / kubectl debug — Can I test from inside?
  7. kubectl top pods/nodes — Resource pressure?
  8. kubectl get nodes — Any NotReady nodes?
💡
CKA Exam: Troubleshooting is 30% of the exam. Memorize this flow. Practice it until it becomes muscle memory.
Troubleshooting

Debugging Node Issues

If a node shows NotReady, the kubelet may be down, or the node may have resource pressure.

# Check node status
kubectl get nodes
NAME     STATUS     ROLES          AGE   VERSION
node-1   Ready      control-plane  30d   v1.29.0
node-2   NotReady   <none>         30d   v1.29.0

# Detailed node conditions
kubectl describe node node-2

# Key conditions to check:
Conditions:
  MemoryPressure   True   # Node running out of memory
  DiskPressure     True   # Node running out of disk
  PIDPressure      False
  Ready            False  # Kubelet not reporting or unhealthy

# On the node itself (SSH or debug container):
systemctl status kubelet
journalctl -u kubelet -f
systemctl status containerd
⚠️
CKA Exam Scenario: You may be asked to fix a broken kubelet. Check: Is kubelet running? Is the config correct? Is the certificate valid? systemctl restart kubelet is often the fix after correcting a config issue.
Knowledge Check

Quiz: Troubleshooting Scenarios

Q1: A pod is Pending with event "0/3 nodes are available: 3 Insufficient memory". What should you do?

Restart the pod
Delete and recreate the namespace
Reduce the pod's memory request or add more nodes
Increase the pod's memory limit
Correct: The scheduler uses memory requests (not limits) to place pods on nodes. Either reduce the memory request to fit on existing nodes, or scale up the cluster to add more capacity.

Q2: Users can't reach your app via the Service. kubectl get endpoints shows the service has no endpoints. What should you check first?

That the pod labels match the Service selector
The cluster DNS configuration
The pod's security context
The Ingress controller logs
Correct: No endpoints means no pods match the Service selector. The most common cause is a label mismatch. Compare kubectl get svc -o yaml (check spec.selector) with kubectl get pods --show-labels.

Q3: A node shows NotReady. Which component should you check first on that node?

kube-proxy
kubelet
etcd
kube-scheduler
Correct: kubelet. The kubelet is responsible for reporting node status to the API server. A NotReady node usually means kubelet is stopped, crashed, or misconfigured. Check with: systemctl status kubelet.
Helm

Helm: The Package Manager for Kubernetes

Think of Helm like apt or npm for Kubernetes. Instead of managing dozens of YAML files, you install a "chart" that contains everything your application needs — deployments, services, configmaps, RBAC, all templated and versioned.

Key Concepts

  • Chart — a package of templated K8s manifests
  • Release — an installed instance of a chart
  • Repository — where charts are stored (like npm registry)
  • Values — configuration overrides (values.yaml)

Why Use Helm?

  • Templated manifests with variables
  • Version management and rollbacks
  • Dependency management between charts
  • Reusable across environments (dev/staging/prod)
  • Huge ecosystem (Artifact Hub)
Helm

Essential Helm Commands

# Add a repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Search for charts
helm search repo nginx
helm search hub wordpress        # search Artifact Hub

# Install a chart (creates a release)
helm install my-nginx bitnami/nginx -n web --create-namespace

# Install with custom values
helm install my-app ./my-chart -f production-values.yaml

# Override specific values on the command line
helm install my-app ./my-chart --set replicaCount=3,image.tag=v2

# List releases
helm list -n web

# Upgrade a release
helm upgrade my-nginx bitnami/nginx --set replicaCount=3

# Rollback to a previous revision
helm rollback my-nginx 1

# Uninstall a release
helm uninstall my-nginx -n web

# Show what would be installed (dry-run)
helm template my-app ./my-chart -f values.yaml
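Rollback targets a numbered revision, so it pairs with the inspection commands below (release name follows the examples above):

```shell
# Each install/upgrade creates a numbered revision; list them before rolling back
helm history my-nginx -n web

# Inspect what a release is currently running
helm status my-nginx -n web
helm get values my-nginx -n web      # user-supplied values only
helm get manifest my-nginx -n web    # rendered YAML as applied to the cluster
```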
Helm

Helm Chart Structure

Directory Layout

my-chart/
  Chart.yaml          # metadata (name, version)
  values.yaml         # default config values
  templates/
    deployment.yaml   # templated manifests
    service.yaml
    ingress.yaml
    _helpers.tpl      # template helpers
    NOTES.txt         # post-install message
  charts/             # sub-chart dependencies

Template Example

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        ports:
        - containerPort: {{ .Values.service.port }}
CRDs & Operators

Custom Resource Definitions: Extending Kubernetes

Kubernetes ships with resources like Pods, Services, and Deployments. CRDs let you define your own resource types, making the API server understand application-specific concepts.

What CRDs Enable

  • Define custom resources (e.g., Database, Certificate)
  • Use kubectl to manage them like built-in resources
  • RBAC and admission control work on CRDs
  • Extend K8s without modifying its source code

Real-World Examples

  • cert-manager: Certificate, Issuer
  • Istio: VirtualService, Gateway
  • Prometheus: ServiceMonitor, PrometheusRule
  • ArgoCD: Application
  • Crossplane: RDSInstance, Bucket
# List all CRDs in the cluster
kubectl get crds

# Get custom resources
kubectl get certificates -n production
kubectl describe certificate my-tls -n production
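Because CRDs register with the same API discovery machinery as built-ins, the usual introspection commands work on them too (examples assume cert-manager is installed):

```shell
# Custom resources appear in API discovery like built-in types
kubectl api-resources --api-group=cert-manager.io

# Field documentation works once the CRD is installed
kubectl explain certificate.spec

# Short names and plurals come from the CRD definition itself
kubectl get crd certificates.cert-manager.io -o jsonpath='{.spec.names}'
```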
CRDs & Operators

Operators: Automating Human Knowledge

An Operator is a CRD + a custom controller that encodes operational knowledge. Think of it as a robot sysadmin that watches your custom resources and takes action.

The Operator Pattern

  1. Define a CRD (e.g., PostgresCluster)
  2. Deploy a controller that watches for these resources
  3. Controller creates/manages pods, services, config
  4. Controller handles upgrades, backups, failover

The controller runs a reconciliation loop: "desired state in CRD vs actual state in cluster → take action to converge."

Example: Database Operator

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-db
spec:
  postgresVersion: 15
  instances:
  - replicas: 3
    dataVolumeClaimSpec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi

The operator handles replication, failover, backups, and upgrades automatically.

Knowledge Check

Quiz: Helm & CRDs

Q1: What is the Helm command to rollback a release to a previous version?

helm undo release-name
helm rollback release-name REVISION
helm revert release-name --to=previous
helm upgrade release-name --rollback
Correct: helm rollback release-name REVISION. Each helm install/upgrade creates a numbered revision. Use helm history release-name to see all revisions, then rollback to a specific one.

Q2: What does an Operator consist of?

A Helm chart and a values file
A Custom Resource Definition (CRD) and a custom controller
A DaemonSet and a ConfigMap
An Admission Webhook and a Service
Correct: An Operator combines a CRD (to define the desired state of a custom resource) with a controller (that watches those resources and takes action to achieve the desired state). This is the Operator Pattern.

Q3: Which Helm command shows the rendered YAML without installing it?

helm lint my-chart
helm show values my-chart
helm template my-release my-chart
helm install --debug my-release my-chart
Correct: helm template renders chart templates locally and displays the output YAML without sending anything to the cluster. Great for reviewing what Helm will create before installing. helm install --dry-run also works but requires cluster access.
CKA Exam

CKA: Certified Kubernetes Administrator

Exam Format

Duration — 2 hours
Questions — 15-20 performance-based tasks
Passing Score — 66%
Environment — Real cluster(s) via PSI browser
Resources — kubernetes.io/docs, helm.sh/docs, github.com/kubernetes allowed
Cost — $395 USD (includes 1 retake)
Validity — 2 years

Domain Weights

Troubleshooting — 30%
Cluster Architecture — 25%
Workloads & Scheduling — 15%
Services & Networking — 20%
Storage — 10%

Troubleshooting is the largest domain. Master the debugging flow from earlier in this session.

CKA Exam

CKA: Strategy & Time Management

Time Management

  • Read all questions first (2 min). Flag easy ones.
  • Easy questions first — bank quick points
  • Budget ~6 min per question (15 questions = 90 min)
  • Skip and return if stuck after 8 minutes
  • Leave 10 min at the end for review
  • Check which cluster/context each question uses!

Critical Skills to Practice

  • Create pods, deployments, services imperatively
  • Fix broken kubelet, etcd, scheduler configs
  • RBAC: create roles and bindings fast
  • Network Policies from scratch
  • Upgrade a cluster with kubeadm
  • etcd backup and restore
  • Debug Pending/CrashLoopBackOff pods
💡
Pro tip: Always start with kubectl config use-context for each question. Working on the wrong cluster is the #1 reason people lose points.
CKA Exam

CKA: Practice Scenarios

Cluster Maintenance

  1. Upgrade the control plane from v1.28 to v1.29 using kubeadm
  2. Drain a node, perform maintenance, uncordon it
  3. Back up etcd to /opt/etcd-backup.db
  4. Restore etcd from a backup file
# etcd backup
ETCDCTL_API=3 etcdctl snapshot save \
  /opt/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
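The matching restore task can be sketched like this (paths follow kubeadm defaults; the restored data-dir name is illustrative):

```shell
# Restore into a NEW data directory -- never restore over the live one
ETCDCTL_API=3 etcdctl snapshot restore /opt/etcd-backup.db \
  --data-dir=/var/lib/etcd-restored

# Point the etcd static pod at the restored directory; the kubelet
# recreates the pod automatically when the manifest changes
vi /etc/kubernetes/manifests/etcd.yaml
#   -> change the etcd-data hostPath volume to /var/lib/etcd-restored
```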

Troubleshooting Tasks

  1. A node is NotReady. Fix the kubelet.
  2. A pod can't reach a service. Fix the NetworkPolicy.
  3. The scheduler is not running. Fix the static pod manifest.
  4. Create a user certificate and RBAC binding.
# Common kubelet fix
ssh node-2
systemctl status kubelet
# Check for config errors
journalctl -u kubelet | tail -50
# Fix config and restart
systemctl restart kubelet
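Task 4 above has no commands on this slide, so here is a hedged sketch using the CertificateSigningRequest API (the user "jane", namespace, and role are illustrative; `base64 -w0` is the GNU flag):

```shell
# Hypothetical user "jane": generate a key and a CSR with CN=jane
openssl genrsa -out jane.key 2048
openssl req -new -key jane.key -subj "/CN=jane" -out jane.csr

# Submit the CSR to the cluster and approve it
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: jane
spec:
  request: $(base64 -w0 < jane.csr)
  signerName: kubernetes.io/kube-apiserver-client
  usages: ["client auth"]
EOF
kubectl certificate approve jane
kubectl get csr jane -o jsonpath='{.status.certificate}' | base64 -d > jane.crt

# Grant permissions via RBAC
kubectl create rolebinding jane-edit --clusterrole=edit --user=jane -n dev
```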
CKAD Exam

CKAD: Certified Kubernetes Application Developer

Exam Format

Duration — 2 hours
Questions — 15-20 performance-based tasks
Passing Score — 66%
Focus — Application development, not cluster admin
Resources — Same as CKA (kubernetes.io/docs)

Domain Weights

Application Design & Build — 20%
Application Deployment — 20%
Application Observability & Maintenance — 15%
Application Environment, Config & Security — 25%
Services & Networking — 20%
CKAD Exam

CKAD: Speed Tricks & Strategy

The CKAD is about speed. You need to create resources fast without writing YAML from scratch.

Generate YAML, Don't Write It

# Generate pod YAML
kubectl run nginx --image=nginx \
  --dry-run=client -o yaml > pod.yaml

# Generate deployment YAML
kubectl create deploy app \
  --image=app:v1 --replicas=3 \
  --dry-run=client -o yaml > deploy.yaml

# Generate service YAML
kubectl expose deploy app \
  --port=80 --target-port=8080 \
  --dry-run=client -o yaml > svc.yaml

# Generate job YAML
kubectl create job backup \
  --image=busybox \
  --dry-run=client -o yaml > job.yaml

CKAD Focus Areas

  • Multi-container pods — sidecar, init containers
  • Probes — liveness, readiness, startup
  • ConfigMaps & Secrets — creation and consumption
  • Resource limits — requests/limits
  • Rolling updates — strategy, rollback
  • Jobs & CronJobs
  • Network Policies
  • Ingress rules
  • SecurityContext
  • Helm basics (install, upgrade, rollback)
Exam Cheat Sheet

kubectl Aliases & Speed Shortcuts

Set these up at the start of your exam. They save minutes over the 2-hour session.

# Essential aliases (add to ~/.bashrc at exam start)
alias k=kubectl
alias kgp='kubectl get pods'
alias kgs='kubectl get svc'
alias kgd='kubectl get deploy'
alias kgn='kubectl get nodes'
alias kd='kubectl describe'
alias kdp='kubectl describe pod'
alias kaf='kubectl apply -f'
alias kdf='kubectl delete -f'
alias kex='kubectl exec -it'
alias klo='kubectl logs'
alias klof='kubectl logs -f'

# Enable auto-completion (usually pre-configured)
source <(kubectl completion bash)
complete -o default -F __start_kubectl k

# Quick context switch
alias kcc='kubectl config current-context'
alias kuc='kubectl config use-context'

# The most important shortcut: generate YAML
export do='--dry-run=client -o yaml'
# Usage: k run nginx --image=nginx $do > pod.yaml
Exam Cheat Sheet

kubectl Power Moves for the Exam

Quick Resource Creation

# Pod with command
k run busybox --image=busybox \
  --restart=Never -- sleep 3600

# Service for existing deployment
k expose deploy app --port=80 --type=ClusterIP

# ConfigMap from literal
k create cm my-config \
  --from-literal=key=value

# Secret from literal
k create secret generic my-secret \
  --from-literal=pass=s3cret

# Role + RoleBinding
k create role pod-reader \
  --verb=get,list --resource=pods -n dev
k create rolebinding read-pods \
  --role=pod-reader --user=jane -n dev

Quick Debugging

# Force delete a stuck pod
k delete pod stuck --grace-period=0 --force

# Get all events sorted by time
k get events --sort-by=.lastTimestamp

# Get pod IPs
k get pods -o wide

# JSON path for specific field
k get pod my-pod \
  -o jsonpath='{.status.podIP}'

# All images in namespace
k get pods -o jsonpath=\
'{.items[*].spec.containers[*].image}'

# Watch pods in real-time
k get pods -w

# Replace a resource quickly
k get pod my-pod -o yaml > tmp.yaml
# edit tmp.yaml
k replace -f tmp.yaml --force
Knowledge Check

Quiz: CKA/CKAD Exam Prep

Q1: What percentage of the CKA exam is dedicated to Troubleshooting?

15%
20%
30%
40%
Correct: 30%. Troubleshooting is the single largest domain in the CKA exam. It covers debugging nodes, pods, services, networking, and cluster components.

Q2: What is the passing score for both CKA and CKAD exams?

50%
66%
70%
75%
Correct: 66%. Both CKA and CKAD require a minimum score of 66% to pass. The exam includes a free retake if you fail the first attempt.

Q3: Which kubectl flag generates YAML without creating the resource?

--output=yaml
--dry-run=client -o yaml
--generate-yaml
--template=yaml
Correct: --dry-run=client -o yaml. This creates the resource definition client-side without sending it to the API server, and outputs it as YAML. Essential for quickly generating templates during the exam.
Exam Resources

Study Resources & Practice Labs

Free Resources

  • kubernetes.io/docs — official docs (allowed in exam)
  • killer.sh — free exam simulator (included with exam purchase)
  • KodeKloud free labs — CKA/CKAD challenges
  • kubectl cheat sheet — kubernetes.io/docs/reference/kubectl/cheatsheet
  • GitHub: dgkanatsios/CKAD-exercises
  • Play with Kubernetes — browser-based cluster

Paid Courses

  • KodeKloud CKA/CKAD — hands-on labs, best for beginners
  • Linux Foundation courses — official training
  • Udemy: Mumshad Mannambeth — highly rated CKA course
  • A Cloud Guru — structured learning path

Practice on killer.sh at least twice before the exam. It's harder than the real exam, which is exactly what you want.

Exam Cheat Sheet

Bookmark These Docs Pages

During the exam, you can access kubernetes.io/docs. Knowing where to find things saves critical time.

Topic — Docs Location
kubectl Cheat Sheet — kubernetes.io/docs/reference/kubectl/cheatsheet/
Pod Spec Reference — kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/
Network Policies — kubernetes.io/docs/concepts/services-networking/network-policies/
RBAC — kubernetes.io/docs/reference/access-authn-authz/rbac/
etcd Backup/Restore — kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/
kubeadm Upgrade — kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
Persistent Volumes — kubernetes.io/docs/concepts/storage/persistent-volumes/
Security Context — kubernetes.io/docs/tasks/configure-pod-container/security-context/
💡
Exam Tip: Use the search bar on kubernetes.io — it's fast. But having mental bookmarks of key pages is even faster.
Final Knowledge Check

Quiz: Final Comprehensive Review

Q1: You need to check if a specific ServiceAccount can create deployments. Which command do you use?

kubectl get rolebinding --serviceaccount=ns:sa
kubectl auth can-i create deployments --as=system:serviceaccount:ns:sa
kubectl describe sa sa-name -n ns
kubectl auth check sa-name --verb=create --resource=deployments
Correct: kubectl auth can-i create deployments --as=system:serviceaccount:ns:sa. The --as flag impersonates the service account. Note the format: system:serviceaccount:NAMESPACE:SA_NAME.

Q2: Your pod is running but the readiness probe is failing. What happens?

The pod is killed and restarted
The pod is evicted from the node
The pod is removed from Service endpoints (no traffic routed to it)
Nothing — readiness probes are informational only
Correct: The pod is removed from Service endpoints, so no new traffic is routed to it. The pod continues running (unlike liveness probe failure, which restarts the container). This is the key difference between readiness and liveness probes.

Q3: During the CKA exam, which documentation sites are you allowed to access?

Only kubernetes.io/docs
kubernetes.io/docs, kubernetes.io/blog, helm.sh/docs, and github.com/kubernetes
Any website
No external resources are allowed
Correct: You can access kubernetes.io (docs, blog), helm.sh/docs, and github.com/kubernetes. No other websites, no personal notes, no Stack Overflow. Practice navigating these docs efficiently before the exam.
Course Recap

The Full Journey: All 8 Sessions

Sessions 1-4: Foundations

  1. K8s Fundamentals — architecture, pods, kubectl
  2. Workload Resources — deployments, scaling, updates
  3. Networking — services, ingress, DNS, network policies
  4. Storage — volumes, PV/PVC, StorageClasses

Sessions 5-8: Advanced

  5. Scheduling & Lifecycle — affinity, taints, probes
  6. Advanced Patterns — multi-container, jobs, operators
  7. Config, Security & RBAC — secrets, policies, access control
  8. Observability & Exam Prep — debugging, Helm, CKA/CKAD

You now have the knowledge to deploy, manage, secure, and troubleshoot Kubernetes workloads — and pass the certification exams.

Course Recap

Key Takeaways

Think Declaratively

Define desired state, let Kubernetes reconcile. This applies to everything: deployments, RBAC, network policies, operators.

Labels Are Everything

Services find pods via labels. Network policies select pods via labels. Scheduling uses labels. Get your labeling strategy right.

Security Is Not Optional

Run as non-root, set RBAC from day one, use network policies, scan images. Retro-fitting security is painful.

Master Debugging

get → describe → logs → exec. This flow solves 95% of issues. Practice until it's muscle memory.

Use kubectl Efficiently

Imperative commands for speed, declarative YAML for production. --dry-run=client -o yaml bridges the gap.

Keep Learning

Kubernetes evolves fast. Follow release notes, practice with new features, and stay connected to the community.

What's Next

Your Next Steps

This Week

  • Set up a practice cluster (kind, minikube, or AKS)
  • Deploy a real app end-to-end
  • Practice the debugging flow on broken pods
  • Schedule your exam date (accountability!)

Before the Exam

  • Complete 2 full killer.sh practice sessions
  • Master all imperative kubectl commands
  • Practice RBAC, NetworkPolicy, and etcd backup
  • Time yourself: 15 questions in 2 hours
Kubernetes Training Complete

Thank You & Good Luck!

You've invested significant time and effort in learning Kubernetes. Now it's time to put that knowledge into practice. Go build things, break things, fix things, and earn that certification.

CKA CKAD You've Got This