Autoscaling, Traffic Splitting, and Production-Ready Services
From simple deployments to sophisticated traffic management -- mastering Knative Serving.
A Knative Service manages three child resources automatically:
```
           Knative Service
            /          \
    Configuration     Route
          |             |
      Revisions    Traffic Rules
    (v1, v2, ...)  (% splits, tags)
          |
      Kubernetes
   (Pods, Services)
```
The Configuration defines what your service looks like:
Each change to the Configuration creates a new Revision.
Revisions are named `service-name-00001`, `service-name-00002`, and so on, unless you set `template.metadata.name` explicitly.

The Route controls how traffic reaches your Revisions. `@latest` always points to the newest revision.

```
# Route URL pattern
http://service-name.namespace.example.com            # main
http://tag-name-service-name.namespace.example.com   # tagged
```
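The URL pattern above can be sketched as a tiny helper. This is illustrative only (`route_url` and the `example.com` domain are assumptions, not a Knative API); the actual domain comes from your cluster's `config-domain` ConfigMap.

```python
# Sketch: how Knative composes route URLs for main and tagged routes.
def route_url(service: str, namespace: str, domain: str = "example.com",
              tag: str = "") -> str:
    """Return the URL Knative would assign to the main or a tagged route."""
    host = f"{service}.{namespace}.{domain}"
    if tag:
        # Tagged routes get the tag prepended to the hostname
        host = f"{tag}-{host}"
    return f"http://{host}"

print(route_url("my-api", "production"))               # main route
print(route_url("my-api", "production", tag="green"))  # tagged route
```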
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api
  namespace: production
  labels:
    app: my-api
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v1.2.0
          ports:
            - containerPort: 8080
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
```
1. What are the three child resources managed by a Knative Service?
2. What triggers the creation of a new Revision?
3. How do tagged revisions receive traffic?
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api
spec:
  template:
    metadata:
      name: my-api-v2
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v2.0.0
  traffic:
    - revisionName: my-api-v2
      percent: 5
    - revisionName: my-api-v1
      percent: 95
```
```bash
# Start with 5%
kn service update my-api \
  --traffic my-api-v2=5 \
  --traffic my-api-v1=95

# Monitor metrics... looks good! Increase to 25%
kn service update my-api \
  --traffic my-api-v2=25 \
  --traffic my-api-v1=75

# Still good! Go to 50/50
kn service update my-api \
  --traffic my-api-v2=50 \
  --traffic my-api-v1=50

# Full rollout
kn service update my-api \
  --traffic my-api-v2=100

# Something went wrong? Instant rollback!
kn service update my-api \
  --traffic my-api-v1=100
```
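A rollout plan like the 5% → 25% → 50% → 100% ramp above is easy to get wrong by hand (percentages that don't sum to 100 are rejected by Knative). A small sanity check can be sketched as follows; `validate_plan` is a hypothetical helper, not a Knative or `kn` API.

```python
# Sketch: sanity-check a canary rollout plan before running `kn service update`.
# Each step maps revision name -> traffic percent.
def validate_plan(steps):
    for i, step in enumerate(steps):
        if any(p < 0 for p in step.values()):
            raise ValueError(f"step {i}: negative percent")
        total = sum(step.values())
        if total != 100:
            raise ValueError(f"step {i}: traffic sums to {total}, expected 100")
    return True

# The ramp shown above, expressed as data
plan = [
    {"my-api-v2": 5,  "my-api-v1": 95},
    {"my-api-v2": 25, "my-api-v1": 75},
    {"my-api-v2": 50, "my-api-v1": 50},
    {"my-api-v2": 100},
]
assert validate_plan(plan)
```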
```yaml
# Blue is live (current)
# Deploy green with a tag (0% main traffic)
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api
spec:
  template:
    metadata:
      name: my-api-green
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v2.0.0
  traffic:
    - revisionName: my-api-blue
      percent: 100
    - revisionName: my-api-green
      percent: 0
      tag: green
```
Test green at http://green-my-api.ns.example.com, then switch 100% when ready.
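The cutover itself is just a swap of the two `percent` values in the traffic block. A minimal sketch of that transformation (`switch_traffic` is a hypothetical helper; in practice you would apply the result with `kn` or `kubectl`):

```python
# Sketch: blue/green cutover as a transformation of the traffic block.
def switch_traffic(old_live: str, new_live: str):
    """Move 100% of traffic to new_live; keep old_live at 0% for instant rollback."""
    return [
        {"revisionName": new_live, "percent": 100},
        {"revisionName": old_live, "percent": 0},
    ]

print(switch_traffic("my-api-blue", "my-api-green"))
```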
Knative provides two autoscaler implementations: the default Knative Pod Autoscaler (KPA), which scales on request concurrency or RPS and supports scale-to-zero, and the Kubernetes Horizontal Pod Autoscaler (HPA), which scales on CPU or memory but cannot scale to zero.
```
Request ---> Queue-Proxy (sidecar in each pod)
                     |
              Metrics reported
                     |
             Autoscaler collects
            concurrency/RPS data
                     |
             Desired replicas =
  total_concurrency / target_concurrency
                     |
              Scale up or down
            (including to zero)
```
The queue-proxy sidecar is the key -- it measures real request concurrency per pod.
```yaml
# Key config in config-autoscaler ConfigMap
enable-scale-to-zero: "true"         # default: true
scale-to-zero-grace-period: "30s"    # default: 30s
stable-window: "60s"                 # default: 60s
```
Set `minScale: "1"` for latency-critical services to avoid cold starts.

1. What metric does KPA (Knative Pod Autoscaler) primarily use for scaling?
2. What is the role of the queue-proxy sidecar?
3. Which autoscaler supports scale-to-zero?
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-api
spec:
  template:
    metadata:
      annotations:
        # Min and max replicas
        autoscaling.knative.dev/minScale: "2"
        autoscaling.knative.dev/maxScale: "50"
        # Target concurrency per pod
        autoscaling.knative.dev/target: "100"
        # Autoscaler class (kpa.autoscaling.knative.dev or hpa.autoscaling.knative.dev)
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        # Metric type: concurrency (default) or rps
        autoscaling.knative.dev/metric: "concurrency"
        # Scale-down delay
        autoscaling.knative.dev/scale-down-delay: "5m"
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v1
```
The target annotation controls how aggressively Knative scales:
| Target | Behavior | Use Case |
|---|---|---|
| `target: "1"` | 1 request per pod at a time | Heavy processing, ML inference |
| `target: "10"` | 10 concurrent requests per pod | Moderate API workloads |
| `target: "100"` | 100 concurrent requests per pod | Light, fast endpoints |
Formula: desired_pods = total_concurrent_requests / target
With target=10 and 50 concurrent requests: 5 pods
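The sizing formula above can be sketched in a few lines. This is a simplified model, not Knative's actual implementation; I assume the result rounds up and is clamped to the `minScale`/`maxScale` bounds.

```python
import math

# Sketch of the KPA sizing formula: desired pods = total concurrency / target,
# rounded up, clamped to [min_scale, max_scale].
def desired_pods(total_concurrency: float, target: float,
                 min_scale: int = 0, max_scale: int = 1000) -> int:
    if total_concurrency <= 0:
        return min_scale  # may be 0 -> scale to zero
    raw = math.ceil(total_concurrency / target)
    return max(min_scale, min(raw, max_scale))

print(desired_pods(50, 10))  # 5 pods, as in the example above
print(desired_pods(80, 10))  # 8 pods
print(desired_pods(0, 10))   # 0 -> scale to zero
```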
```yaml
# In config-autoscaler ConfigMap (knative-serving namespace)
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  # Allow revisions to start with zero pods
  allow-zero-initial-scale: "true"
  # Initial scale when a revision is first deployed
  initial-scale: "1"
  # Max rate at which the autoscaler may scale up per evaluation
  max-scale-up-rate: "1000"
  # Panic mode: scale aggressively if traffic spikes
  panic-window-percentage: "10.0"
  panic-threshold-percentage: "200.0"
```
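The panic thresholds can be illustrated with a simplified check: the autoscaler enters panic mode when concurrency measured over the short panic window exceeds `panic-threshold-percentage` (200%) of what the current pods can absorb. `in_panic_mode` below is a hypothetical helper; the real autoscaler's state machine is more involved.

```python
# Sketch: panic-mode trigger. current_pods * target is the capacity the
# current fleet can absorb; panic fires when observed concurrency over the
# panic window reaches the threshold percentage of that capacity.
def in_panic_mode(panic_window_concurrency: float, current_pods: int,
                  target: float, panic_threshold_pct: float = 200.0) -> bool:
    capacity = current_pods * target
    if capacity == 0:
        return panic_window_concurrency > 0
    return (panic_window_concurrency / capacity) * 100.0 >= panic_threshold_pct

print(in_panic_mode(250, 1, 100))  # spike to 250% of capacity -> panic
print(in_panic_mode(150, 1, 100))  # 150% is below the 200% threshold
```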
Maximum concurrent requests the container can handle. Extra requests are queued by queue-proxy.

```yaml
spec:
  template:
    spec:
      containerConcurrency: 10
```

Set to 0 for unlimited (the default).
The autoscaler's target. It drives scaling decisions but does NOT limit actual concurrency.

```yaml
annotations:
  autoscaling.knative.dev/target: "10"
```

A common rule of thumb is to set the target to about 70% of containerConcurrency.
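The 70% rule of thumb can be sketched as a small helper. `recommended_target` is a hypothetical name, not a Knative API; the point is simply that the soft target should sit below the hard `containerConcurrency` limit so queue-proxy rarely has to queue requests.

```python
# Sketch: derive an autoscaler target from containerConcurrency using
# the ~70% utilization rule of thumb.
def recommended_target(container_concurrency: int, utilization: float = 0.7) -> int:
    if container_concurrency == 0:  # 0 means unlimited in Knative
        raise ValueError("containerConcurrency is unlimited; pick a target explicitly")
    return max(1, int(round(container_concurrency * utilization)))

print(recommended_target(10))   # target 7 for containerConcurrency: 10
print(recommended_target(100))  # target 70 for containerConcurrency: 100
```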
Map your own domain to a Knative Service:
```yaml
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: api.mycompany.com
  namespace: production
spec:
  ref:
    name: my-api
    kind: Service
    apiVersion: serving.knative.dev/v1
```
```yaml
# Or configure the default domain in the config-domain ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-domain
  namespace: knative-serving
data:
  mycompany.com: ""   # All services get *.mycompany.com
```
Secure your Knative services with automatic TLS:
```yaml
# Option 1: Use cert-manager for automatic certificates
# Install cert-manager, then configure Knative:
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving
data:
  auto-tls: "Enabled"
  http-protocol: "Redirected"   # Redirect HTTP -> HTTPS
  certificate-class: "cert-manager.io"
```
```bash
# Option 2: Bring your own certificate
kubectl create secret tls my-tls-cert \
  --key=tls.key --cert=tls.crt -n production
```

```yaml
# Reference it in the DomainMapping
spec:
  tls:
    secretName: my-tls-cert
```
1. What is the difference between containerConcurrency and the autoscaling target annotation?
2. If you have a target of 10 and 80 concurrent requests, how many pods will the autoscaler create?
3. How do you enable automatic TLS for Knative services?
Not every service should be publicly accessible:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: internal-api
  labels:
    networking.knative.dev/visibility: cluster-local
spec:
  template:
    spec:
      containers:
        - image: myregistry.azurecr.io/internal-api:v1
```
The service is then reachable only inside the cluster:

```
http://internal-api.namespace.svc.cluster.local
```
```yaml
spec:
  template:
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v1
          env:
            # Direct value
            - name: LOG_LEVEL
              value: "info"
            # From ConfigMap
            - name: API_BASE_URL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: api-url
            # From Secret
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
          # All keys from a ConfigMap
          envFrom:
            - configMapRef:
                name: feature-flags
```
```yaml
spec:
  template:
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v1
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 1000m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
```
Readiness probes are critical -- Knative uses them to know when to route traffic.
```yaml
spec:
  template:
    spec:
      containers:
        - image: myregistry.azurecr.io/my-api:v1
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
            - name: secret-volume
              mountPath: /etc/secrets
              readOnly: true
      volumes:
        - name: config-volume
          configMap:
            name: app-config
        - name: secret-volume
          secret:
            secretName: app-secrets
```
Note: Knative does not support PersistentVolumeClaims by default -- services should be stateless. Use external storage (Azure Blob, databases) for persistent data.
```yaml
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      imagePullSecrets:
        - name: acr-credentials
      containers:
        - image: myregistry.azurecr.io/my-api:v1
```
For AKS with ACR integration:
```bash
# Attach ACR to AKS (no imagePullSecrets needed)
az aks update \
  --name myAKSCluster \
  --resource-group myResourceGroup \
  --attach-acr myACR
```
1. How do you make a Knative Service only accessible within the cluster?
2. Can Knative Services use PersistentVolumeClaims?
3. Why are readiness probes especially important in Knative?
| Scenario | minScale | target | Autoscaler |
|---|---|---|---|
| Dev/test workloads | 0 | Default (100) | KPA |
| Low-traffic APIs | 0 or 1 | 50-100 | KPA |
| Production APIs | 2+ | Based on load testing | KPA |
| CPU-heavy processing | 1+ | N/A (CPU metric) | HPA |
| ML inference | 1 | 1-5 | KPA |
| WebSocket services | 1+ | Low (5-10) | KPA |
```bash
# Quick health check
kn service list
kn revision list
kn route list

# Describe a service (see conditions, traffic, URLs)
kn service describe my-api

# Watch pods scale
kubectl get pods -w -l serving.knative.dev/service=my-api

# Check autoscaler decisions
kubectl logs -n knative-serving -l app=autoscaler -f

# Key metrics to watch:
# - Revision ready latency (cold start time)
# - Request concurrency per pod
# - Response latency (p50, p95, p99)
# - Scale-from-zero duration
```
In the next module, we explore Knative Eventing:
Module 3: Knative Eventing