Module 03

Sync Strategies & Hooks

Controlling when, how, and in what order ArgoCD deploys your resources


The Story

The Midnight Incident

A junior developer runs kubectl edit on a production deployment, bumping replicas from 3 to 10. Nobody notices until the cloud bill arrives. With ArgoCD self-heal, the change would have been reverted within seconds. But the team also needs database migrations to run before app deployments, and old resources should be cleaned up automatically. This module shows how ArgoCD handles all of this.

Manual vs Auto Sync

Manual Sync

ArgoCD detects drift and shows "OutOfSync" but waits for human approval before applying changes.

Good for production environments
Requires UI click or CLI command
Change review before deploy

Auto Sync

ArgoCD automatically applies changes when it detects drift between Git and the cluster.

Great for dev/staging environments
True GitOps -- push to Git, auto deploy
Combine with self-heal for full automation

Enabling Auto-Sync

spec:
  syncPolicy:
    automated:
      prune: false        # Do NOT auto-delete removed resources (safe default)
      selfHeal: false     # Do NOT auto-revert manual changes (safe default)
      allowEmpty: false   # Fail if Git path produces zero resources

Careful: Enabling automated without prune and selfHeal means ArgoCD will auto-sync new commits from Git but will NOT delete removed resources and will NOT revert manual cluster changes. This is the safest starting point.
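For context, here is a minimal sketch of a complete Application manifest showing where the syncPolicy block sits. The application name, repository URL, path, and namespaces are placeholders invented for illustration; only the field names come from the ArgoCD Application spec.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app                # hypothetical application name
  namespace: argocd           # assumes ArgoCD is installed in the "argocd" namespace
spec:
  project: default
  source:
    repoURL: https://example.com/org/my-app-config.git   # placeholder repository
    targetRevision: main
    path: overlays/staging                                # placeholder path
  destination:
    server: https://kubernetes.default.svc                # the cluster ArgoCD runs in
    namespace: my-app
  syncPolicy:
    automated:
      prune: false            # keep resources that were removed from Git
      selfHeal: false         # leave manual cluster changes alone
      allowEmpty: false       # refuse to sync an empty set of resources
```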

Self-Heal: ArgoCD Fights Back

Self-heal automatically reverts any manual changes made directly to the cluster.

spec:
  syncPolicy:
    automated:
      selfHeal: true      # Revert manual cluster changes

Imagine someone runs kubectl scale deployment/api --replicas=10 on production. With self-heal enabled, ArgoCD detects the drift within seconds and scales it back to whatever Git says. Every field that Git defines is pulled back to its declared value.

Tip: Drift is detected through a live Kubernetes watch, not the 3-minute Git polling interval, so detection is near-instant. Consecutive self-heal attempts are spaced by a timeout that defaults to 5 seconds.

Prune: Automatic Cleanup

Prune automatically deletes Kubernetes resources that were removed from Git.

spec:
  syncPolicy:
    automated:
      prune: true         # Delete resources removed from Git

Without Prune

If you remove a Service from Git, the Service stays in the cluster as an orphan. ArgoCD shows it as "OutOfSync" but does nothing.

With Prune

If you remove a Service from Git, ArgoCD automatically deletes it from the cluster on the next sync. Clean and tidy.
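Putting the three flags together, a fully automated policy could look like the sketch below. Once prune is on, allowEmpty: false is what stops an accidentally empty source from wiping the application, so it is worth keeping explicit.

```yaml
spec:
  syncPolicy:
    automated:
      prune: true         # delete resources that were removed from Git
      selfHeal: true      # revert manual cluster changes
      allowEmpty: false   # never prune down to zero resources
```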

Quiz Time

Sync Basics

1. What happens when auto-sync is enabled but prune is false, and you remove a Deployment from Git?

Correct! Without prune enabled, ArgoCD will not delete resources removed from Git. The resource stays in the cluster but may show as out of sync.

2. How quickly does ArgoCD's self-heal detect manual cluster changes?

Correct! Self-heal uses Kubernetes API watches for near-instant detection (typically within 5 seconds), unlike Git polling which runs every 3 minutes.

3. What does `allowEmpty: false` protect against?

Correct! If the source path is empty (misconfigured path, broken Helm chart), and prune is enabled, ArgoCD would delete ALL existing resources. allowEmpty: false prevents this catastrophic scenario.

Sync Options

Fine-tune sync behavior with sync options. Set them for the whole Application under spec.syncPolicy.syncOptions, or, for options that support it, on individual resources via annotations.

spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true        # Create ns if it doesn't exist
      - PrunePropagationPolicy=foreground  # Wait for dependents
      - PruneLast=true              # Prune after all creates/updates
      - Validate=true               # Validate manifests before apply (default; set false to skip)
      - ServerSideApply=true        # Use server-side apply
      - ApplyOutOfSyncOnly=true     # Only apply changed resources
      - RespectIgnoreDifferences=true  # Skip ignored fields in sync
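RespectIgnoreDifferences only has an effect when the Application also declares ignoreDifferences. The sketch below assumes a Deployment whose replica count is owned by an autoscaler: the field is excluded from diffing, and the sync option keeps ArgoCD from overwriting it during sync, even with self-heal on.

```yaml
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas            # let an autoscaler own the replica count
  syncPolicy:
    automated:
      selfHeal: true
    syncOptions:
      - RespectIgnoreDifferences=true   # also skip the ignored field when applying
```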

CreateNamespace

syncOptions:
  - CreateNamespace=true

By default, ArgoCD assumes the target namespace already exists. If it does not, the sync fails. CreateNamespace=true tells ArgoCD to create the namespace automatically before syncing resources.

This is one of the most commonly needed sync options, especially when using ApplicationSets to deploy to multiple environments.
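In recent ArgoCD releases, CreateNamespace can be paired with managedNamespaceMetadata to put labels and annotations on the namespace it creates. A minimal sketch follows; the namespace name, label, and annotation values are made up for illustration.

```yaml
spec:
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app-staging         # hypothetical namespace, created if missing
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        team: payments                # hypothetical label
      annotations:
        contact: platform-team        # hypothetical annotation
    syncOptions:
      - CreateNamespace=true
```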

ServerSideApply

syncOptions:
  - ServerSideApply=true

What: Uses Kubernetes Server-Side Apply instead of client-side kubectl apply
Why: Better conflict detection and field ownership tracking
When: Large CRDs (like Istio VirtualService), resources managed by multiple controllers
Benefit: Avoids the "metadata.annotations too long" error on large resources

Real-world fix: If you see "metadata.annotations: Too long" errors during sync, enabling ServerSideApply usually fixes it. It eliminates the kubectl.kubernetes.io/last-applied-configuration annotation.
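If only one oversized resource triggers the error, server-side apply can be enabled for that resource alone through an annotation instead of for the whole Application. A sketch, with a hypothetical resource name:

```yaml
metadata:
  name: large-dashboard-config      # hypothetical oversized resource
  annotations:
    # Use server-side apply for this resource only
    argocd.argoproj.io/sync-options: ServerSideApply=true
```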

ApplyOutOfSyncOnly

syncOptions:
  - ApplyOutOfSyncOnly=true

Without (Default)

ArgoCD applies all resources in the application on every sync, even if they have not changed. Safe but slow for large applications.

With ApplyOutOfSyncOnly

ArgoCD only applies resources that are actually out of sync. Much faster for applications with many resources.

Recommended for applications with 50+ resources to reduce API server load and speed up syncs.

Per-Resource Sync Options

Apply sync options to individual resources using annotations.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    # Skip this resource during prune operations
    argocd.argoproj.io/sync-options: Prune=false

    # Use replace instead of apply (destructive but clean)
    # argocd.argoproj.io/sync-options: Replace=true

    # Skip validation for this resource
    # argocd.argoproj.io/sync-options: Validate=false

Use case: Annotate critical resources like PVCs with Prune=false so they are never accidentally deleted, even if removed from Git.
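The PVC use case might look like the sketch below. Several options can share one annotation as a comma-separated list; here Prune=false is combined with Delete=false, which asks ArgoCD to keep the resource even when the Application itself is deleted. The claim name and size are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                    # hypothetical claim name
  annotations:
    # Never prune this PVC, and keep it if the Application is deleted
    argocd.argoproj.io/sync-options: Prune=false,Delete=false
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                 # placeholder size
```

A resource protected this way still counts as a difference, so the application can stay OutOfSync after the PVC is removed from Git.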

Advanced

Sync Waves: Ordered Deployment

Sync waves let you control the order in which resources are applied.

Wave -1: Namespace

→

Wave 0: ConfigMaps, Secrets

→

Wave 1: Deployments

→

Wave 2: Ingress

Resources in the same wave are applied together. ArgoCD waits for all resources in a wave to be healthy before proceeding to the next wave. Waves are processed from lowest to highest number.

Using Sync Waves

# Wave -1: Create namespace first
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  annotations:
    argocd.argoproj.io/sync-wave: "-1"

---
# Wave 0: ConfigMap (0 is the default wave; the annotation is shown only for clarity)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  annotations:
    argocd.argoproj.io/sync-wave: "0"

---
# Wave 1: Deployment depends on ConfigMap
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  annotations:
    argocd.argoproj.io/sync-wave: "1"
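To round out the four-wave picture, a wave 2 Ingress could be added as sketched here. The host name is a placeholder, and the backend assumes a Service called api-service on port 8080.

```yaml
---
# Wave 2: Ingress is applied only after the wave 1 resources are healthy
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress                 # hypothetical name
  annotations:
    argocd.argoproj.io/sync-wave: "2"
spec:
  rules:
    - host: api.example.com         # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service   # assumed Service name
                port:
                  number: 8080
```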

Quiz Time

Sync Options & Waves

1. What does the sync option `ServerSideApply=true` fix?

Correct! Server-side apply eliminates the last-applied-configuration annotation that can exceed the annotation size limit on large CRDs.

2. When using sync waves, when does ArgoCD proceed to the next wave?

Correct! ArgoCD waits for all resources in a wave to report as healthy before moving to the next wave. If a resource in a wave fails, the sync stops.

3. How do you prevent a PVC from being pruned even when prune is globally enabled?

Correct! Per-resource annotations override global sync options. Adding Prune=false to the PVC annotation protects it from deletion while other resources can still be pruned.

Hooks

Resource Hooks

Resource hooks let you run Kubernetes resources (usually Jobs) at specific points during the sync lifecycle.

PreSync

→

Sync

→

PostSync

Skip

SyncFail

Hooks are annotated Kubernetes resources. They run at their designated phase and can be cleaned up automatically.

PreSync Hook

Runs before the main sync. Perfect for database migrations, schema changes, or pre-flight checks.

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: my-app:v1.2.3
          command: ["./migrate", "--up"]
      restartPolicy: Never
  backoffLimit: 3

Real-world use: Before deploying a new API version, run a database migration Job. If the migration fails, the sync stops and the new version is never deployed.

Sync Hook

Runs during the main sync, alongside your application resources.

apiVersion: batch/v1
kind: Job
metadata:
  name: seed-cache
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: seed
          image: my-app:v1.2.3
          command: ["./seed-cache"]
      restartPolicy: Never

Sync hooks are less common than PreSync/PostSync. Use them when you need something to run in parallel with the main deployment.

PostSync Hook

Runs after all resources are synced and healthy. Great for notifications, smoke tests, or data seeding.

apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: test
          image: my-app-tests:v1.2.3
          command: ["./run-smoke-tests"]
          env:
            - name: API_URL
              value: "http://api-service:8080/health"
      restartPolicy: Never
  backoffLimit: 1

SyncFail Hook

Runs only when a sync fails. Perfect for alerting and cleanup.

apiVersion: batch/v1
kind: Job
metadata:
  name: notify-failure
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: notify
          image: curlimages/curl:latest
          command:
            - curl
            - -X
            - POST
            - -H
            - 'Content-type: application/json'   # tell the webhook the body is JSON
            - -d
            - '{"text":"Sync FAILED for my-app!"}'
            - https://hooks.slack.com/services/T00/B00/xxx
      restartPolicy: Never

Hook Delete Policies

Control when hook resources (usually Jobs) are cleaned up.

HookSucceeded

Delete the hook resource after it succeeds. If it fails, it stays for debugging.

HookFailed

Delete the hook resource if it fails. Useful when you only want to keep successful runs.

BeforeHookCreation

Delete the previous hook resource before creating a new one. Prevents name conflicts across syncs. This is the default.

No Policy

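Omitting the hook-delete-policy annotation does not keep the hook forever: ArgoCD falls back to BeforeHookCreation. The finished hook stays in the cluster for inspection until the next sync deletes and recreates it.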
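Another way to avoid name conflicts between runs is to let Kubernetes generate a unique name for each hook with generateName. The sketch below reuses the smoke-test image and command as stand-ins; every sync then creates a fresh Job, and successful ones are removed by the delete policy.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: smoke-test-         # Kubernetes appends a random suffix per run
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: test
          image: my-app-tests:v1.2.3
          command: ["./run-smoke-tests"]
      restartPolicy: Never
  backoffLimit: 1
```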

Combining Waves and Hooks

# Full deployment pipeline using waves and hooks:

# PreSync, Wave -2: Run DB migration
# PreSync, Wave -1: Verify migration succeeded

# Sync, Wave 0: Deploy ConfigMaps, Secrets
# Sync, Wave 1: Deploy Deployments, Services
# Sync, Wave 2: Deploy Ingress, NetworkPolicies

# PostSync, Wave 0: Run smoke tests
# PostSync, Wave 1: Send Slack notification

Key insight: Waves work within each hook phase. You can have multiple waves within PreSync, multiple within Sync, and multiple within PostSync. This gives you fine-grained ordering control.
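As a sketch of the first two steps, the hook and sync-wave annotations simply sit side by side on each Job. The verify-migration Job and its --status flag are hypothetical; substitute whatever check your migration tool provides.

```yaml
# PreSync, wave -2: run the migration first
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "-2"
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: my-app:v1.2.3
          command: ["./migrate", "--up"]
      restartPolicy: Never
  backoffLimit: 3
---
# PreSync, wave -1: starts only after the wave -2 Job has succeeded
apiVersion: batch/v1
kind: Job
metadata:
  name: verify-migration            # hypothetical verification Job
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/sync-wave: "-1"
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: verify
          image: my-app:v1.2.3
          command: ["./migrate", "--status"]   # hypothetical flag
      restartPolicy: Never
  backoffLimit: 1
```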

Quiz Time

Hooks Knowledge

1. Which hook type would you use for running database migrations?

Correct! PreSync is ideal for database migrations because they must complete successfully before the new application version is deployed.

2. What is the default hook delete policy?

Correct! BeforeHookCreation is the default. It deletes the previous hook resource before creating a new one on the next sync, preventing name conflicts.

3. What happens if a PreSync hook (Job) fails?

Correct! If a PreSync hook fails, the entire sync operation stops. The Sync phase never begins, protecting you from deploying an app that depends on a failed migration.

Retry Policies

Configure automatic retries when a sync fails due to transient errors.

spec:
  syncPolicy:
    retry:
      limit: 5              # Max retry attempts (0 = no retries)
      backoff:
        duration: 5s        # Initial delay between retries
        factor: 2           # Multiply delay by this factor each retry
        maxDuration: 3m     # Maximum delay between retries

# Retry schedule with these settings (delay = duration x factor^(n-1), capped at maxDuration):
# Initial sync: immediate
# Retry 1: after 5s
# Retry 2: after 10s
# Retry 3: after 20s
# Retry 4: after 40s
# Retry 5: after 80s (still below the 3m cap)

When to use: Transient errors like API server timeouts, webhook admission failures during rolling updates, or race conditions with CRDs being registered.

Replace vs Apply

# Per-resource annotation to use kubectl replace/create instead of apply
metadata:
  annotations:
    argocd.argoproj.io/sync-options: Replace=true

    # Or force a delete-and-recreate (use instead of the line above)
    # argocd.argoproj.io/sync-options: Force=true,Replace=true

Apply (Default)

Three-way merge patch. Safe, preserves fields set by other controllers. Can fail on immutable fields.

Replace

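Full resource replacement via kubectl replace (or create). Use when Apply fails, for example on resources too large for the last-applied-configuration annotation. Changing immutable fields (e.g., a Job's spec.template) additionally needs Force=true, which deletes and recreates the resource.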
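For a resource whose immutable fields change between releases, the two options are typically combined on the resource itself. A sketch with a hypothetical Job; note that a forced replace deletes the live object first, so anything attached to it is lost.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator            # hypothetical Job
  annotations:
    # Delete and recreate instead of patching, since spec.template is immutable
    argocd.argoproj.io/sync-options: Force=true,Replace=true
spec:
  template:
    spec:
      containers:
        - name: report
          image: my-app:v1.2.3
          command: ["./generate-report"]   # hypothetical command
      restartPolicy: Never
```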

The Skip Hook

metadata:
  annotations:
    argocd.argoproj.io/hook: Skip

The Skip annotation tells ArgoCD to skip applying this manifest during sync operations.

ArgoCD will not create or update the resource from this manifest
Useful for resources managed by external tools (e.g., Crossplane, external operators)
Also useful for template files that should exist in Git but not be deployed

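Alternative: To keep an extra resource from marking the application OutOfSync, annotate it with argocd.argoproj.io/compare-options: IgnoreExtraneous. ArgoCD then leaves that resource out of the sync status calculation when it exists in the cluster but not in Git; its health still counts toward the application's health.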
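A sketch of the IgnoreExtraneous annotation on a resource that some other tool creates in the cluster. The ConfigMap name and data are invented for illustration; whatever generates the resource has to set the annotation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: generated-settings          # hypothetical resource that is not in Git
  annotations:
    # Leave this resource out of the application's sync status
    argocd.argoproj.io/compare-options: IgnoreExtraneous
data:
  mode: example                     # placeholder content
```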

Real-World Pipeline

Complete Deployment Pipeline

-2

PreSync: Database Migration

Job runs Flyway/Liquibase migrations. If it fails, deployment stops.

-1

PreSync: Migration Verification

Job verifies the schema is correct before proceeding.

0

Sync Wave 0: ConfigMaps & Secrets

Configuration deployed first so Deployments can mount them.

1

Sync Wave 1: Deployments & Services

Application pods roll out. ArgoCD waits for healthy before continuing.

2

PostSync: Smoke Tests & Notification

Verify the deployment works, then notify the team on Slack/Teams.

Sync Status Codes

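Understanding what you see during and after a sync operation. Succeeded, Failed, and Running are operation phases; Pruned is a per-resource result; OutOfSync is the application's sync status.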

Succeeded: All resources synced, healthy, and hooks completed successfully
Failed: One or more resources failed to sync or a hook failed
Running: Sync is in progress -- resources are being applied
Pruned: Resources were deleted because they no longer exist in Git
OutOfSync: Sync needed -- live state differs from desired state

# View sync operation details
argocd app get my-app --show-operation

# View sync history
argocd app history my-app

Quiz Time

Final Review

1. Can sync waves be used within hook phases (e.g., multiple waves in PreSync)?

Correct! Waves work within each hook phase. You can have PreSync wave -2, PreSync wave -1, Sync wave 0, Sync wave 1, PostSync wave 0, etc.

2. When should you use `Replace=true` instead of the default Apply?

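Correct! Replace is needed when a patch-based Apply cannot be used. For immutable fields (like Job spec.template), combine it with Force=true so that ArgoCD deletes and recreates the resource; Replace on its own swaps the whole object in place and is still rejected on immutable fields.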

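3. With retry backoff settings of duration=5s, factor=2, maxDuration=3m and limit=5, what is the delay before the 4th sync attempt (the 3rd retry)?

The backoff increases exponentially: 5s, 10s, 20s, 40s, 80s. Counting the initial sync as attempt 1, the delay before the 4th attempt is 20 seconds (5s x 2^2). The 4th retry, which is the 5th attempt, waits 40 seconds (5s x 2^3).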

Summary

Module 03 Recap

Manual sync waits for approval; auto-sync deploys on Git changes
Self-heal reverts manual cluster changes within seconds
Prune deletes resources removed from Git; protect critical resources with per-resource annotations
Sync options fine-tune behavior: CreateNamespace, ServerSideApply, ApplyOutOfSyncOnly
Sync waves control resource ordering within each phase
Hooks: PreSync (migrations), Sync, PostSync (tests), SyncFail (alerts)
Delete policies: HookSucceeded, HookFailed, BeforeHookCreation (default)
Retry policies with exponential backoff handle transient failures

Next up: Module 04 -- ApplicationSets and Multi-Cluster

Module 03 Complete

You can now orchestrate complex deployment pipelines with ArgoCD

Continue to Module 04 for ApplicationSets and multi-cluster patterns
