GitOps Module — Presentation 4 of 4

Azure Integration & Operations

SSO, Monitoring, Notifications, and Production Operations

Civica Training Program

The Final Mile: Production-Ready GitOps

ArgoCD is installed and bootstrapped. Now let's make it enterprise-grade.

Where We Left Off

You have ArgoCD deploying applications across dev, staging, and prod using the App of Apps pattern. But your CISO asks: "Who has access? How do we audit? What happens if ArgoCD goes down? How do we get notified of failures?"

Identity

Azure AD SSO + RBAC

Observability

Monitoring, notifications

Operations

DR, multi-team, troubleshooting

Azure AD SSO for ArgoCD

Replace password auth with Azure AD single sign-on using OIDC.

User clicks "Login with Azure AD"
Redirected to Azure AD login page
User authenticates with MFA
Azure AD returns JWT with group claims
ArgoCD validates token and maps groups to roles
User sees only the apps their team owns

Step 1: Azure AD App Registration

Create an App Registration in Azure AD for ArgoCD.

Azure Portal Steps

  1. Go to Azure Active Directory > App registrations
  2. Click New registration
  3. Name: ArgoCD-SSO
  4. Redirect URI (platform: Web): https://argocd.civica.internal/auth/callback
  5. Note the Application (client) ID
  6. Note the Directory (tenant) ID
  7. Under Certificates & secrets, create a client secret

Configure Group Claims

  1. Go to Token configuration
  2. Click Add groups claim
  3. Select Security groups
  4. For ID token, choose Group ID

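Important: Azure AD emits at most 200 groups in a JWT (150 in a SAML token). If a user belongs to more groups than that, the token carries a group overage claim instead of the group list, and group-based RBAC stops working for that user. Filter the claim (for example, to groups assigned to the application) or use Application Roles instead.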

Step 2: Configure ArgoCD OIDC

Add the Azure AD OIDC configuration to ArgoCD's Helm values.

# values-production.yaml
configs:
  cm:
    url: https://argocd.civica.internal
    oidc.config: |
      name: Azure AD
      issuer: https://login.microsoftonline.com/TENANT_ID/v2.0
      clientID: CLIENT_ID
      clientSecret: $oidc.azure.clientSecret  # From argocd-secret
      requestedScopes:
        - openid
        - profile
        - email
      requestedIDTokenClaims:
        groups:
          essential: true

  secret:
    extra:
      oidc.azure.clientSecret: "YOUR_CLIENT_SECRET"  # Use External Secrets in prod!
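
The comment above recommends External Secrets for the client secret. A minimal sketch, assuming the External Secrets Operator is installed with a ClusterSecretStore for Azure Key Vault (the names azure-keyvault and argocd-oidc-client-secret are hypothetical): an ExternalSecret merges the Key Vault value into the existing argocd-secret under the key that $oidc.azure.clientSecret points to, so the configs.secret.extra entry can be dropped. The apiVersion depends on your operator release.

```yaml
# Hypothetical names: ClusterSecretStore "azure-keyvault" and
# Key Vault secret "argocd-oidc-client-secret"
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: argocd-oidc-client-secret
  namespace: argocd
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-keyvault
  target:
    name: argocd-secret       # the Secret ArgoCD already reads
    creationPolicy: Merge     # add the key; don't create or own the Secret
  data:
    - secretKey: oidc.azure.clientSecret   # matches $oidc.azure.clientSecret
      remoteRef:
        key: argocd-oidc-client-secret
```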

Step 3: RBAC with Azure AD Groups

Map Azure AD security groups to ArgoCD roles and projects.

# values-production.yaml (continued)
configs:
  rbac:
    policy.csv: |
      # Platform team: full admin access
      g, "ad-group-id-platform-team", role:admin

      # Payments team: read + sync their own apps
      p, role:payments-team, applications, get, payments-team/*, allow
      p, role:payments-team, applications, sync, payments-team/*, allow
      p, role:payments-team, applications, action/*, payments-team/*, allow
      p, role:payments-team, logs, get, payments-team/*, allow
      g, "ad-group-id-payments-team", role:payments-team

      # Orders team: read + sync their own apps
      p, role:orders-team, applications, get, orders-team/*, allow
      p, role:orders-team, applications, sync, orders-team/*, allow
      g, "ad-group-id-orders-team", role:orders-team

      # Read-only for every member of the all-devs group
      p, role:readonly, applications, get, */*, allow
      g, "ad-group-id-all-devs", role:readonly

    policy.default: ""  # No default access
    scopes: "[groups]"
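
The module summary recommends disabling the local admin account. A minimal sketch in the same Helm values, assuming SSO already works and at least one Azure AD group is bound to role:admin so an administrator remains:

```yaml
# values-production.yaml (continued)
configs:
  cm:
    # Disable the built-in local admin user once SSO + RBAC work.
    # Quoted because ConfigMap values must be strings.
    admin.enabled: "false"
```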

ArgoCD RBAC Policy Syntax

Understanding the Casbin-based policy format.

Policy Rules (p)

# Format:
p, role, resource, action, object, effect

# Examples:
# Allow role to get apps in project
p, role:dev, applications, get, myproject/*, allow

# Allow role to sync specific app
p, role:dev, applications, sync, myproject/myapp, allow

# Deny delete on all apps for role:dev (deny wins over allow)
p, role:dev, applications, delete, */*, deny

Group Bindings (g)

# Format:
g, group-or-user, role

# Map Azure AD group to role
g, "azure-ad-group-object-id", role:dev

# Map a specific user by email (requires scopes: "[groups, email]")
g, "user@example.com", role:admin

Resources & Actions

applications: get, create, update, delete, sync, action/<group/kind/action-name>
repositories: get, create, update, delete
clusters: get, create, update, delete
logs: get

Knowledge Check #1

Azure AD SSO and RBAC.

Q1: What protocol does ArgoCD use for Azure AD SSO?

Answer: B) OIDC (OpenID Connect). ArgoCD supports OIDC natively, which is the protocol Azure AD uses for modern authentication. The Azure AD App Registration provides the issuer, client ID, and client secret for OIDC configuration.

Q2: In ArgoCD RBAC, what does the "g" prefix mean in a policy line?

Answer: B) Group binding (maps a group or user to a role). In Casbin policy format, "p" defines permission rules and "g" defines group/role assignments. g, "ad-group-id", role:admin maps an Azure AD group to an ArgoCD role.

Q3: What should the default RBAC policy be for a secure ArgoCD installation?

Answer: C) Empty string (no default access). Setting policy.default: "" means users without an explicit role assignment get no access. This follows the principle of least privilege — access must be explicitly granted.

Azure Container Registry Integration

Ensuring AKS can pull images and ArgoCD can monitor image updates.

AKS + ACR Attachment

# Attach ACR to AKS (recommended)
az aks update \
  --resource-group myRG \
  --name myAKS \
  --attach-acr myACR

# This grants the AKS kubelet identity
# the AcrPull role on the registry,
# so no imagePullSecrets are needed.

# Verify
az aks check-acr \
  --resource-group myRG \
  --name myAKS \
  --acr myACR.azurecr.io

ArgoCD Image Updater for ACR

# Install ArgoCD Image Updater
helm install argocd-image-updater \
  argo/argocd-image-updater \
  -n argocd

# Configure ACR access
# Use Workload Identity or
# create a Service Principal with AcrPull

# Annotate your Application:
metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: app=myacr.azurecr.io/myapp
    argocd-image-updater.argoproj.io/app.update-strategy: semver
    argocd-image-updater.argoproj.io/write-back-method: git
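
For the Service Principal option, Image Updater needs registry credentials. A sketch of argocd-image-updater chart values, assuming a docker-registry Secret named acr-credentials (hypothetical) in the argocd namespace that holds the Service Principal's client ID and secret (created, for example, with kubectl create secret docker-registry). Key names follow the annotation-based releases; check them against the chart version you install.

```yaml
# Values for the argocd-image-updater Helm chart (sketch)
config:
  registries:
    - name: Azure Container Registry
      api_url: https://myacr.azurecr.io
      prefix: myacr.azurecr.io
      # pullsecret:<namespace>/<secret-name>; the Secret name is hypothetical
      credentials: pullsecret:argocd/acr-credentials
      default: true
```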

Monitoring ArgoCD

ArgoCD exposes Prometheus metrics out of the box.

3
Metric Endpoints

argocd-server, argocd-repo-server, argocd-application-controller each expose /metrics

50+
Metrics

App sync status, health, git operations, reconciliation time, API requests, and more

0
Instrumentation Needed

Metrics are built in. With the Helm chart, set metrics.enabled: true per component to create the metrics Services, then point Prometheus at them.

Enable ServiceMonitor (for Prometheus Operator)

# values-production.yaml
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
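
With the Prometheus Operator (for example, kube-prometheus-stack), Prometheus usually only picks up ServiceMonitors that carry a specific label, often release: <helm-release-name>. A sketch assuming the release is called kube-prometheus-stack; check the serviceMonitorSelector on your Prometheus resource for the real label:

```yaml
# values-production.yaml (continued)
# Assumption: Prometheus selects ServiceMonitors labelled
# release: kube-prometheus-stack
server:
  metrics:
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
controller:
  metrics:
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
repoServer:
  metrics:
    serviceMonitor:
      additionalLabels:
        release: kube-prometheus-stack
```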

Key ArgoCD Metrics to Monitor

The metrics that matter most for operational health.

Metric | What It Tells You | Alert Threshold
argocd_app_info | App sync status and health | sync_status != "Synced"
argocd_app_reconcile_count | Number of reconciliations (histogram count) | Sudden spike
argocd_app_sync_total | Total sync operations by phase | phase = "Error" or "Failed"
argocd_git_request_total | Git requests (ls-remote, fetch) from the repo-server | Sudden spike
argocd_app_reconcile | Reconciliation duration in seconds (histogram) | Sustained increase in p95
argocd_redis_request_total | Redis cache operations | High error rate
argocd_cluster_api_resource_objects | Managed resource count | Unexpected drop

Grafana Dashboards for ArgoCD

Visualising ArgoCD health and performance.

Dashboard: Application Overview

  • Total applications by sync status
  • Health status breakdown (Healthy, Degraded, Missing)
  • Applications out of sync (urgent attention)
  • Sync operation success/failure rate
  • Average sync duration over time

Grafana Dashboard ID: 14584 (community)

Dashboard: Operational Health

  • Controller reconciliation queue depth
  • Repo-server Git clone latency
  • API server request rate and errors
  • Redis cache hit rate
  • Memory and CPU usage per component

Grafana Dashboard ID: 19993 (community)

Import community dashboards or build custom ones. The ArgoCD project also publishes a sample dashboard JSON, linked from its documentation.

Prometheus Alerting Rules

Alerts that will save you from midnight surprises.

# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status!="Synced"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "App {{ $labels.name }} is out of sync"

        - alert: ArgoCDAppUnhealthy
          expr: argocd_app_info{health_status!="Healthy",health_status!="Progressing"} == 1
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "App {{ $labels.name }} is {{ $labels.health_status }}"

        - alert: ArgoCDSyncFailed
          expr: increase(argocd_app_sync_total{phase=~"Error|Failed"}[1h]) > 3
          labels:
            severity: critical
          annotations:
            summary: "App {{ $labels.name }} failed to sync more than 3 times in 1h"
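
The rules above depend on ArgoCD reporting metrics; if the application controller itself dies, they simply go quiet. One way to cover "what happens if ArgoCD goes down?" is an absent() alert on the controller's scrape target. A sketch of one more entry for the rules: list, assuming the Helm release is named argocd so the scrape job label is argocd-application-controller-metrics:

```yaml
# Extra entry for the rules: list above (sketch).
# Assumed job label: the controller metrics Service name for release "argocd".
- alert: ArgoCDControllerDown
  expr: absent(up{job="argocd-application-controller-metrics"} == 1)
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "ArgoCD application controller metrics are missing; syncs and drift detection may be stopped"
```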

ArgoCD Notifications

ArgoCD has a built-in notification system for Slack, Teams, email, and webhooks.

Supported Services

  • Slack — Channel messages and threads
  • Microsoft Teams — Incoming webhooks
  • Email — SMTP notifications
  • GitHub — Commit status and comments
  • Webhook — Generic HTTP POST, also usable to trigger Azure DevOps pipelines through their REST API
  • PagerDuty — Incident creation

Notification Architecture

ArgoCD detects app state change
Notification controller evaluates triggers
Templates render the message
Sends to configured services

Configuring Slack Notifications

Get notified in Slack when deployments succeed, fail, or drift.

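# values-production.yaml
notifications:
  enabled: true
  secret:
    items:
      slack-token: xoxb-your-slack-bot-token  # Use External Secrets in prod!

  notifiers:
    service.slack: |
      token: $slack-token

  templates:
    template.app-sync-succeeded: |
      slack:
        attachments: |
          [{
            "color": "#18be52",
            "title": "{{ .app.metadata.name }} synced successfully",
            "text": "Application {{ .app.metadata.name }} is now running revision {{ .app.status.sync.revision }}.",
            "fields": [{
              "title": "Project", "value": "{{ .app.spec.project }}", "short": true
            }]
          }]
    # Template sent by trigger.on-sync-failed below
    template.app-sync-failed: |
      message: "Sync of {{ .app.metadata.name }} failed: {{ .app.status.operationState.message }}"
      slack:
        attachments: |
          [{
            "color": "#E96D76",
            "title": "{{ .app.metadata.name }} sync failed",
            "fields": [{
              "title": "Project", "value": "{{ .app.spec.project }}", "short": true
            }]
          }]

  triggers:
    trigger.on-sync-succeeded: |
      - when: app.status.operationState.phase in ['Succeeded']
        send: [app-sync-succeeded]
    trigger.on-sync-failed: |
      - when: app.status.operationState.phase in ['Error', 'Failed']
        send: [app-sync-failed]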
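
Triggers and templates on their own send nothing; an Application has to be subscribed to a trigger. A sketch of a default subscription that routes both triggers to one channel for every Application, assuming a Slack channel named deployments (hypothetical) that the bot has joined:

```yaml
# values-production.yaml (continued)
# Default subscriptions apply to every Application.
notifications:
  subscriptions:
    - recipients:
        - slack:deployments   # hypothetical channel
      triggers:
        - on-sync-succeeded
        - on-sync-failed
```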

Microsoft Teams Notifications

For organisations using Teams as their primary communication platform.

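# Configure Teams webhook (same notifications: values as the Slack example)
notifications:
  notifiers:
    service.teams: |
      recipientUrls:
        deployments-channel: https://outlook.office.com/webhook/xxx

  templates:
    template.app-deployed: |
      teams:
        title: "Deployment: {{ .app.metadata.name }}"
        text: |
          Application **{{ .app.metadata.name }}** has been synced.
          Revision: {{ .app.status.sync.revision | trunc 7 }}
          Status: {{ .app.status.health.status }}
    template.app-health-degraded: |
      teams:
        title: "Degraded: {{ .app.metadata.name }}"
        text: Application **{{ .app.metadata.name }}** health is {{ .app.status.health.status }}.

  triggers:
    # Once per Git revision, after a successful sync that leaves the app Healthy
    trigger.on-deployed: |
      - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
        oncePer: app.status.sync.revision
        send: [app-deployed]
    trigger.on-health-degraded: |
      - when: app.status.health.status == 'Degraded'
        send: [app-health-degraded]

  # Default subscriptions apply to every Application. Every trigger listed
  # must be defined, and each template it sends needs a teams: section
  # (or a message: field) to render in Teams.
  subscriptions:
    - recipients:
        - teams:deployments-channel
      triggers:
        - on-deployed
        - on-sync-failed
        - on-health-degraded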

Per-app notifications: You can also annotate individual Applications to subscribe to specific notification channels, so each team gets only their relevant alerts.
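
A sketch of such a per-app subscription. The annotation key follows the pattern notifications.argoproj.io/subscribe.<trigger>.<service>, and its value is the recipient. The Application and channel names are hypothetical, and only metadata is shown. The same annotation is also accepted on an AppProject to cover every app in that project.

```yaml
# Only metadata shown; names are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api
  namespace: argocd
  annotations:
    # <trigger>.<service>: <recipient>; separate multiple recipients with ";"
    notifications.argoproj.io/subscribe.on-sync-failed.slack: payments-alerts
    notifications.argoproj.io/subscribe.on-sync-failed.teams: deployments-channel
```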

Knowledge Check #2

Monitoring and notifications.

Q1: How does ArgoCD expose metrics for Prometheus?

Answer: B) Built-in /metrics endpoint on each component. ArgoCD server, repo-server, and application controller each expose a /metrics endpoint in Prometheus format. Enable ServiceMonitor for automatic discovery.

Q2: Which metric indicates an application is out of sync?

Answer: C) argocd_app_info{sync_status!="Synced"}. The argocd_app_info metric contains labels for sync_status and health_status. Filtering for sync_status != "Synced" shows applications that have drifted from their desired state in Git.

Q3: What are the three components of ArgoCD's notification system?

Answer: B) Services (notifiers), templates, and triggers. Services define where to send (Slack, Teams). Templates define what to send (message format). Triggers define when to send (on sync success, failure, health change).

Disaster Recovery

What happens if ArgoCD itself goes down? Planning for the worst.

The Good News

If ArgoCD stops working, your applications keep running. ArgoCD doesn't run your apps — it only manages their deployment. Existing workloads are unaffected.

What You Lose Without ArgoCD

  • Automatic sync from Git
  • Drift detection and correction
  • ArgoCD UI and status visibility
  • Notification pipeline

DR Strategy

  • Backup: Export all Application and AppProject resources
  • Git is your backup: All config is in Git — reinstall ArgoCD and repoint
  • Sealed Secrets key: Back up the controller private key!
  • Test recovery: Regularly test full restore in a DR cluster

Backup and Restore Procedures

Practical commands for backing up and restoring ArgoCD.

Backup

# Export all ArgoCD Applications
argocd admin export -n argocd > backup.yaml

# Or backup specific resources
kubectl get applications -n argocd -o yaml \
  > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml \
  > projects-backup.yaml

# Backup Sealed Secrets key (critical!)
kubectl get secret -n kube-system \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml > sealed-secrets-key-backup.yaml

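# Store backups in Azure Blob Storage
# (storage account and blob names are placeholders)
az storage blob upload \
  --account-name mystorageaccount \
  --container-name backup \
  --name argocd/backup.yaml \
  --file backup.yaml \
  --auth-mode login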

Restore

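# 1. Reinstall ArgoCD (same Helm values)
helm install argocd argo/argo-cd \
  -n argocd --create-namespace \
  -f values-production.yaml

# 2. Restore Sealed Secrets key
kubectl apply -f sealed-secrets-key-backup.yaml
# If the controller is already running, restart it so it loads the key
# (label matches the upstream controller.yaml; Helm installs differ)
kubectl delete pod -n kube-system -l name=sealed-secrets-controller

# 3. Re-connect Git repos
argocd repo add git@github.com:civica/gitops-config \
  --ssh-private-key-path ./argocd-key

# 4. Apply the root App of Apps
kubectl apply -f argocd/apps.yaml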

# ArgoCD will re-discover and sync
# everything from Git automatically!

Self-Managing ArgoCD

The ultimate GitOps pattern: ArgoCD managing its own configuration.

The Concept

Create an ArgoCD Application that points to ArgoCD's own Helm values in Git. When you update the values (e.g., add a new RBAC rule), ArgoCD upgrades itself.

# argocd/applications/argocd-self.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  # Multi-source: both sources go under "sources"
  # (a "source" field alongside it would be ignored)
  sources:
    - repoURL: https://argoproj.github.io/argo-helm
      chart: argo-cd
      targetRevision: 6.0.0
      helm:
        valueFiles:
          - $values/argocd/values-production.yaml
    - repoURL: git@github.com:civica/gitops-config
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      selfHeal: true

Multi-Team GitOps: Project Isolation

Ensuring teams can work independently without stepping on each other.

Team Isolation Model

  • ArgoCD Projects per team: payments-team, orders-team, platform-team
  • Namespace restrictions: Each team can only deploy to their namespaces
  • Source restrictions: Teams can only use approved Git repos
  • Resource restrictions: Teams can't create cluster-scoped resources
  • RBAC: Azure AD group mapped to team role

Project Definition

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-team
  namespace: argocd
spec:
  sourceRepos:
    - "git@github.com:civica/gitops-config"
  destinations:
    - server: "*"
      namespace: "payments-dev"
    - server: "*"
      namespace: "payments-staging"
    - server: "*"
      namespace: "payments-prod"
  clusterResourceWhitelist: []
  roles:
    - name: team-lead
      policies:
        - p, proj:payments-team:team-lead, applications, *, payments-team/*, allow
      groups:
        - "ad-group-payments-leads"
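
A team's Application opts into these restrictions through spec.project. A minimal sketch; the app name and path are hypothetical, and the repo URL is assumed to match the project's sourceRepos entry:

```yaml
# Hypothetical app; path follows the apps/ overlay layout
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-api-dev
  namespace: argocd
spec:
  project: payments-team   # the project's rules apply to this app
  source:
    # Assumed to match the project's sourceRepos entry
    repoURL: git@github.com:civica/gitops-config
    targetRevision: main
    path: apps/payments-api/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-dev   # allowed; payments-qa would be rejected
```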

Multi-Team Day-to-Day Workflow

How teams interact with the shared GitOps platform.

Action | Who | How
Add new microservice | Dev team | PR to add base + overlays in apps/ directory
Deploy new version | CI pipeline | Updates image tag in dev overlay, auto-syncs
Promote to staging | Dev team lead | PR to update staging overlay image tag
Promote to prod | Platform team | PR approval + manual sync in ArgoCD
Add infrastructure | Platform team | PR to infra/ directory, reviewed by SRE
Change RBAC | Platform team | PR to cluster/rbac/ and ArgoCD RBAC policy
Troubleshoot app | Dev team | ArgoCD UI (scoped to their project only)
Rollback | Dev team lead | git revert on the offending commit
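
The "Promote to prod" row relies on manual sync. In ArgoCD, that difference between environments comes down to whether syncPolicy.automated is present. A sketch of the relevant fragments of two hypothetical Applications (prune: true on dev is a choice, not a requirement):

```yaml
# payments-api-dev (hypothetical): CI bumps the tag, ArgoCD syncs on its own
spec:
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift in the cluster
---
# payments-api-prod (hypothetical): no "automated" key, so a merged PR only
# marks the app OutOfSync until the platform team syncs it
spec:
  syncPolicy: {}
```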

Knowledge Check #3

Disaster recovery and multi-team operations.

Q1: What happens to running applications if ArgoCD goes down?

Answer: B) Applications keep running but lose auto-sync and drift detection. ArgoCD is a control plane for deployments, not a runtime dependency. Existing workloads continue unaffected. You lose the ability to deploy new changes and detect drift until ArgoCD is restored.

Q2: What makes self-managing ArgoCD possible?

Answer: B) An ArgoCD Application that points to ArgoCD's own Helm chart and values in Git. By creating an Application for ArgoCD itself, any changes to the Helm values in Git (like RBAC rules, SSO config) are automatically applied by ArgoCD.

Q3: What is the critical backup item that must NOT be lost for Sealed Secrets?

Answer: B) The Sealed Secrets controller's private key. Without the private key, no SealedSecret in your Git repo can be decrypted. You would need to re-encrypt all secrets with a new key pair. Back up this key securely (e.g., Azure Key Vault).

Troubleshooting Common GitOps Issues

The problems you'll actually encounter and how to fix them.

App Stuck "OutOfSync"

# Check what's different
argocd app diff my-app

# Common causes:
# - Mutating webhooks adding fields
# - Default values injected by K8s
# - Other controllers (e.g., an HPA) changing fields set in Git

# Fix: Add to ignoreDifferences
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # If an HPA manages replicas
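
ignoreDifferences only hides the field from the diff. By default, a sync still applies the value from Git, so an HPA-scaled Deployment would be reset to the Git replica count on every sync. The RespectIgnoreDifferences=true sync option makes ArgoCD skip ignored fields during sync as well. A sketch for the same Application:

```yaml
# Same Application: also leave the ignored field alone during sync
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true
```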

Sync Failed

# Check sync operation details
argocd app get my-app

# Check events
kubectl get events -n my-namespace \
  --sort-by=.metadata.creationTimestamp

# Common causes:
# - RBAC: ArgoCD SA lacks permissions
# - Resource quota exceeded
# - Invalid manifest (schema error)
# - Namespace doesn't exist
# - Image pull failure (ACR auth)
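
For the "namespace doesn't exist" case, ArgoCD can create the destination namespace as part of the sync. A sketch; note that the team projects defined earlier whitelist no cluster-scoped resources, and the platform team may prefer to pre-create namespaces rather than let Applications create them:

```yaml
# Application spec fragment
spec:
  syncPolicy:
    syncOptions:
      - CreateNamespace=true   # create the destination namespace if missing
```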

More Troubleshooting Scenarios

Diving deeper into common operational issues.

Repo-Server Issues

# Repo clone failures
kubectl logs -n argocd \
  deploy/argocd-repo-server

# Common causes:
# - SSH host key changed
# - PAT expired
# - Network policy blocking egress
# - Repo too large (increase resources)

# Fix SSH host key (re-scan and replace):
ssh-keyscan github.com | \
  argocd cert add-ssh --batch --upsert

Performance Issues

# Controller reconciliation slow
# Check queue depth via controller metrics (port 8082):
kubectl port-forward -n argocd \
  statefulset/argocd-application-controller 8082:8082
# ...then, in a second terminal:
curl -s localhost:8082/metrics | grep workqueue_depth

# Solutions:
# - Increase controller resources
# - Raise timeout.reconciliation (poll Git less often)
# - Use resource tracking (annotation)
# - Split into multiple ArgoCD instances
# - Exclude non-essential resource types
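
Several of these tuning knobs map onto argo-cd Helm values roughly as follows. The numbers are illustrative starting points (the ArgoCD high-availability docs discuss processor counts for larger installations), not recommendations for your load:

```yaml
# values-production.yaml (continued): illustrative values only
configs:
  cm:
    # Interval between Git polls per app (ArgoCD default 180s). Raising it
    # reduces load; a Git webhook, if configured, still refreshes immediately.
    timeout.reconciliation: 300s
  params:
    # Worker counts for app status refresh and sync operations
    controller.status.processors: 50
    controller.operation.processors: 25
controller:
  # Hypothetical sizing: increase controller resources
  resources:
    requests:
      cpu: "1"
      memory: 2Gi
  # Replicas > 1 shard managed clusters across controllers;
  # this only helps when ArgoCD manages more than one cluster
  replicas: 2
```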

Pro tip: Enable ArgoCD server debug logging temporarily with --loglevel debug to diagnose complex issues. Remember to revert after troubleshooting.

GitOps Best Practices

Lessons learned from running GitOps at scale.

Repository Practices

  • Separate app code repos from GitOps config repos
  • Use Kustomize overlays, not branch-per-environment
  • Pin image tags (never use :latest)
  • Require PR reviews for all config changes
  • Use CODEOWNERS files to enforce team reviews
  • Keep manifests DRY with Kustomize bases

Operational Practices

  • Use manual sync for production (PR = gate)
  • Set up notifications for sync failures
  • Monitor ArgoCD with Prometheus + Grafana
  • Back up Sealed Secrets keys to Key Vault
  • Test disaster recovery quarterly
  • Use ArgoCD Projects for team isolation

GitOps Anti-Patterns

Common mistakes to avoid on your GitOps journey.

Anti-Pattern | Why It's Bad | What to Do Instead
Plain secrets in Git | Anyone with repo access can read them | Sealed Secrets or External Secrets
Using :latest image tag | No audit trail, unpredictable rollouts | Pin to specific tags (semver or SHA)
Manual kubectl in production | Drift, no audit trail, breaks GitOps loop | All changes through Git PRs only
Branch-per-environment | Merge conflicts, diverging configs | Directory-per-environment (overlays)
Auto-sync everything in prod | Risky: no human gate for critical changes | Manual sync + PR approval for prod
One giant monolithic app | Blast radius too large, slow syncs | App of Apps with granular child apps
Ignoring drift alerts | Erodes trust in GitOps as source of truth | Investigate and resolve every drift event
No RBAC on ArgoCD | Everyone can sync/delete any app | Projects + Azure AD RBAC from day one

The Complete GitOps Architecture

Everything we've built across all four presentations.

Developer Workflow

Push code to app repo
CI: build, test, push image to ACR
CI: update image tag in GitOps repo

GitOps Platform

ArgoCD detects Git change
Syncs cluster to desired state
Monitors drift continuously

Operations

Azure AD SSO + RBAC
Prometheus + Grafana monitoring
Slack/Teams notifications

Git is the source of truth. ArgoCD is the engine. Azure is the platform. Your team is in control.

GitOps Module: Complete Recap

Everything we've covered across all four presentations.

Presentation 1: Fundamentals

  • GitOps principles and benefits
  • Push vs Pull model
  • ArgoCD as our tool of choice
  • Repository structure best practices

Presentation 2: Installation

  • Helm-based ArgoCD installation
  • HA configuration
  • Ingress and TLS exposure
  • Git repo authentication

Presentation 3: Bootstrapping

  • Kustomize base + overlays
  • App of Apps and ApplicationSets
  • Sealed Secrets and External Secrets
  • Environment promotion workflow

Presentation 4: Operations

  • Azure AD SSO and RBAC
  • Prometheus monitoring and Grafana
  • Slack/Teams notifications
  • DR, multi-team, best practices

What's Next for Your Team

Practical next steps after completing this module.

Immediate Actions (Week 1)

  1. Install ArgoCD on your dev AKS cluster
  2. Create the GitOps repo with recommended structure
  3. Deploy the guestbook example app
  4. Set up Azure AD SSO
  5. Configure Slack/Teams notifications

Gradual Rollout (Weeks 2-4)

  1. Migrate one service to GitOps (dev only)
  2. Set up External Secrets with Key Vault
  3. Implement App of Apps pattern
  4. Add Prometheus monitoring for ArgoCD
  5. Promote to staging, then production

Start small. One service, one environment. Build confidence, then expand. GitOps is a journey, not a big bang.

Module 4 Summary

Key takeaways from Azure Integration and Operations.

Identity

Azure AD OIDC for SSO. Map AD groups to ArgoCD roles. Disable admin account.

Monitoring

Built-in Prometheus metrics. ServiceMonitors for auto-discovery. Key alerts for sync failures.

Notifications

Built-in system with Slack, Teams, email. Triggers, templates, services pattern.

DR

Git is the backup. Back up Sealed Secrets keys. Test recovery regularly.

Multi-Team

ArgoCD Projects for isolation. RBAC per team. Namespace-scoped access.

Practices

Pin images, manual sync for prod, no secrets in Git, investigate every drift.

Questions & Discussion

Azure Integration & Operations

Congratulations on completing the GitOps Module!
