GitOps Module — Presentation 4 of 4

Azure Integration & Operations

SSO, Monitoring, Notifications, and Production Operations

Civica Training Program

The Final Mile: Production-Ready GitOps

ArgoCD is installed and bootstrapped. Now let's make it enterprise-grade.

Where We Left Off

You have ArgoCD deploying applications across dev, staging, and prod using the App of Apps pattern. But your CISO asks: "Who has access? How do we audit? What happens if ArgoCD goes down? How do we get notified of failures?"

Identity: Azure AD SSO + RBAC

Observability: monitoring and notifications

Operations: DR, multi-team, troubleshooting

Azure AD SSO for ArgoCD

Replace password auth with Azure AD single sign-on using OIDC.

User clicks "Login with Azure AD"

↓

Redirected to Azure AD login page

↓

User authenticates with MFA

↓

Azure AD returns JWT with group claims

↓

ArgoCD validates token and maps groups to roles

↓

User sees only the apps their team owns

Step 1: Azure AD App Registration

Create an App Registration in Azure AD for ArgoCD.

Azure Portal Steps

Go to Azure Active Directory (now Microsoft Entra ID) → App registrations
Click New registration
Name: ArgoCD-SSO
Redirect URI (platform: Web): https://argocd.civica.internal/auth/callback
Note the Application (client) ID
Note the Directory (tenant) ID
Under Certificates & secrets, create a client secret and copy its value (it is shown only once)

Configure Group Claims

Go to Token configuration
Click Add groups claim
Select Security groups
For ID token, choose Group ID

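Important: If a user belongs to more groups than the token limit allows (200 for JWTs such as the OIDC ID token, 150 for SAML), Azure AD omits the groups claim and sends a group overage claim instead, so ArgoCD receives no groups. Restrict the claim to groups assigned to the application, or use Application Roles instead.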
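To check which group IDs Azure AD actually puts in a token, and whether an overage claim replaced them, you can decode the token payload locally. The sketch below is a POSIX shell helper that assumes a GNU-style `base64 -d`. The function names `jwt_payload` and `has_group_overage` are hypothetical. The code does not verify the token signature, so treat it as a debugging aid only.

```shell
#!/bin/sh
# Debugging aid (hypothetical helpers): decode a JWT payload, no signature check.

# jwt_payload TOKEN -> prints the decoded JSON claims
jwt_payload() {
  local payload
  # A JWT is header.payload.signature; keep the payload and map the
  # base64url alphabet back to standard base64 ('-' -> '+', '_' -> '/')
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# has_group_overage TOKEN -> exit status 0 if Azure AD replaced the
# groups claim with an overage reference (_claim_names)
has_group_overage() {
  jwt_payload "$1" | grep -q '"_claim_names"'
}
```

For example, `jwt_payload "$ID_TOKEN"` prints the claims JSON for a token you captured while debugging login. `argocd account get-user-info` shows the groups ArgoCD itself resolved for the current session.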

Step 2: Configure ArgoCD OIDC

Add the Azure AD OIDC configuration to ArgoCD's Helm values.

# values-production.yaml
configs:
  cm:
    url: https://argocd.civica.internal
    # Disable the local admin account once SSO login is verified
    admin.enabled: "false"
    oidc.config: |
      name: Azure AD
      issuer: https://login.microsoftonline.com/TENANT_ID/v2.0  # Directory (tenant) ID
      clientID: CLIENT_ID  # Application (client) ID
      clientSecret: $oidc.azure.clientSecret  # From argocd-secret
      requestedScopes:
        - openid
        - profile
        - email
      requestedIDTokenClaims:
        groups:
          essential: true

  secret:
    extra:
      oidc.azure.clientSecret: "YOUR_CLIENT_SECRET"  # Use External Secrets in prod!

Step 3: RBAC with Azure AD Groups

Map Azure AD security groups to ArgoCD roles and projects.

# values-production.yaml (continued)
configs:
  rbac:
    policy.csv: |
      # Platform team: full admin access
      g, "ad-group-id-platform-team", role:admin

      # Payments team: read + sync their own apps
      p, role:payments-team, applications, get, payments-team/*, allow
      p, role:payments-team, applications, sync, payments-team/*, allow
      p, role:payments-team, applications, action/*, payments-team/*, allow
      p, role:payments-team, logs, get, payments-team/*, allow
      g, "ad-group-id-payments-team", role:payments-team

      # Orders team: read + sync their own apps
      p, role:orders-team, applications, get, orders-team/*, allow
      p, role:orders-team, applications, sync, orders-team/*, allow
      g, "ad-group-id-orders-team", role:orders-team

      # Read-only for members of the all-devs group
      p, role:readonly, applications, get, */*, allow
      g, "ad-group-id-all-devs", role:readonly

    policy.default: ""  # No default access
    scopes: "[groups]"

ArgoCD RBAC Policy Syntax

Understanding the Casbin-based policy format.

Policy Rules (p)

# Format:
p, role, resource, action, object, effect

# Examples:
# Allow role to get apps in project
p, role:dev, applications, get, myproject/*, allow

# Allow role to sync specific app
p, role:dev, applications, sync, myproject/myapp, allow

# Deny delete on every app for role:dev (a deny overrides any allow)
p, role:dev, applications, delete, */*, deny

Group Bindings (g)

# Format:
g, group-or-user, role

# Map Azure AD group to role
g, "azure-ad-group-object-id", role:dev

# Map specific user (matched by email only if scopes
# includes email, e.g. scopes: "[groups, email]")
g, "user@example.com", role:admin

Resources & Actions

Resource	Actions
`applications`	get, create, update, delete, sync, override, action/<group>/<kind>/<action>
`repositories`	get, create, update, delete
`clusters`	get, create, update, delete
`logs`	get
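The way these rules combine can be modelled in a few lines. The sketch below is a toy evaluator in POSIX shell, not ArgoCD's Casbin enforcer. It makes several simplifying assumptions: group names are unquoted, and role inheritance and built-in roles are ignored. It applies the effect rule as we understand ArgoCD's model: a request is allowed only if some matching `p` rule allows it and none denies it, and no match means no access, as with `policy.default: ""`. The names `can` and `grp-payments` are hypothetical.

```shell
#!/bin/sh
# Toy model of ArgoCD RBAC evaluation (NOT the real Casbin enforcer).
# Policy lines use the policy.csv format; group names are unquoted here.
POLICY='p, role:payments-team, applications, *, payments-team/*, allow
p, role:payments-team, applications, delete, */*, deny
g, grp-payments, role:payments-team'

# can GROUP RESOURCE ACTION OBJECT -> prints "allow" or "deny" (hypothetical helper)
can() {
  local group res act obj roles allowed denied
  local kind sub p_res p_act p_obj eft
  group=$1 res=$2 act=$3 obj=$4
  allowed=0 denied=0
  # Roles bound to this group through "g" lines (no role inheritance)
  roles=$(printf '%s\n' "$POLICY" |
    awk -F', *' -v g="$group" '$1 == "g" && $2 == g { print $3 }')
  while IFS=', ' read -r kind sub p_res p_act p_obj eft; do
    [ "$kind" = p ] || continue
    # The rule's subject must be one of the group's roles
    printf '%s\n' "$roles" | grep -qxF -- "$sub" || continue
    # Unquoted variables in case labels act as globs: "*" matches anything
    case $res in $p_res) ;; *) continue ;; esac
    case $act in $p_act) ;; *) continue ;; esac
    case $obj in $p_obj) ;; *) continue ;; esac
    if [ "$eft" = deny ]; then denied=1; else allowed=1; fi
  done <<EOF
$POLICY
EOF
  # Allowed only if some rule allows and no rule denies; no match means deny
  if [ "$allowed" -eq 1 ] && [ "$denied" -eq 0 ]; then
    echo allow
  else
    echo deny
  fi
}
```

With this policy, `can grp-payments applications delete payments-team/api` prints `deny` even though the `*` action rule allows it, because the explicit deny wins.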

Knowledge Check #1

Azure AD SSO and RBAC.

Q1: What protocol does ArgoCD use for Azure AD SSO?

Answer: OIDC (OpenID Connect). ArgoCD supports OIDC natively, which is the protocol Azure AD uses for modern authentication. The Azure AD App Registration provides the issuer, client ID, and client secret for OIDC configuration.

Q2: In ArgoCD RBAC, what does the "g" prefix mean in a policy line?

Answer: Group binding (maps a group or user to a role). In Casbin policy format, "p" defines permission rules and "g" defines group/role assignments. g, "ad-group-id", role:admin maps an Azure AD group to an ArgoCD role.

Q3: What should the default RBAC policy be for a secure ArgoCD installation?

Answer: Empty string (no default access). Setting policy.default: "" means users without an explicit role assignment get no access. This follows the principle of least privilege: access must be explicitly granted.

Azure Container Registry Integration

Ensuring AKS can pull images and ArgoCD can monitor image updates.

AKS + ACR Attachment

# Attach ACR to AKS (recommended)
az aks update \
  --resource-group myRG \
  --name myAKS \
  --attach-acr myACR

# This grants AKS kubelet identity
# the AcrPull role on the registry.
# No imagePullSecrets needed!

# Verify
az aks check-acr \
  --resource-group myRG \
  --name myAKS \
  --acr myACR.azurecr.io

ArgoCD Image Updater for ACR

# Install ArgoCD Image Updater
# (add the Argo Helm repo first if needed)
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd-image-updater \
  argo/argocd-image-updater \
  -n argocd

# Configure ACR access
# Use Workload Identity or
# create a Service Principal with AcrPull

# Annotate your Application (metadata.annotations):
annotations:
  argocd-image-updater.argoproj.io/image-list: app=myacr.azurecr.io/myapp
  argocd-image-updater.argoproj.io/app.update-strategy: semver
  # git write-back needs write access to the GitOps repo
  argocd-image-updater.argoproj.io/write-back-method: git
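The `semver` strategy picks the highest version-looking tag rather than whatever was pushed last. A rough approximation of that choice in POSIX shell is below. It is not Image Updater's implementation, which also handles constraints and pre-release tags. `pick_latest_semver` is a hypothetical helper that only accepts plain `X.Y.Z` tags.

```shell
#!/bin/sh
# pick_latest_semver (hypothetical): read image tags on stdin, one per line,
# and print the highest plain X.Y.Z tag. Non-semver tags such as "latest"
# are filtered out; sorting numerically on each dot-separated field makes
# 1.10.0 rank above 1.9.3.
pick_latest_semver() {
  grep -E '^[0-9]+\.[0-9]+\.[0-9]+$' |
    sort -t . -k 1,1n -k 2,2n -k 3,3n |
    tail -n 1
}
```

For example, `az acr repository show-tags --name myACR --repository myapp --output tsv | pick_latest_semver` shows which tag a plain semver policy would move to. Tags like `latest` never qualify, which is one more reason to push versioned tags.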

Monitoring ArgoCD

ArgoCD exposes Prometheus metrics out of the box.

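3 metric endpoints: argocd-server, argocd-repo-server, and argocd-application-controller each expose /metrics (on ports 8083, 8084, and 8082 respectively).

50+ metrics: app sync status, health, Git operations, reconciliation time, API requests, and more.

0 extra ArgoCD config: every component serves /metrics by default, so you only need to point Prometheus at them. With the Helm chart, the values below create the metrics Services and ServiceMonitors that make discovery automatic.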

Enable ServiceMonitor (for Prometheus Operator)

# values-production.yaml
server:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
repoServer:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true

Key ArgoCD Metrics to Monitor

The metrics that matter most for operational health.

Metric	What It Tells You	Alert Threshold
`argocd_app_info`	App sync status and health	sync_status != "Synced"
`argocd_app_reconcile_count`	Number of reconciliations (count series of the `argocd_app_reconcile` histogram)	Sudden spike or drop
`argocd_app_sync_total`	Total sync operations by phase	phase = "Error"
`argocd_git_request_total`	Git fetch operations	High failure rate
`argocd_app_reconcile_bucket`	Reconciliation duration in seconds (histogram buckets)	p95 > 5 minutes
`argocd_redis_request_total`	Redis cache operations	High error rate
`argocd_cluster_api_resource_objects`	Managed resource count	Unexpected drop
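Because `argocd_app_info` carries `sync_status` and `health_status` as labels, you can count apps per sync status straight from the controller's /metrics output. The helper below is a sketch that assumes the Prometheus text exposition format. `count_by_sync_status` is a hypothetical name. In Prometheus itself, `count by (sync_status) (argocd_app_info)` does the same job.

```shell
#!/bin/sh
# count_by_sync_status (hypothetical): read Prometheus text-format metrics on
# stdin and print "<sync_status> <number of apps>" from argocd_app_info.
count_by_sync_status() {
  # Keep only argocd_app_info samples and extract the sync_status label value
  sed -n 's/^argocd_app_info{.*sync_status="\([^"]*\)".*/\1/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Against a live cluster, you could run `kubectl port-forward -n argocd pod/argocd-application-controller-0 8082:8082` in one terminal and `curl -s localhost:8082/metrics | count_by_sync_status` in another. The pod name assumes the default Helm release name `argocd`.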

Grafana Dashboards for ArgoCD

Visualising ArgoCD health and performance.

Dashboard: Application Overview

Total applications by sync status
Health status breakdown (Healthy, Degraded, Missing)
Applications out of sync (urgent attention)
Sync operation success/failure rate
Average sync duration over time

Grafana Dashboard ID: 14584 (community)

Dashboard: Operational Health

Controller reconciliation queue depth
Repo-server Git clone latency
API server request rate and errors
Redis cache hit rate
Memory and CPU usage per component

Grafana Dashboard ID: 19993 (community)

Import community dashboards or build custom ones. The ArgoCD project provides a sample dashboard JSON in their docs.

Prometheus Alerting Rules

Alerts that will save you from midnight surprises.

# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          expr: argocd_app_info{sync_status!="Synced"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "App {{ $labels.name }} is out of sync"

        - alert: ArgoCDAppUnhealthy
          expr: argocd_app_info{health_status!="Healthy",health_status!="Progressing"} == 1
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: "App {{ $labels.name }} is {{ $labels.health_status }}"

        - alert: ArgoCDSyncFailed
          expr: increase(argocd_app_sync_total{phase="Error"}[1h]) > 3
          labels:
            severity: critical
          annotations:
            summary: "App {{ $labels.name }} had more than 3 failed syncs in the last hour"

ArgoCD Notifications

ArgoCD has a built-in notification system for Slack, Teams, email, and webhooks.

Supported Services

Slack — Channel messages and threads
Microsoft Teams — Incoming webhooks
Email — SMTP notifications
GitHub — Commit status and comments
Azure DevOps — Pipeline triggers
Webhook — Generic HTTP POST
PagerDuty — Incident creation

Notification Architecture

ArgoCD detects app state change

↓

Notification controller evaluates triggers

↓

Templates render the message

↓

Sends to configured services

Configuring Slack Notifications

Get notified in Slack when deployments succeed, fail, or drift.

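# values-production.yaml
notifications:
  enabled: true
  secret:
    items:
      slack-token: xoxb-your-slack-bot-token  # Use External Secrets in prod!

  notifiers:
    service.slack: |
      token: $slack-token

  templates:
    template.app-sync-succeeded: |
      slack:
        attachments: |
          [{
            "color": "#18be52",
            "title": "{{ .app.metadata.name }} synced successfully",
            "text": "Application {{ .app.metadata.name }} is now running revision {{ .app.status.sync.revision }}.",
            "fields": [{
              "title": "Project", "value": "{{ .app.spec.project }}", "short": true
            }]
          }]
    # Every template named in a trigger's send list must be defined
    template.app-sync-failed: |
      slack:
        attachments: |
          [{
            "color": "#E96D76",
            "title": "{{ .app.metadata.name }} sync failed",
            "text": "{{ .app.status.operationState.message }}"
          }]

  # Triggers only notify subscribed recipients: add subscriptions
  # (as in the Teams example) or per-app annotations
  triggers:
    trigger.on-sync-succeeded: |
      - when: app.status.operationState.phase in ['Succeeded']
        send: [app-sync-succeeded]
    trigger.on-sync-failed: |
      - when: app.status.operationState.phase in ['Error', 'Failed']
        send: [app-sync-failed]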

Microsoft Teams Notifications

For organisations using Teams as their primary communication platform.

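# Configure Teams webhook (values-production.yaml)
notifications:
  notifiers:
    service.teams: |
      recipientUrls:
        deployments-channel: https://outlook.office.com/webhook/xxx

  templates:
    template.app-deployed: |
      teams:
        title: "Deployment: {{ .app.metadata.name }}"
        text: |
          Application **{{ .app.metadata.name }}** has been synced.
          Revision: {{ .app.status.sync.revision | trunc 7 }}
          Status: {{ .app.status.health.status }}

  triggers:
    # A template is only sent when a trigger references it
    trigger.on-deployed: |
      - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
        oncePer: app.status.sync.revision
        send: [app-deployed]

  subscriptions:
    - recipients:
        - teams:deployments-channel
      # Each trigger must be defined, and its templates need a teams: block
      triggers:
        - on-deployed
        - on-sync-failed
        - on-health-degraded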

Per-app notifications: You can also annotate individual Applications to subscribe to specific notification channels, so each team gets only their relevant alerts.
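Per-app subscriptions use annotations of the form `notifications.argoproj.io/subscribe.<trigger>.<service>: <recipient>`. The hypothetical helpers below build that annotation in both the `kubectl annotate` form and the YAML form, so the format is explicit. The channel name `payments-alerts` in the usage note is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers for the per-app subscription annotation.
# subscribe_annotation TRIGGER SERVICE RECIPIENT -> key=value (kubectl annotate form)
subscribe_annotation() {
  printf 'notifications.argoproj.io/subscribe.%s.%s=%s\n' "$1" "$2" "$3"
}

# subscribe_annotation_yaml TRIGGER SERVICE RECIPIENT -> "key: value" line
# for metadata.annotations in the Application manifest kept in Git
subscribe_annotation_yaml() {
  printf 'notifications.argoproj.io/subscribe.%s.%s: %s\n' "$1" "$2" "$3"
}
```

In a GitOps setup, the annotation belongs in the Application manifest in Git. Running `kubectl annotate application payments-api -n argocd "$(subscribe_annotation on-sync-failed slack payments-alerts)"` is only useful for a quick test, because that change is not recorded in Git.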

Knowledge Check #2

Monitoring and notifications.

Q1: How does ArgoCD expose metrics for Prometheus?

Answer: A built-in /metrics endpoint on each component. ArgoCD server, repo-server, and application controller each expose a /metrics endpoint in Prometheus format. Enable ServiceMonitors for automatic discovery.

Q2: Which metric indicates an application is out of sync?

Answer: argocd_app_info{sync_status!="Synced"}. The argocd_app_info metric contains labels for sync_status and health_status. Filtering for sync_status != "Synced" shows applications that have drifted from their desired state in Git.

Q3: What are the three components of ArgoCD's notification system?

Answer: Services (notifiers), templates, and triggers. Services define where to send (Slack, Teams). Templates define what to send (message format). Triggers define when to send (on sync success, failure, health change).

Disaster Recovery

What happens if ArgoCD itself goes down? Planning for the worst.

The Good News

If ArgoCD stops working, your applications keep running. ArgoCD doesn't run your apps — it only manages their deployment. Existing workloads are unaffected.

What You Lose Without ArgoCD

Automatic sync from Git
Drift detection and correction
ArgoCD UI and status visibility
Notification pipeline

DR Strategy

Backup: Export all Application and AppProject resources
Git is your backup: All config is in Git — reinstall ArgoCD and repoint
Sealed Secrets key: Back up the controller private key!
Test recovery: Regularly test full restore in a DR cluster

Backup and Restore Procedures

Practical commands for backing up and restoring ArgoCD.

Backup

# Export all ArgoCD Applications
argocd admin export -n argocd > backup.yaml

# Or backup specific resources
kubectl get applications -n argocd -o yaml \
  > apps-backup.yaml
kubectl get appprojects -n argocd -o yaml \
  > projects-backup.yaml

# Backup Sealed Secrets key (critical!)
kubectl get secret -n kube-system \
  -l sealedsecrets.bitnami.com/sealed-secrets-key \
  -o yaml > sealed-secrets-key-backup.yaml

# Store backups in Azure Blob Storage
az storage blob upload \
  --account-name mystorageacct \
  --container-name backup \
  --name backup.yaml \
  --file backup.yaml \
  --auth-mode login

Restore

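# 1. Reinstall ArgoCD (same Helm values)
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd \
  -n argocd --create-namespace \
  -f values-production.yaml

# 2. Restore Sealed Secrets key, then restart the
#    controller so it loads the restored key
#    (label depends on how it was installed)
kubectl apply -f sealed-secrets-key-backup.yaml
kubectl delete pod -n kube-system \
  -l name=sealed-secrets-controller

# 3. Re-connect Git repos (after argocd login)
argocd repo add git@github.com:civica/gitops-config \
  --ssh-private-key-path ./argocd-key

# 4. Apply the root App of Apps
kubectl apply -f argocd/apps.yaml

# ArgoCD will re-discover and sync
# everything from Git automatically!
# (Or restore the export: argocd admin import - < backup.yaml)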
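Before relying on a backup, check that it actually contains your Applications and AppProjects. The sketch below is a hypothetical `verify_backup` helper for the YAML files produced above. It counts `kind:` lines, so it is a smoke test rather than a schema validation.

```shell
#!/bin/sh
# verify_backup FILE (hypothetical): smoke-test an ArgoCD backup YAML file.
# Prints the Application/AppProject counts; fails if either is zero.
verify_backup() {
  local file apps projects
  file=$1
  # Match "kind: X" at top level or indented inside a List's items
  apps=$(grep -cE '^[[:space:]-]*kind:[[:space:]]*Application[[:space:]]*$' "$file" || true)
  projects=$(grep -cE '^[[:space:]-]*kind:[[:space:]]*AppProject[[:space:]]*$' "$file" || true)
  echo "applications=$apps appprojects=$projects"
  [ "$apps" -gt 0 ] && [ "$projects" -gt 0 ]
}
```

For example, `verify_backup backup.yaml` prints the two counts and exits non-zero if either is zero, so it can fail a scheduled backup job before a bad file is uploaded.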

Self-Managing ArgoCD

The ultimate GitOps pattern: ArgoCD managing its own configuration.

The Concept

Create an ArgoCD Application that points to ArgoCD's own Helm values in Git. When you update the values (e.g., add a new RBAC rule), ArgoCD upgrades itself.

# argocd/applications/argocd-self.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  namespace: argocd
spec:
  project: default
  # Multi-source: when "sources" is set, a separate "source" field is
  # ignored, so the chart and the values repo are both listed here
  sources:
    - repoURL: https://argoproj.github.io/argo-helm
      chart: argo-cd
      targetRevision: 6.0.0
      helm:
        valueFiles:
          - $values/argocd/values-production.yaml
    - repoURL: git@github.com:civica/gitops-config
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      selfHeal: true

Multi-Team GitOps: Project Isolation

Ensuring teams can work independently without stepping on each other.

Team Isolation Model

ArgoCD Projects per team: payments-team, orders-team, platform-team
Namespace restrictions: Each team can only deploy to their namespaces
Source restrictions: Teams can only use approved Git repos
Resource restrictions: Teams can't create cluster-scoped resources
RBAC: Azure AD group mapped to team role

Project Definition

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: payments-team
  namespace: argocd
spec:
  sourceRepos:
    - "git@github.com:civica/gitops-config"
  destinations:
    - server: "*"
      namespace: "payments-dev"
    - server: "*"
      namespace: "payments-staging"
    - server: "*"
      namespace: "payments-prod"
  clusterResourceWhitelist: []
  roles:
    - name: team-lead
      policies:
        - p, proj:payments-team:team-lead, applications, *, payments-team/*, allow
      groups:
        - "ad-group-payments-leads"  # Azure AD group object ID
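ArgoCD also accepts glob patterns in project destinations. The explicit namespace list above is what keeps payments workloads out of other teams' namespaces. The toy check below is written in POSIX shell with a hypothetical `namespace_allowed` helper; it is not ArgoCD's matcher. It shows the difference between explicit names and a wildcard such as `payments-*`, which would also admit a namespace like `payments-sandbox`.

```shell
#!/bin/sh
# Toy destination check (hypothetical helper, not ArgoCD's matcher).
# namespace_allowed NAMESPACE PATTERN... -> exit 0 if any glob pattern matches
namespace_allowed() {
  local ns pattern
  ns=$1
  shift
  for pattern in "$@"; do
    # Unquoted $pattern in a case label is treated as a glob
    case $ns in
      $pattern) return 0 ;;
    esac
  done
  return 1
}
```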

Multi-Team Day-to-Day Workflow

How teams interact with the shared GitOps platform.

Action	Who	How
Add new microservice	Dev team	PR to add base + overlays in apps/ directory
Deploy new version	CI pipeline	Updates image tag in dev overlay, auto-syncs
Promote to staging	Dev team lead	PR to update staging overlay image tag
Promote to prod	Platform team	PR approval + manual sync in ArgoCD
Add infrastructure	Platform team	PR to infra/ directory, reviewed by SRE
Change RBAC	Platform team	PR to cluster/rbac/ and ArgoCD RBAC policy
Troubleshoot app	Dev team	ArgoCD UI (scoped to their project only)
Rollback	Dev team lead	`git revert` on the offending commit
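The rollback row relies on `git revert` creating a new commit that undoes the bad one, so the history (and the audit trail) keeps both. The throwaway demo below shows this in a temporary repository. It assumes `git` is installed. The `overlays/prod/kustomization.yaml` layout, the image name, and the tags are hypothetical.

```shell
#!/bin/sh
# Throwaway demo (hypothetical layout): roll back a bad prod image-tag bump
# with git revert. Runs in a subshell so cd/export do not leak into the caller.
demo_revert_rollback() (
  set -e
  repo=$(mktemp -d)
  cd "$repo"
  # Local identity so commits work without global git config
  export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
  export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
  git init -q
  mkdir -p overlays/prod
  # Known-good state: prod pinned to 1.4.2
  printf 'images:\n  - name: myapp\n    newTag: "1.4.2"\n' > overlays/prod/kustomization.yaml
  git add -A
  git -c commit.gpgsign=false commit -q -m "prod: myapp 1.4.2"
  # The offending change: prod bumped to a bad tag
  sed 's/1\.4\.2/1.5.0/' overlays/prod/kustomization.yaml > kustomization.tmp
  mv kustomization.tmp overlays/prod/kustomization.yaml
  git -c commit.gpgsign=false commit -q -am "prod: myapp 1.5.0"
  # Roll back: a new commit that undoes the bad one
  git -c commit.gpgsign=false revert --no-edit HEAD >/dev/null
  # Print the restored tag and the commit count (good, bad, revert)
  sed -n 's/^ *newTag: //p' overlays/prod/kustomization.yaml
  git rev-list --count HEAD
  cd /
  rm -rf "$repo"
)
```

In the real GitOps repo, the same step is `git revert <offending-commit>` on a branch, merged through a PR. ArgoCD then syncs the reverted state (manually, for prod).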

Knowledge Check #3

Disaster recovery and multi-team operations.

Q1: What happens to running applications if ArgoCD goes down?

Answer: Applications keep running but lose auto-sync and drift detection. ArgoCD is a control plane for deployments, not a runtime dependency. Existing workloads continue unaffected. You lose the ability to deploy new changes and detect drift until ArgoCD is restored.

Q2: What makes self-managing ArgoCD possible?

Answer: An ArgoCD Application that points to ArgoCD's own Helm chart and values in Git. By creating an Application for ArgoCD itself, any changes to the Helm values in Git (like RBAC rules or SSO config) are automatically applied by ArgoCD.

Q3: What is the critical backup item that must NOT be lost for Sealed Secrets?

Answer: The Sealed Secrets controller's private key. Without it, no SealedSecret in your Git repo can be decrypted, and you would have to re-create every secret from its plaintext source and re-seal it with a new key pair. Back up this key securely (e.g. in Azure Key Vault).

Troubleshooting Common GitOps Issues

The problems you'll actually encounter and how to fix them.

App Stuck "OutOfSync"

# Check what's different
argocd app diff my-app

# Common causes:
# - Mutating webhooks adding fields
# - Default values injected by K8s
# - Fields changed by other controllers (e.g. an HPA scaling replicas)

# Fix: Add to ignoreDifferences
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas  # If HPA manages

Sync Failed

# Check sync operation details
argocd app get my-app

# Check events
kubectl get events -n my-namespace \
  --sort-by=.metadata.creationTimestamp

# Common causes:
# - RBAC: ArgoCD SA lacks permissions
# - Resource quota exceeded
# - Invalid manifest (schema error)
# - Namespace doesn't exist
# - Image pull failure (ACR auth)

More Troubleshooting Scenarios

Diving deeper into common operational issues.

Repo-Server Issues

# Repo clone failures
kubectl logs -n argocd \
  deploy/argocd-repo-server

# Common causes:
# - SSH host key changed
# - PAT expired
# - Network policy blocking egress
# - Repo too large (increase resources)

# Fix SSH host key (--upsert replaces
# the stored key for that host):
ssh-keyscan github.com | \
  argocd cert add-ssh --batch --upsert

Performance Issues

# Controller reconciliation slow
# Check work queue depth (the controller
# is a StatefulSet; metrics on port 8082):
kubectl port-forward -n argocd \
  pod/argocd-application-controller-0 8082:8082 &
curl -s localhost:8082/metrics | grep workqueue_depth

# Solutions:
# - Increase controller resources
# - Raise timeout.reconciliation (poll Git
#   less often) and use Git webhooks
# - Use resource tracking (annotation)
# - Shard the controller or split into
#   multiple ArgoCD instances
# - Exclude non-essential resource types

Pro tip: Enable ArgoCD server debug logging temporarily with --loglevel debug to diagnose complex issues. Remember to revert after troubleshooting.

GitOps Best Practices

Lessons learned from running GitOps at scale.

Repository Practices

Separate app code repos from GitOps config repos
Use Kustomize overlays, not branch-per-environment
Pin image tags (never use :latest)
Require PR reviews for all config changes
Use CODEOWNERS files to enforce team reviews
Keep manifests DRY with Kustomize bases

Operational Practices

Use manual sync for production (PR = gate)
Set up notifications for sync failures
Monitor ArgoCD with Prometheus + Grafana
Back up Sealed Secrets keys to Key Vault
Test disaster recovery quarterly
Use ArgoCD Projects for team isolation

GitOps Anti-Patterns

Common mistakes to avoid on your GitOps journey.

Anti-Pattern	Why It's Bad	What to Do Instead
Plain secrets in Git	Anyone with repo access can read them	Sealed Secrets or External Secrets
Using `:latest` image tag	No audit trail, unpredictable rollouts	Pin to specific tags (semver or SHA)
Manual kubectl in production	Drift, no audit trail, breaks GitOps loop	All changes through Git PRs only
Branch-per-environment	Merge conflicts, diverging configs	Directory-per-environment (overlays)
Auto-sync everything in prod	Risky — no human gate for critical changes	Manual sync + PR approval for prod
One giant monolithic app	Blast radius too large, slow syncs	App of Apps with granular child apps
Ignoring drift alerts	Erodes trust in GitOps as source of truth	Investigate and resolve every drift event
No RBAC on ArgoCD	Everyone can sync/delete any app	Projects + Azure AD RBAC from day one

The Complete GitOps Architecture

Everything we've built across all four presentations.

Developer Workflow

Push code to app repo

CI: build, test, push image to ACR

CI: update image tag in GitOps repo

GitOps Platform

ArgoCD detects Git change

Syncs cluster to desired state

Monitors drift continuously

Operations

Azure AD SSO + RBAC

Prometheus + Grafana monitoring

Slack/Teams notifications

Git is the source of truth. ArgoCD is the engine. Azure is the platform. Your team is in control.

GitOps Module: Complete Recap

Everything we've covered across all four presentations.

Presentation 1: Fundamentals

GitOps principles and benefits
Push vs Pull model
ArgoCD as our tool of choice
Repository structure best practices

Presentation 2: Installation

Helm-based ArgoCD installation
HA configuration
Ingress and TLS exposure
Git repo authentication

Presentation 3: Bootstrapping

Kustomize base + overlays
App of Apps and ApplicationSets
Sealed Secrets and External Secrets
Environment promotion workflow

Presentation 4: Operations

Azure AD SSO and RBAC
Prometheus monitoring and Grafana
Slack/Teams notifications
DR, multi-team, best practices

What's Next for Your Team

Practical next steps after completing this module.

Immediate Actions (Week 1)

Install ArgoCD on your dev AKS cluster
Create the GitOps repo with recommended structure
Deploy the guestbook example app
Set up Azure AD SSO
Configure Slack/Teams notifications

Gradual Rollout (Weeks 2-4)

Migrate one service to GitOps (dev only)
Set up External Secrets with Key Vault
Implement App of Apps pattern
Add Prometheus monitoring for ArgoCD
Promote to staging, then production

Start small. One service, one environment. Build confidence, then expand. GitOps is a journey, not a big bang.

Presentation 4 Summary

Key takeaways from Azure Integration and Operations.

Identity

Azure AD OIDC for SSO. Map AD groups to ArgoCD roles. Disable admin account.

Monitoring

Built-in Prometheus metrics. ServiceMonitors for auto-discovery. Key alerts for sync failures.

Notifications

Built-in system with Slack, Teams, email. Triggers, templates, services pattern.

DR

Git is the backup. Back up Sealed Secrets keys. Test recovery regularly.

Multi-Team

ArgoCD Projects for isolation. RBAC per team. Namespace-scoped access.

Practices

Pin images, manual sync for prod, no secrets in Git, investigate every drift.

Questions & Discussion

Azure Integration & Operations

Congratulations on completing the GitOps Module!

Civica Training Program GitOps Module — 4 of 4