Operations Guide

Bootstrap

1. Prerequisites

  • Terraform >= 1.6
  • Azure CLI authenticated to the target tenant
  • Permissions to create resources: network, private endpoint, policy, monitoring, storage

2. Set Up Remote State Backend

Before running Terraform, create a storage account for remote state:

# Create resource group for state
az group create --name rg-tfstate --location eastus

# Create storage account (name must be globally unique, 3-24 lowercase letters and digits)
az storage account create \
  --name sttzstatedev \
  --resource-group rg-tfstate \
  --sku Standard_LRS \
  --encryption-services blob \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

# Create container
az storage container create \
  --name tfstate \
  --account-name sttzstatedev

Repeat for each environment with distinct storage account names (e.g., sttzstatestaging, sttzstateprod), or use a single account with separate containers.
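
The per-environment repetition can be sketched as a small loop. This is a minimal sketch, not part of the repository: the helper names state_account_name and print_state_commands are hypothetical, the sttzstate prefix is taken from the example names above, and the commands are only printed so they can be reviewed before being run.

```shell
# Hypothetical prefix, taken from the example account names in this guide.
STATE_PREFIX="sttzstate"

# Build the state storage account name for one environment and check it
# against Azure's naming rule (3-24 characters, lowercase letters and digits).
state_account_name() {
  local name="${STATE_PREFIX}$1"
  case "$name" in
    *[!a-z0-9]*)
      echo "invalid storage account name: $name" >&2
      return 1
      ;;
  esac
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 24 ]; then
    echo "invalid storage account name length: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Print (do not run) the create commands for one environment.
print_state_commands() {
  local account
  account="$(state_account_name "$1")" || return 1
  cat <<EOF
az storage account create --name ${account} --resource-group rg-tfstate --sku Standard_LRS --encryption-services blob --min-tls-version TLS1_2 --allow-blob-public-access false
az storage container create --name tfstate --account-name ${account}
EOF
}

for env in dev staging prod; do
  print_state_commands "$env"
done
```

Piping the output to a file and reviewing it before execution keeps the account names visible, which helps because a storage account cannot be renamed after creation.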

3. Configure Backend

Copy the backend example and update it with your storage account details:

cp envs/dev/backend.tf.example envs/dev/backend.tf
# Edit backend.tf with your storage account name
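
For orientation, an azurerm backend block generally has the shape written by the sketch below. The contents of the repository's backend.tf.example are not shown here, so treat this as an assumption about its structure; write_backend is a hypothetical helper and the state key name in the usage note is illustrative only.

```shell
# Write an azurerm backend block to a file.
# Arguments: <output-file> <resource-group> <storage-account> <container> <state-key>
write_backend() {
  cat > "$1" <<EOF
terraform {
  backend "azurerm" {
    resource_group_name  = "$2"
    storage_account_name = "$3"
    container_name       = "$4"
    key                  = "$5"
  }
}
EOF
}
```

A call such as write_backend envs/dev/backend.tf rg-tfstate sttzstatedev tfstate dev.terraform.tfstate would produce a backend file matching the storage account created in step 2.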

4. Configure Variables

cp envs/dev/terraform.tfvars.example envs/dev/terraform.tfvars
# Edit terraform.tfvars with your client name, region, CIDRs, etc.

5. Initialize and Deploy

cd envs/dev
terraform init
terraform plan
terraform apply
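
One common variation on the sequence above saves the plan to a file and applies exactly that file, so what is applied is what was reviewed. This is a sketch under that assumption, not the repository's documented procedure; deploy_env is a hypothetical helper and tfplan is an arbitrary file name.

```shell
# Initialize, write the plan to a file, then apply that saved plan.
# Runs in a subshell so the caller's working directory is unchanged.
# Argument: <environment-directory>, for example envs/dev
deploy_env() {
  (
    cd "$1" || exit 1
    terraform init -input=false &&
      terraform plan -input=false -out=tfplan &&
      terraform apply -input=false tfplan
  )
}
```

Applying a saved plan does not prompt for confirmation, so review the plan output before the apply step, for example by running the plan and apply commands separately.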

Validation Commands

  • ./scripts/preflight.sh — Check prerequisites and Azure context
  • ./scripts/validate.sh — Format check, init, validate across all environments
  • ./scripts/drift_check.sh <env> -var-file=terraform.tfvars — Detect configuration drift
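
The internals of drift_check.sh are not shown in this guide. A drift check of this kind is often built on terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The sketch below illustrates that approach only; drift_status and check_drift are hypothetical names, and the envs/<env> layout is assumed from the bootstrap steps.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a verdict.
#   0 = no changes, 2 = changes present, anything else = error
drift_status() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run a plan for one environment and report the verdict.
# Arguments: <env> [extra terraform plan arguments...]
check_drift() {
  local env="$1"
  shift
  local rc=0
  # Plan output is discarded; only the exit status is used.
  ( cd "envs/$env" && terraform plan -detailed-exitcode -input=false "$@" >/dev/null ) || rc=$?
  drift_status "$rc"
  return "$rc"
}
```

Because check_drift returns the plan's exit status, a scheduled job can fail on status 2 and raise an alert, while status 0 passes quietly.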

Environment Isolation

Each environment maintains:

  • Separate Terraform state file
  • Separate backend configuration
  • Separate workspace or subscription
  • No cross-environment data plane references

CI/CD Pipeline

The repository includes a GitHub Actions workflow (.github/workflows/terraform.yml) that:

  • Validates and format-checks on every PR
  • Plans all three environments on every PR
  • Applies sequentially on merge to main: dev -> staging -> prod
  • Requires approval for each environment via GitHub Environments

Required Secrets per Environment

Secret                   Purpose
AZURE_CLIENT_ID          Service principal / workload identity client ID
AZURE_TENANT_ID          Microsoft Entra tenant ID
AZURE_SUBSCRIPTION_ID    Target subscription ID
BACKEND_RESOURCE_GROUP   State storage resource group
BACKEND_STORAGE_ACCOUNT  State storage account name
BACKEND_CONTAINER        State storage container name
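
How the workflow consumes these secrets is not shown here. One plausible wiring, sketched below, maps the AZURE_* values to the ARM_* environment variables read by the azurerm provider and passes the BACKEND_* values to terraform init as partial backend configuration. The function names are hypothetical, and ARM_USE_OIDC=true assumes workload identity federation rather than a client secret.

```shell
# Export the ARM_* variables read by the azurerm provider and backend.
export_azure_auth() {
  : "${AZURE_CLIENT_ID:?AZURE_CLIENT_ID is required}"
  : "${AZURE_TENANT_ID:?AZURE_TENANT_ID is required}"
  : "${AZURE_SUBSCRIPTION_ID:?AZURE_SUBSCRIPTION_ID is required}"
  export ARM_CLIENT_ID="$AZURE_CLIENT_ID"
  export ARM_TENANT_ID="$AZURE_TENANT_ID"
  export ARM_SUBSCRIPTION_ID="$AZURE_SUBSCRIPTION_ID"
  # Assumption: the pipeline authenticates with OIDC; remove if it does not.
  export ARM_USE_OIDC=true
}

# Print one -backend-config argument per line for `terraform init`.
backend_init_args() {
  : "${BACKEND_RESOURCE_GROUP:?BACKEND_RESOURCE_GROUP is required}"
  : "${BACKEND_STORAGE_ACCOUNT:?BACKEND_STORAGE_ACCOUNT is required}"
  : "${BACKEND_CONTAINER:?BACKEND_CONTAINER is required}"
  printf -- '-backend-config=resource_group_name=%s\n' "$BACKEND_RESOURCE_GROUP"
  printf -- '-backend-config=storage_account_name=%s\n' "$BACKEND_STORAGE_ACCOUNT"
  printf -- '-backend-config=container_name=%s\n' "$BACKEND_CONTAINER"
}
```

With values that contain no whitespace, terraform init $(backend_init_args) passes all three settings. The state key is not among the listed secrets, so it would still need to come from backend.tf or an additional -backend-config argument.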

Day-2 Tasks

  • Review Key Vault RBAC assignments
  • Tune diagnostic category retention and budget alerts
  • Expand policy coverage based on tenant standards
  • Review and update model deployment quotas as usage grows
  • Monitor Azure Policy compliance reports for drift

Observability

Every deployed service emits diagnostics to Log Analytics:

  • Log retention is configurable via log_retention_days
  • Diagnostics fan-out covers OpenAI, Key Vault, AI Search, and the active data profile
  • Logs can be exported to a SIEM via Log Analytics integration

Incident Readiness

  • Use the Log Analytics workspace in each environment for scoped triage
  • Add Azure Monitor alerts for auth failures, service health, and cost thresholds
  • Review policy compliance reports for drift detection

Upgrade Promotion

Changes are promoted through environments: dev -> staging -> prod. See Upgrade Strategy for the full process.