Operations Guide
Bootstrap
1. Prerequisites
- Terraform >= 1.6
- Azure CLI authenticated to the target tenant
- Permissions to create resources: network, private endpoint, policy, monitoring, storage
2. Set Up Remote State Backend
Before running Terraform, create a storage account for remote state:
# Create resource group for state
az group create --name rg-tfstate --location eastus

# Create storage account (name must be globally unique)
az storage account create \
  --name sttzstatedev \
  --resource-group rg-tfstate \
  --sku Standard_LRS \
  --encryption-services blob \
  --min-tls-version TLS1_2 \
  --allow-blob-public-access false

# Create container
az storage container create \
  --name tfstate \
  --account-name sttzstatedev
Repeat for each environment with distinct storage account names (e.g., sttzstatestaging, sttzstateprod), or use a single account with separate containers.
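Because storage account names must be 3-24 characters of lowercase letters and digits, it can help to derive and check the per-environment names before calling the Azure CLI. The following is a minimal sketch; the state_account_name helper is hypothetical, and the sttzstate prefix simply follows the example names above.

```shell
# Hypothetical helper: derive and validate a per-environment state account name.
# The "sttzstate" prefix follows the example names used in this guide.
state_account_name() {
  env_name="$1"
  name="sttzstate${env_name}"
  # Azure storage account names allow lowercase letters and digits only.
  case "$name" in
    *[!a-z0-9]*) echo "invalid characters in '$name'" >&2; return 1 ;;
  esac
  # Length must be between 3 and 24 characters.
  if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 24 ]; then
    echo "'$name' must be 3-24 characters long" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Print the candidate names for the three environments.
for env_name in dev staging prod; do
  state_account_name "$env_name"
done
```

The check only covers the naming rules; global uniqueness still has to be confirmed against Azure itself.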
3. Configure Backend
Copy the backend example and update it with your storage account details.
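As a rough illustration of the details to fill in, the sketch below prints the four settings the azurerm backend expects (resource_group_name, storage_account_name, container_name, key). The render_backend_config helper and the dev.terraform.tfstate key are assumptions made for illustration, not names from the repository; the other values reuse the names from step 2.

```shell
# Hypothetical helper: print azurerm backend settings for one environment.
# Arguments: resource group, storage account, container, state key.
render_backend_config() {
  cat <<EOF
resource_group_name  = "$1"
storage_account_name = "$2"
container_name       = "$3"
key                  = "$4"
EOF
}

# Example values from step 2; the key name is an assumption.
render_backend_config rg-tfstate sttzstatedev tfstate dev.terraform.tfstate
```

If the repository leaves its backend block partially configured, output like this could be saved to a file and passed with terraform init -backend-config=<file>; whether that applies depends on the backend example this step refers to.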
4. Configure Variables
cp envs/dev/terraform.tfvars.example envs/dev/terraform.tfvars
# Edit terraform.tfvars with your client name, region, CIDRs, etc.
5. Initialize and Deploy
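A minimal sketch of this step, assuming the standard Terraform init, plan, and apply flow run against the envs/<env> directory used in the previous step. The run helper, the DRY_RUN switch, and the tfplan file name are illustrative and not part of the repository; by default the sketch only prints the commands it would run.

```shell
# Sketch only: run() and DRY_RUN are hypothetical helpers, not repository scripts.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Initialize, plan to a saved plan file, then apply exactly that plan.
deploy_env() {
  env_dir="envs/$1"
  run terraform -chdir="$env_dir" init &&
  run terraform -chdir="$env_dir" plan -var-file=terraform.tfvars -out=tfplan &&
  run terraform -chdir="$env_dir" apply tfplan
}

deploy_env dev
```

Setting DRY_RUN=0 executes the commands for real. Applying a saved plan file means the changes applied are the ones that were reviewed in the plan output.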
Validation Commands
- ./scripts/preflight.sh: Check prerequisites and Azure context
- ./scripts/validate.sh: Format check, init, and validate across all environments
- ./scripts/drift_check.sh <env> -var-file=terraform.tfvars: Detect configuration drift
Environment Isolation
Each environment maintains:
- Separate Terraform state file
- Separate backend configuration
- Separate workspace or subscription
- No cross-environment data plane references
CI/CD Pipeline
The repository includes a GitHub Actions workflow (.github/workflows/terraform.yml) that:
- Validates and format-checks on every PR
- Plans all three environments on every PR
- Applies sequentially on merge to main: dev -> staging -> prod
- Requires approval for each environment via GitHub Environments
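The sequential apply order can be pictured as a loop that stops at the first failing environment, so a failure in dev or staging never reaches prod. This is a sketch of the ordering only, not the workflow's actual implementation: apply_env is a stand-in for the real per-environment apply job, and the approval gates provided by GitHub Environments are not modeled.

```shell
# apply_env is a placeholder for the real per-environment apply step.
apply_env() {
  echo "applying $1"
}

# Apply environments in order; stop as soon as one of them fails.
promote() {
  for env_name in dev staging prod; do
    if ! apply_env "$env_name"; then
      echo "stopping: $env_name failed; later environments were not applied" >&2
      return 1
    fi
  done
}

promote
```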
Required Secrets per Environment
| Secret | Purpose |
|---|---|
| AZURE_CLIENT_ID | Service principal / workload identity client ID |
| AZURE_TENANT_ID | Microsoft Entra tenant ID |
| AZURE_SUBSCRIPTION_ID | Target subscription ID |
| BACKEND_RESOURCE_GROUP | State storage resource group |
| BACKEND_STORAGE_ACCOUNT | State storage account name |
| BACKEND_CONTAINER | State storage container name |
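When running the same steps outside GitHub Actions, a quick check that these values are present as environment variables can catch a missing setting before Terraform starts. The following is a minimal sketch; check_required_secrets is a hypothetical helper, and the variable names are taken from the table above.

```shell
# Hypothetical helper: report which of the required variables are unset or empty.
check_required_secrets() {
  missing=""
  for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID \
              BACKEND_RESOURCE_GROUP BACKEND_STORAGE_ACCOUNT BACKEND_CONTAINER; do
    # Indirect lookup; the names are fixed in the list above, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
}

check_required_secrets || echo "set the variables listed above before running Terraform"
```

The helper only checks for presence; it does not verify that the values are valid for the target tenant or subscription.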
Day-2 Tasks
- Review Key Vault RBAC assignments
- Tune diagnostic category retention and budget alerts
- Expand policy coverage based on tenant standards
- Review and update model deployment quotas as usage grows
- Monitor Azure Policy compliance reports for drift
Observability
Every deployed service emits diagnostics to Log Analytics:
- Log retention is configurable via log_retention_days
- Diagnostics fan-out covers OpenAI, Key Vault, AI Search, and the active data profile
- Support for export to SIEM via Log Analytics integration
Incident Readiness
- Use Log Analytics workspaces in each env for scoped triage
- Add Azure Monitor alerts for auth failures, service health, and cost thresholds
- Review policy compliance reports for drift detection
Upgrade Promotion
Changes are promoted through environments: dev -> staging -> prod. See Upgrade Strategy for the full process.