Domain 5: Monitor and Maintain Azure Resources

Exam weight: 10–15%

This domain is about knowing that things are working, responding when they're not, backing up data, and planning for disaster recovery. It's more conceptual than the networking domain, but you'll see scenario questions comparing the tools.


5.1 Azure Monitor

Azure Monitor is the central hub for all monitoring in Azure. Platform metrics and the Activity log are collected automatically for almost every Azure resource; resource logs are collected only after you create a diagnostic setting.

Two Core Data Types

| Type | What it is | Default retention | Storage |
| --- | --- | --- | --- |
| Metrics | Numerical time-series data (CPU %, bytes/sec) | 93 days | Azure Monitor Metrics Store |
| Logs | Text/structured records of events | 30 days (configurable up to 730 days) | Log Analytics workspace (required) |

Exam trap: Platform metrics are retained for 93 days in Azure Monitor automatically; no workspace is needed. Logs require a Log Analytics workspace and must be explicitly routed there via Diagnostic Settings.

Diagnostic Settings

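Diagnostic settings control where a resource's telemetry flows. On every resource that supports them, you can route:

  • Platform Metrics → Log Analytics workspace (for querying metrics with KQL), Storage Account, Event Hub, or Partner solution
  • Resource Logs (resource-specific log categories) → Log Analytics workspace, Storage Account, Event Hub, or Partner solution

The Activity log is a subscription-level log, not a resource log. It is exported through a separate diagnostic setting created on the subscription.
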
# Send a VM's platform metrics to a Log Analytics workspace.
# A VM resource exposes no resource-log categories, so only --metrics applies here.
az monitor diagnostic-settings create \
  --name "diag-vm-metrics" \
  --resource "/subscriptions/.../resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/myvm" \
  --workspace "/subscriptions/.../resourceGroups/rg/providers/Microsoft.OperationalInsights/workspaces/my-workspace" \
  --metrics '[{"category": "AllMetrics", "enabled": true}]'

# Activity log categories such as "Administrative" belong to the subscription-level
# setting instead, created with: az monitor diagnostic-settings subscription create
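
The JSON arrays passed to the diagnostic-settings flags are easy to mistype by hand. The following is a minimal Python sketch that generates them; the helper name `diag_payload` is hypothetical, and `AuditEvent` is only an illustrative resource-log category that assumes the target resource type exposes it.

```python
import json
import shlex

def diag_payload(categories, enabled=True):
    """Return the JSON array string used for a --logs or --metrics value."""
    return json.dumps([{"category": c, "enabled": enabled} for c in categories])

# "AuditEvent" is an assumed example category; check which categories
# your resource type actually supports before using it.
logs_arg = diag_payload(["AuditEvent"])
metrics_arg = diag_payload(["AllMetrics"])

# shlex.quote wraps each value so it can be pasted into a POSIX shell.
print("--logs", shlex.quote(logs_arg))
print("--metrics", shlex.quote(metrics_arg))
```

Generating the value and quoting it in one step avoids the mismatched-quote errors that tend to appear when the JSON is edited inline in a shell command.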

5.2 Log Analytics

Log Analytics is the query engine for Azure Monitor logs. Queries use KQL (Kusto Query Language).

Creating a Workspace

az monitor log-analytics workspace create \
  --resource-group rg-monitor \
  --workspace-name "law-prod" \
  --location eastus \
  --retention-time 90

Essential KQL Patterns

// Count failed operations in the last hour
AzureActivity
| where TimeGenerated > ago(1h)
| where ActivityStatusValue == "Failure"
| summarize count() by OperationNameValue

// Top 10 VMs by CPU
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(30m)
| summarize avg(CounterValue) by Computer
| top 10 by avg_CounterValue desc

// Blob requests that returned an HTTP error (4xx/5xx)
// StatusCode is stored as a string, so convert it before comparing
StorageBlobLogs
| where toint(StatusCode) >= 400
| project TimeGenerated, OperationName, StatusCode, Uri

Key KQL Operators (exam-relevant)

| Operator | What it does |
| --- | --- |
| where | Filter rows |
| summarize | Aggregate (count, avg, sum, max) |
| project | Select specific columns |
| order by / sort by | Sort results |
| top N by | Return top N rows |
| extend | Add computed column |
| join | Join two tables |
| ago() | Time relative to now (ago(1h), ago(7d)) |
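
If the pipe syntax is unfamiliar, it can help to read a KQL query as a chain of ordinary list operations. The sketch below is a rough Python analogy of the where, summarize, and top steps from the "Top 10 VMs by CPU" query; the sample rows and the function name `top_cpu` are made up for illustration and this is not how Log Analytics executes queries.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the Perf table.
rows = [
    {"Computer": "vm-a", "CounterName": "% Processor Time", "CounterValue": 80.0},
    {"Computer": "vm-a", "CounterName": "% Processor Time", "CounterValue": 90.0},
    {"Computer": "vm-b", "CounterName": "% Processor Time", "CounterValue": 40.0},
    {"Computer": "vm-b", "CounterName": "Available MBytes", "CounterValue": 2048.0},
    {"Computer": "vm-c", "CounterName": "% Processor Time", "CounterValue": 95.0},
]

def top_cpu(rows, n):
    # where: keep only the CPU counter rows
    cpu = [r for r in rows if r["CounterName"] == "% Processor Time"]
    # summarize avg(CounterValue) by Computer
    groups = defaultdict(list)
    for r in cpu:
        groups[r["Computer"]].append(r["CounterValue"])
    averages = {computer: sum(v) / len(v) for computer, v in groups.items()}
    # top n by avg_CounterValue desc
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:n]

result = top_cpu(rows, 2)
print(result)
```

Each KQL operator takes the table produced by the previous line and returns a new one, which is why the order of the operators matters.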

5.3 Alerts

Alerts notify you (or trigger automated actions) when specific conditions are met.

Alert Rule Components

| Component | Description |
| --- | --- |
| Scope | Which resource(s) to monitor |
| Condition | Signal + threshold (e.g., CPU > 90%) |
| Action group | Who/what gets notified |
| Alert rule | Brings scope + condition + action group together |

Alert Signal Types

| Signal type | Source | Example |
| --- | --- | --- |
| Metric | Real-time numeric value | CPU > 80% for 5 min |
| Log query | KQL query result | Count of errors > 10 in 15 min |
| Activity log | Azure control-plane operations | VM deallocated, resource deleted |
| Resource health | Azure-side platform issues | VM unavailable |
| Service health | Azure-wide incidents/maintenance | Region outage |

Exam tip: "Notify me when someone deletes a resource group" = Activity log alert. "Notify me when CPU exceeds 90%" = Metric alert. "Notify me when more than 5 failed logins appear in logs" = Log query alert.

Action Groups

Action groups define what happens when an alert fires:

| Action type | Use case |
| --- | --- |
| Email/SMS | Notify an on-call engineer |
| Azure Function | Run custom logic/automation |
| Logic App | Complex automated workflows |
| Webhook | Integrate with third-party systems (PagerDuty, Slack) |
| ITSM | Create an incident in ServiceNow |
| Automation Runbook | Execute an Azure Automation runbook |

# Create an action group
az monitor action-group create \
  --resource-group rg-monitor \
  --name "ag-ops-team" \
  --short-name "ops" \
  --action email oncall oncall@contoso.com

# Create a metric alert
az monitor metrics alert create \
  --name "alert-high-cpu" \
  --resource-group rg-monitor \
  --scopes "/subscriptions/.../resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/myvm" \
  --condition "avg Percentage CPU > 90" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "/subscriptions/.../resourceGroups/rg-monitor/providers/microsoft.insights/actionGroups/ag-ops-team" \
  --severity 2
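
To see how the window size and evaluation frequency in the rule above interact, here is a simplified Python sketch. It assumes one CPU sample per minute and treats the rule as "average of the last five samples, checked every minute"; the function name and sample values are hypothetical, and the real service's evaluation is more involved.

```python
def evaluate_metric_alert(samples, threshold=90.0, window=5):
    """Return a list of (minute_index, window_average, fired) tuples.

    samples: one CPU reading per minute, oldest first.
    window: how many of the most recent samples to average, mirroring
            --window-size 5m evaluated at --evaluation-frequency 1m.
    """
    results = []
    for i in range(window - 1, len(samples)):
        recent = samples[i - window + 1 : i + 1]
        avg = sum(recent) / len(recent)
        # The condition "avg Percentage CPU > 90" is a strict comparison.
        results.append((i, avg, avg > threshold))
    return results

# Hypothetical readings: a sustained spike that then drops off.
cpu = [50, 95, 96, 97, 98, 99, 60]
evaluations = evaluate_metric_alert(cpu)
for minute, avg, fired in evaluations:
    print(minute, avg, "FIRED" if fired else "ok")
```

In this sample the first window still includes the low reading and does not fire, the second window is entirely above the threshold and fires, and the third averages exactly 90, which does not satisfy a strict "greater than" condition.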

5.4 Azure Backup

Azure Backup is Microsoft's managed backup service. It protects VMs, SQL databases, Azure Files, blobs, and more.

Recovery Services Vault

The Recovery Services Vault is the central container for backup data and backup policies. It's required for VM backup and Azure Site Recovery.

Exam trap: There is also a Backup vault (newer) — used specifically for Azure Disk backup, Azure Database for PostgreSQL, and Azure Blob backup. Recovery Services Vault is used for VM backup and ASR.

VM Backup

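  • Consistency depends on the OS and VM state: a running Windows VM gets an application-consistent snapshot through VSS by default; a running Linux VM gets a file-system-consistent snapshot (application-consistent if you supply pre/post scripts). A backup is only crash-consistent when the VM is shut down at backup time or the snapshot cannot be quiesced.
  • Backup policy defines frequency (daily or weekly with the Standard policy; multiple backups per day with the Enhanced policy) and retention (daily, weekly, monthly, yearly).
  • Soft delete: Deleted backup data is retained for 14 additional days (default; configurable), so an accidental or malicious deletion can still be undone.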
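
The soft-delete window is simple date arithmetic, sketched below in Python. The function names are hypothetical, the 14-day value is the default stated above, and treating the expiry date itself as already too late is an assumption made for the sketch rather than documented service behavior.

```python
from datetime import date, timedelta

SOFT_DELETE_DAYS = 14  # default soft-delete retention from the text above

def soft_delete_expiry(deleted_on, retention_days=SOFT_DELETE_DAYS):
    """Date on which soft-deleted backup data becomes eligible for purge."""
    return deleted_on + timedelta(days=retention_days)

def can_undelete(deleted_on, today, retention_days=SOFT_DELETE_DAYS):
    """True while the deleted backup data is still inside the retention window."""
    # Assumption: the expiry date itself counts as outside the window.
    return today < soft_delete_expiry(deleted_on, retention_days)

deleted = date(2025, 3, 1)
print(soft_delete_expiry(deleted))
print(can_undelete(deleted, today=date(2025, 3, 10)))
print(can_undelete(deleted, today=date(2025, 3, 15)))
```

With a deletion on 1 March, the data stays recoverable until 15 March under the default setting; a longer configured retention simply moves that date out.
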
# Enable backup on a VM (vault must exist)
az backup protection enable-for-vm \
  --vault-name "rsv-prod" \
  --resource-group rg-backup \
  --vm "/subscriptions/.../resourceGroups/rg/providers/Microsoft.Compute/virtualMachines/myvm" \
  --policy-name "DefaultPolicy"

# List backup items
az backup item list \
  --vault-name "rsv-prod" \
  --resource-group rg-backup \
  --backup-management-type AzureIaasVM \
  -o table

# Trigger an on-demand backup
az backup protection backup-now \
  --vault-name "rsv-prod" \
  --resource-group rg-backup \
  --item-name "vm;iaasvmcontainer;rg;myvm" \
  --container-name "iaasvmcontainer;iaasvmcontainerv2;rg;myvm" \
  --backup-management-type AzureIaasVM \
  --retain-until "31-12-2025"

Restore Options

| Option | Description |
| --- | --- |
| Create new VM | Restore full VM from a recovery point |
| Replace existing disk | Replace OS or data disk on an existing VM |
| Restore files | Mount the recovery point as a drive and copy individual files |

5.5 Azure Site Recovery (ASR)

Azure Site Recovery provides disaster recovery (DR) — it replicates VMs continuously to a secondary region. In the event of a regional outage, you can fail over to the replicated VMs.

Key Concepts

| Term | Definition |
| --- | --- |
| RPO (Recovery Point Objective) | Maximum amount of data you can afford to lose, measured in time (e.g., 15 minutes of data) |
| RTO (Recovery Time Objective) | Maximum time the workload can be offline before service must be restored |
| Replication | Continuous block-level replication of VM disks to the target region |
| Test failover | Validates DR without affecting production (spins up in isolated network) |
| Failover | Activates the DR environment as production |
| Failback | Replicates back to primary region and shifts production back |

ASR RPO

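ASR replicates disk writes continuously and keeps a history of recovery points. For Azure-to-Azure replication, crash-consistent recovery points are created every 5 minutes, while app-consistent recovery points are taken at the snapshot frequency set in the replication policy (typically hours apart). The RPO you actually achieve depends on the data change rate and available bandwidth.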
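
To make RPO and RTO concrete, the following Python sketch checks one incident against both objectives. Everything in it is hypothetical (the function name, the timestamps, and the 15-minute and 1-hour targets); it only illustrates the arithmetic and does not call any ASR API.

```python
from datetime import datetime, timedelta

def assess_failover(recovery_points, outage_start, service_restored, rpo, rto):
    """Compare the data loss and downtime of one incident with the RPO and RTO."""
    # Only recovery points taken at or before the outage are usable.
    usable = [rp for rp in recovery_points if rp <= outage_start]
    if not usable:
        raise ValueError("no recovery point exists before the outage")
    data_loss = outage_start - max(usable)       # writes made since the last point
    downtime = service_restored - outage_start   # time until service is back
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

# Hypothetical incident: recovery points every 5 minutes from 10:00 to 10:20.
start = datetime(2025, 1, 15, 10, 0)
points = [start + timedelta(minutes=5 * i) for i in range(5)]
report = assess_failover(
    recovery_points=points,
    outage_start=datetime(2025, 1, 15, 10, 23),
    service_restored=datetime(2025, 1, 15, 11, 5),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=1),
)
print(report)
```

Here the outage begins 3 minutes after the latest recovery point and service returns 42 minutes later, so both objectives are met. RPO is measured backward from the outage, RTO forward from it.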

Exam trap: Backup and ASR serve different purposes:

  • Azure Backup = protect against accidental deletion, data corruption, ransomware
  • ASR = protect against regional outage, datacenter failure (DR scenario)

5.6 Azure Update Manager

Azure Update Manager (formerly Update Management Center) manages OS updates across Azure VMs and Arc-connected on-premises servers.

  • Provides a unified view of update compliance across all machines
  • Supports scheduled assessment and scheduled patching
  • Maintenance windows control when updates are applied
  • Works without a Log Analytics agent (uses Azure VM extension)

5.7 Azure Advisor

Azure Advisor analyzes your Azure usage and provides personalized recommendations across five categories:

| Category | Examples |
| --- | --- |
| Cost | Right-size underutilized VMs, delete unused resources |
| Security | Enable MFA, apply security patches |
| Reliability | Add availability zones, configure backups |
| Operational Excellence | Enable diagnostics, follow best practices |
| Performance | Upgrade VM disks, increase throughput |

Exam tip: Advisor is read-only and advisory — it doesn't make changes. It surfaces recommendations; you act on them.


Section Takeaways

| Topic | Key Point |
| --- | --- |
| Metrics | 93-day default retention; no workspace needed |
| Logs | Require Log Analytics workspace; 30-day default |
| Diagnostic settings | Route logs and metrics to workspace, storage, Event Hub |
| Activity log alert | Fires on Azure management (control-plane) operations |
| Metric alert | Fires when a numeric threshold is crossed |
| Log query alert | Fires when a KQL query result meets a condition |
| Recovery Services Vault | Required for VM backup and ASR |
| Backup vault | Used for Disk backup, PostgreSQL, Blob backup |
| ASR RPO | Set by recovery point frequency: crash-consistent points are frequent (minutes); app-consistent points follow the replication policy |
| Soft delete | Deleted backup data retained 14 days |
| Backup vs ASR | Backup = data protection; ASR = regional DR |

Confusing Points — Clarified

Q: What's the difference between Backup vault and Recovery Services vault? A: Recovery Services vault is the original (VM backup, SQL backup, ASR). Backup vault is newer and used for a specific set of newer workloads (Azure Disk backup, Blobs, PostgreSQL). For the AZ-104 exam, Recovery Services vault is what you configure for VM backup and ASR.

Q: Can I query metrics data with KQL? A: Yes, but you first need to route platform metrics to a Log Analytics workspace via Diagnostic Settings. Once there, they land in the AzureMetrics table. (The Perf table holds guest OS performance counters collected by an agent, not platform metrics.) By default, metrics are only in the Metrics Store (queryable via Metrics Explorer, not Log Analytics).

Q: What's the difference between Test Failover and Failover in ASR? A: Test Failover spins up the replicated VM in an isolated virtual network — production is not affected, replication continues. Failover is the real event — production shifts to the DR site. Always test before a real event.

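Q: Does Azure Backup require internet connectivity from the VM? A: Not for standard Azure VM backup. The backup extension takes a snapshot, and the data moves to the Recovery Services Vault over the Azure backbone, so the VM needs no outbound internet access. Workload backups that run inside the VM (for example, SQL Server in an Azure VM) and the MARS agent do need connectivity to the Backup service, Azure Storage, and Microsoft Entra ID. You can configure private endpoints for the vault to keep that traffic off the public internet.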