Lab 05: Azure Monitor, Alerts, Backup, and Log Analytics

Estimated time: 50–65 minutes
Difficulty: ⭐⭐⭐☆☆
Environment: Azure free account (Azure CLI + portal)


Prerequisites

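You need an Azure subscription and an authenticated Azure CLI session (az login). Confirm which subscription is active before you start:

az account show --query "{name:name, id:id}" -o table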

Lab Objectives

  1. Create a Log Analytics workspace
  2. Enable diagnostic settings on a VM to stream logs and metrics
  3. Write and run KQL queries
  4. Create a metric alert with an email action group
  5. Configure VM backup in a Recovery Services vault
  6. Perform a file-level restore from a backup

Step 1: Create Resource Group and Supporting Resources

RG="rg-az104-lab05"
LOCATION="eastus"
az group create --name $RG --location $LOCATION

# Create Log Analytics Workspace
az monitor log-analytics workspace create \
  --resource-group $RG \
  --workspace-name "law-lab05" \
  --location $LOCATION \
  --retention-time 30

LAW_ID=$(az monitor log-analytics workspace show \
  --resource-group $RG \
  --workspace-name "law-lab05" \
  --query id -o tsv)
echo "Log Analytics Workspace ID: $LAW_ID"

Step 2: Deploy a VM to Monitor

az vm create \
  --resource-group $RG \
  --name "vm-monitor" \
  --image Ubuntu2204 \
  --admin-username azureuser \
  --admin-password "P@ssw0rd!Azure104" \
  --size Standard_B2s \
  --location $LOCATION

VM_ID=$(az vm show \
  --resource-group $RG \
  --name "vm-monitor" \
  --query id -o tsv)
echo "VM ID: $VM_ID"

Step 3: Enable Diagnostic Settings

Route VM metrics and logs to the Log Analytics workspace:

az monitor diagnostic-settings create \
  --name "diag-vm-to-law" \
  --resource "$VM_ID" \
  --workspace "$LAW_ID" \
  --metrics '[{"category": "AllMetrics", "enabled": true, "retentionPolicy": {"enabled": false, "days": 0}}]'

Install the Azure Monitor Agent (AMA) on the VM to collect OS-level performance counters:

az vm extension set \
  --resource-group $RG \
  --vm-name "vm-monitor" \
  --name AzureMonitorLinuxAgent \
  --publisher Microsoft.Azure.Monitor \
  --version 1.0 \
  --enable-auto-upgrade true

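⚠️ Tricky spot: Diagnostic settings alone send Azure platform metrics (host-level CPU, network). To get OS-level metrics (guest memory counters, disk free space, custom app counters), you also need the Azure Monitor Agent installed inside the VM. These are separate data streams. Installing the extension is not enough by itself: AMA collects nothing until a data collection rule (DCR) is associated with the VM.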
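
Before moving on, it can help to confirm that the agent extension actually provisioned. The following is a minimal sketch that filters the output of az vm extension list; the helper name ama_state is hypothetical, and the defaults are the resource names used in this lab.

```shell
# Hypothetical helper: print the provisioning state of the Azure Monitor Agent
# extension on a VM. Empty output means the extension is not installed.
ama_state() {
  local rg="${1:-rg-az104-lab05}" vm="${2:-vm-monitor}"
  az vm extension list \
    --resource-group "$rg" \
    --vm-name "$vm" \
    --query "[?name=='AzureMonitorLinuxAgent'].provisioningState" \
    -o tsv
}
```

Running ama_state "$RG" "vm-monitor" should print Succeeded once the extension from the previous command has finished installing.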


Step 4: Create an Action Group for Alerts

# Create an action group with email notification
az monitor action-group create \
  --resource-group $RG \
  --name "ag-lab05" \
  --short-name "lab05" \
  --action email "lab-admin" "your-email@example.com"

AG_ID=$(az monitor action-group show \
  --resource-group $RG \
  --name "ag-lab05" \
  --query id -o tsv)
echo "Action Group ID: $AG_ID"

Step 5: Create Metric Alerts

# Alert: CPU > 85% for 5 minutes
az monitor metrics alert create \
  --name "alert-high-cpu" \
  --resource-group $RG \
  --scopes "$VM_ID" \
  --condition "avg Percentage CPU > 85" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "$AG_ID" \
  --severity 2 \
  --description "CPU exceeded 85% for 5 minutes"

# Alert: Available memory < 500 MB
az monitor metrics alert create \
  --name "alert-low-memory" \
  --resource-group $RG \
  --scopes "$VM_ID" \
  --condition "avg Available Memory Bytes < 524288000" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action "$AG_ID" \
  --severity 3 \
  --description "Available memory below 500 MB"

# Verify alerts
az monitor metrics alert list \
  --resource-group $RG \
  --query "[].{name:name, severity:severity, condition:criteria.allOf[0].metricName}" \
  -o table

⚠️ Tricky spot: --window-size is the aggregation window: the span of recent data over which the metric is averaged (or otherwise aggregated) before it is compared with the threshold. --evaluation-frequency is how often Azure runs that comparison. window-size >= evaluation-frequency is required. With a 1m frequency and a 5m window, Azure checks every minute whether the average over the last 5 minutes exceeded the threshold.
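
The two numeric details in this step (the byte threshold and the window/frequency rule) can be checked locally before calling Azure. This is a sketch only: the helper names are hypothetical, and duration_to_seconds handles just the single-unit forms used in this lab (such as 1m, 5m, 1h), not every duration the CLI accepts.

```shell
# Convert megabytes (binary, 1 MB = 1024 * 1024 bytes) to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Convert a single-unit duration such as 1m, 5m, or 1h to seconds.
duration_to_seconds() {
  case "$1" in
    *m) echo $(( ${1%m} * 60 )) ;;
    *h) echo $(( ${1%h} * 3600 )) ;;
    *d) echo $(( ${1%d} * 86400 )) ;;
    *)  echo "unsupported duration: $1" >&2; return 1 ;;
  esac
}

# Succeed only when window-size >= evaluation-frequency.
check_alert_timing() {
  local window freq
  window=$(duration_to_seconds "$1") || return 1
  freq=$(duration_to_seconds "$2") || return 1
  if [ "$window" -lt "$freq" ]; then
    echo "window-size ($1) must be >= evaluation-frequency ($2)" >&2
    return 1
  fi
  echo "ok: every $2, evaluate the aggregate over the last $1"
}
```

For example, mb_to_bytes 500 yields the 524288000 used in the memory alert, and check_alert_timing 5m 1m passes while check_alert_timing 1m 5m fails.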


Step 6: Create an Activity Log Alert

Activity log alerts fire on Azure management operations:

SUBSCRIPTION_ID=$(az account show --query id -o tsv)

# Alert when any VM in the subscription is deleted
az monitor activity-log alert create \
  --name "alert-vm-deleted" \
  --resource-group $RG \
  --scope "/subscriptions/$SUBSCRIPTION_ID" \
  --condition "category=Administrative and operationName=Microsoft.Compute/virtualMachines/delete and status=Succeeded" \
  --action-group "$AG_ID" \
  --description "Alert when a VM is deleted anywhere in the subscription"

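⚠️ Tricky spot: Activity log alerts can be scoped to a subscription, a resource group, or a single resource. To catch an operation on any resource of a given type, as here, scope at the subscription or resource group level. Metric alerts are scoped to the resources that emit the metric; a resource group or subscription scope works only for resource types that support multi-resource metric alerts.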
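
To narrow the alert to this lab's resource group instead of the whole subscription, only the scope string changes. A small sketch, with a hypothetical helper name, that builds either form:

```shell
# Hypothetical helper: build an activity log alert scope.
# One argument  -> the whole subscription.
# Two arguments -> a single resource group in that subscription.
activity_alert_scope() {
  local sub="$1" rg="${2:-}"
  if [ -n "$rg" ]; then
    echo "/subscriptions/$sub/resourceGroups/$rg"
  else
    echo "/subscriptions/$sub"
  fi
}
```

Passing --scope "$(activity_alert_scope "$SUBSCRIPTION_ID" "$RG")" to the command above would limit the alert to VM deletions inside the lab resource group.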


Step 7: Query Logs with KQL

Wait 10–15 minutes for some activity data to flow into the workspace, then:

# Reminder of where to run the queries in the portal
echo "Open portal → Log Analytics Workspaces → law-lab05 → Logs"

Run these KQL queries in the Log Analytics workspace portal UI:

// Query 1: Recent Azure activity in this resource group
AzureActivity
| where TimeGenerated > ago(1h)
| where ResourceGroup =~ "rg-az104-lab05"  // =~ is case-insensitive; the stored casing can differ
| project TimeGenerated, OperationNameValue, ActivityStatusValue, Caller
| order by TimeGenerated desc

// Query 2: Top 10 operations by count (top already sorts descending)
AzureActivity
| where TimeGenerated > ago(24h)
| summarize count() by OperationNameValue
| top 10 by count_

// Query 3: VM heartbeat check
Heartbeat
| where TimeGenerated > ago(30m)
| summarize LastHeartbeat = max(TimeGenerated) by Computer

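⚠️ Tricky spot: Data may take 5–10 minutes to appear in Log Analytics after enabling diagnostic settings. If queries return empty results, wait and retry. Also check that each table has a source: AzureActivity is populated only when the subscription's activity log is exported to this workspace (a subscription-level diagnostic setting), and Heartbeat only when the agent on the VM is reporting to this workspace.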
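
The wait-and-retry advice can be scripted instead of done by hand. Below is a minimal sketch of a generic polling helper; the name retry_until_output is hypothetical, and it simply reruns whatever command you pass until that command prints something.

```shell
# Hypothetical helper: run a command up to N times, sleeping DELAY seconds
# between attempts, until it prints non-empty output.
# Prints that output and returns 0 on success; returns 1 if every attempt is empty.
retry_until_output() {
  local attempts="$1" delay="$2" out i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    out=$("$@" 2>/dev/null) && [ -n "$out" ] && { printf '%s\n' "$out"; return 0; }
    echo "attempt $i/$attempts: no data yet" >&2
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example use from the CLI (assumption: --workspace expects the workspace GUID,
# i.e. the customerId property, rather than the resource ID stored in LAW_ID):
#   LAW_GUID=$(az monitor log-analytics workspace show \
#     --resource-group "$RG" --workspace-name "law-lab05" --query customerId -o tsv)
#   retry_until_output 10 60 az monitor log-analytics query \
#     --workspace "$LAW_GUID" --analytics-query "Heartbeat | take 1" -o tsv
```

Ten attempts at 60-second intervals roughly covers the ingestion delay described above; adjust both numbers to taste.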


Step 8: Configure VM Backup

# Create a Recovery Services Vault
az backup vault create \
  --resource-group $RG \
  --name "rsv-lab05" \
  --location $LOCATION

# View available backup policies
az backup policy list \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --query "[].name" \
  -o tsv

# Enable backup for the VM using DefaultPolicy
az backup protection enable-for-vm \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --vm "vm-monitor" \
  --policy-name "DefaultPolicy"

# Trigger an immediate backup (don't wait for the scheduled run)
# --retain-until takes dd-mm-yyyy and must be in the future; GNU date computes 30 days from now
az backup protection backup-now \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --item-name "VM;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --backup-management-type AzureIaasVM \
  --retain-until "$(date -u -d '+30 days' +%d-%m-%Y)"

Wait for the backup job to complete:

# Monitor backup job status
az backup job list \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --query "[0].{jobId:name, status:properties.status, operation:properties.operation}" \
  -o table
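
Rather than rerunning the status command by hand, a short polling loop can wait for the job. This is a sketch under two assumptions: that the first entry returned by az backup job list is the job just triggered (the same assumption the command above makes), and that the status strings are InProgress, Completed, and CompletedWithWarnings. The helper name is hypothetical.

```shell
# Hypothetical helper: poll the first job listed for the vault until it
# leaves InProgress. Returns 0 on completion, 1 on any other final status.
wait_for_backup_job() {
  local rg="$1" vault="$2" delay="${3:-60}" status
  while :; do
    status=$(az backup job list \
      --resource-group "$rg" \
      --vault-name "$vault" \
      --query "[0].properties.status" -o tsv)
    echo "backup job status: $status"
    case "$status" in
      InProgress) sleep "$delay" ;;
      Completed|CompletedWithWarnings) return 0 ;;
      *) return 1 ;;
    esac
  done
}
```

Usage: wait_for_backup_job "$RG" "rsv-lab05" 60. An initial VM backup can take a while, so a delay of a minute or more between polls is reasonable.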

Step 9: List Recovery Points

# List available recovery points (run after backup completes)
az backup recoverypoint list \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --item-name "VM;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --backup-management-type AzureIaasVM \
  --query "[].{name:name, time:properties.recoveryPointTime, type:properties.recoveryPointType}" \
  -o table
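
Objective 6 mentions a file-level restore. Once a recovery point exists, one way to sketch it from the CLI is with az backup restore files mount-rp, which downloads a script that mounts the recovery point's disks so individual files can be copied out, followed by unmount-rp. The helper names below are hypothetical, and taking the first listed recovery point as the one to restore is an assumption.

```shell
# Compound names for the lab VM, as used in the commands above.
CONTAINER="IaasVMContainer;iaasvmcontainerv2;rg-az104-lab05;vm-monitor"
ITEM="VM;iaasvmcontainerv2;rg-az104-lab05;vm-monitor"

# Hypothetical helper: print the name of the first recovery point returned
# (assumption: that is the one you want to restore from).
latest_rp() {
  az backup recoverypoint list \
    --resource-group "${RG:-rg-az104-lab05}" \
    --vault-name "rsv-lab05" \
    --container-name "$CONTAINER" \
    --item-name "$ITEM" \
    --backup-management-type AzureIaasVM \
    --query "[0].name" -o tsv
}

# Hypothetical helper: $1 is mount-rp or unmount-rp, $2 is the recovery point name.
file_restore() {
  local action="$1" rp="$2"
  az backup restore files "$action" \
    --resource-group "${RG:-rg-az104-lab05}" \
    --vault-name "rsv-lab05" \
    --container-name "$CONTAINER" \
    --item-name "$ITEM" \
    --rp-name "$rp"
}
```

A typical sequence would be RP=$(latest_rp), then file_restore mount-rp "$RP", run the downloaded script on a machine that can reach the recovery point, copy the files you need, and finish with file_restore unmount-rp "$RP".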

⚠️ Tricky spot: Container name and item name use semicolon-separated compound identifiers. The container format is IaasVMContainer;iaasvmcontainerv2;<resource-group>;<vm-name> and the item format is VM;iaasvmcontainerv2;<resource-group>;<vm-name>. Getting these wrong is a common reason backup CLI commands fail. You can get the exact values with az backup item list.
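
Since the compound names follow a fixed pattern, they can be built from the resource group and VM name rather than typed by hand. A minimal sketch with hypothetical helper names, plus a lookup that prints what the vault itself reports:

```shell
# Build the compound container and item names for an Azure VM backup.
backup_container_name() {
  echo "IaasVMContainer;iaasvmcontainerv2;$1;$2"
}

backup_item_name() {
  echo "VM;iaasvmcontainerv2;$1;$2"
}

# Hypothetical helper: list the names the vault actually holds, for comparison.
list_backup_names() {
  local rg="$1" vault="$2"
  az backup container list \
    --resource-group "$rg" --vault-name "$vault" \
    --backup-management-type AzureIaasVM \
    --query "[].name" -o tsv
  az backup item list \
    --resource-group "$rg" --vault-name "$vault" \
    --query "[].name" -o tsv
}
```

For example, --container-name "$(backup_container_name "$RG" vm-monitor)" produces the same string used in the commands above; if a command still fails, compare against the output of list_backup_names "$RG" "rsv-lab05".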


Step 10: Clean Up

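# Turn off soft delete so deleted backup data is purged immediately
# (otherwise it is retained for 14 days and blocks vault deletion)
az backup vault backup-properties set \
  --resource-group $RG \
  --name "rsv-lab05" \
  --soft-delete-feature-state Disable

# Stop protection and delete the backup data (required before deleting the vault)
az backup protection disable \
  --resource-group $RG \
  --vault-name "rsv-lab05" \
  --container-name "IaasVMContainer;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --item-name "VM;iaasvmcontainerv2;rg-az104-lab05;vm-monitor" \
  --backup-management-type AzureIaasVM \
  --delete-backup-data true \
  --yes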

# Then delete resource group
az group delete --name $RG --yes --no-wait

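⚠️ Tricky spot: You cannot delete a Recovery Services vault if it has active backup items or ASR replication. Always disable protection (with --delete-backup-data true) before deleting the vault. If soft delete is enabled on the vault, the deleted data is kept for 14 days and still blocks deletion during that time.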
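
A guard around the final delete makes this ordering harder to get wrong. The sketch below is an illustration only: the helper name is hypothetical, and it assumes that any backup data still held by the vault shows up in az backup item list.

```shell
# Hypothetical guard: delete the resource group only when the vault reports
# zero backup items. Fails closed if the item count cannot be read.
delete_rg_if_vault_empty() {
  local rg="$1" vault="$2" remaining
  remaining=$(az backup item list \
    --resource-group "$rg" --vault-name "$vault" \
    --query "length(@)" -o tsv) || return 1
  if [ "$remaining" != "0" ]; then
    echo "vault $vault still reports '$remaining' backup item(s); not deleting $rg" >&2
    return 1
  fi
  az group delete --name "$rg" --yes --no-wait
}
```

Usage: delete_rg_if_vault_empty "$RG" "rsv-lab05" in place of the plain az group delete call.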


Lab Tricky Spots Summary

Trap | Effect | Fix
Diagnostic Settings without AMA | Only host-level metrics; no OS metrics (memory, disk) | Install Azure Monitor Agent for full OS telemetry
Log data delay | KQL queries return empty immediately after setup | Wait 5–10 minutes for data ingestion
window-size < evaluation-frequency | Alert creation fails | Set window-size ≥ evaluation-frequency
Wrong backup container/item name format | Backup CLI commands fail | Use az backup item list to get exact names
Deleting vault with active items | Vault deletion blocked | Disable protection with --delete-backup-data true first

Lab Takeaways

  1. Azure Monitor has two data types — metrics (automatic, 93 days) and logs (require workspace, 30 days default). Know which one you're querying.
  2. Three alert types — metric (threshold on numeric value), log query (KQL result), activity log (Azure management operation). Match the right alert type to the scenario.
  3. A Recovery Services vault is required for VM backup and ASR. For backup, the vault must be in the same region as the VM; for ASR, the vault must be in a region other than the source region.
  4. Backup soft delete means deleted backup data persists for 14 days. You cannot delete the vault until all backup data is purged.
  5. Test your DR — ASR test failover and backup restore drills should be scheduled regularly, not just done once.