Alert Triage Playbook
Structured triage workflow for Microsoft Defender for Endpoint alerts. Severity classification, decision trees, investigation queries, enrichment procedures, and closure documentation standards.
1 Alert Severity Classification
Microsoft Defender for Endpoint assigns severity levels to alerts based on the potential impact and confidence of the detection. The following table defines the SOC triage expectations for each severity tier.
| Severity | Description | Triage SLA | Action |
|---|---|---|---|
| High | Alerts associated with APTs, credential theft, or ransomware | 15 min | Immediate investigation |
| Medium | EDR-detected suspicious behavior indicating possible compromise | 1 hour | Prioritized investigation |
| Low | Prevalent malware or hack tools not indicating APT | 4 hours | Scheduled investigation |
| Informational | Not considered harmful but may indicate security awareness items | 24 hours | Review and classify |
Triage SLAs are measured from the time the alert appears in the MDE queue to the time an analyst begins active investigation. Automated investigation results from MDE should be reviewed as part of the initial triage, not as a substitute for it.
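The SLA rule above can be checked mechanically. The following is a minimal sketch. It assumes the queue time and the time an analyst began active investigation are available as timezone-aware datetimes (for example, from the alert creation time and the first analyst assignment in your ticketing system). The names `TRIAGE_SLA` and `sla_breached` are hypothetical and do not come from MDE.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Triage SLAs transcribed from the severity table (hypothetical constant name).
TRIAGE_SLA = {
    "High": timedelta(minutes=15),
    "Medium": timedelta(hours=1),
    "Low": timedelta(hours=4),
    "Informational": timedelta(hours=24),
}


def sla_breached(severity: str, queued_at: datetime,
                 started_at: Optional[datetime], now: datetime) -> bool:
    """Return True if the triage SLA for this alert has been missed.

    queued_at:  when the alert appeared in the MDE queue.
    started_at: when an analyst began active investigation, or None if nobody has yet.
                Reviewing automated investigation results alone does not count.
    now:        current time, used only while the alert is still unstarted.
    """
    deadline = queued_at + TRIAGE_SLA[severity]
    checkpoint = started_at if started_at is not None else now
    return checkpoint > deadline


# Example: a High alert picked up after 20 minutes misses its 15-minute SLA.
queued = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(sla_breached("High", queued, queued + timedelta(minutes=20), now=queued + timedelta(hours=1)))  # True
```

If the investigation has not started, the check compares against the current time. An unassigned alert therefore starts reporting a breach as soon as its deadline passes.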
2 MDE Alert Categories
MDE alert categories align closely with MITRE ATT&CK tactics. The following table maps each MDE alert category to its ATT&CK tactic and lists representative technique identifiers for reference during triage.
| MDE Alert Category | MITRE ATT&CK Tactic | Key Techniques |
|---|---|---|
| Ransomware | Impact | T1486 |
| Malware | Execution | Multiple |
| Phishing | Initial Access | T1566 |
| Credential Access | Credential Access | T1003, T1558 |
| Command and Control | Command and Control | T1071, T1095 |
| Lateral Movement | Lateral Movement | T1021, T1570 |
| Persistence | Persistence | T1547, T1053 |
| Defense Evasion | Defense Evasion | T1055, T1036 |
| Exfiltration | Exfiltration | T1041, T1567 |
| Discovery | Discovery | T1087, T1082 |
| Privilege Escalation | Privilege Escalation | T1068, T1134 |
| Execution | Execution | T1059, T1204 |
| Suspicious Activity | Various | Multiple |
| Unwanted Software | Impact | Various |
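You can keep the category table as a lookup to tag alerts with ATT&CK context automatically during triage. The sketch below is illustrative and simply transcribes the table. Matching is case- and space-insensitive, so a category that arrives in a compact form (for example `CommandAndControl`) still resolves. The names `attack_tags` and `_MAPPING` are hypothetical, and an empty technique list stands for "Multiple" or "Various".

```python
from typing import Dict, List, Tuple

# Transcribed from the category table above.
_MAPPING: Dict[str, Tuple[str, List[str]]] = {
    "Ransomware": ("Impact", ["T1486"]),
    "Malware": ("Execution", []),
    "Phishing": ("Initial Access", ["T1566"]),
    "Credential Access": ("Credential Access", ["T1003", "T1558"]),
    "Command and Control": ("Command and Control", ["T1071", "T1095"]),
    "Lateral Movement": ("Lateral Movement", ["T1021", "T1570"]),
    "Persistence": ("Persistence", ["T1547", "T1053"]),
    "Defense Evasion": ("Defense Evasion", ["T1055", "T1036"]),
    "Exfiltration": ("Exfiltration", ["T1041", "T1567"]),
    "Discovery": ("Discovery", ["T1087", "T1082"]),
    "Privilege Escalation": ("Privilege Escalation", ["T1068", "T1134"]),
    "Execution": ("Execution", ["T1059", "T1204"]),
    "Suspicious Activity": ("Various", []),
    "Unwanted Software": ("Impact", []),
}


def _key(category: str) -> str:
    # "Command and Control" and "CommandAndControl" both become "commandandcontrol".
    return "".join(category.split()).lower()


_LOOKUP = {_key(name): value for name, value in _MAPPING.items()}


def attack_tags(category: str) -> dict:
    """Return the ATT&CK tactic and technique IDs for an MDE alert category."""
    tactic, techniques = _LOOKUP.get(_key(category), ("Unmapped", []))
    return {"tactic": tactic, "techniques": list(techniques)}
```

Unknown categories come back as "Unmapped" rather than raising, so a new MDE category does not break the tagging step.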
3 Triage Decision Tree
Follow this decision tree for every incoming MDE alert. Applying the same steps in the same order keeps triage outcomes consistent across the analyst team.
ALERT RECEIVED: Begin Triage Workflow
- Step 1: Is the alert a duplicate or known false positive?
  - Yes: Close as Duplicate/Known FP. Document the reason in the alert closure notes.
  - No: Continue to Step 2.
- Step 2: Check the alert context in MDE (review the device timeline, alert story, and automated investigation results).
- Step 3: Is the activity from a sanctioned tool/process?
  - Yes: Verify with the asset inventory/CMDB.
    - Confirmed sanctioned: Close as Benign True Positive (BTP). Consider adding a suppression rule.
    - Not in inventory: Escalate for review by a senior analyst or the tool owner.
  - No: Continue to Step 4.
- Step 4: Does the alert correlate with known threat intelligence?
  - Yes: Escalate as True Positive. Create an incident and assign it to the IR team.
  - No: Continue to Step 5.
- Step 5: Run enrichment queries (see Section 6: Enrichment Workflows).
- Step 6: Based on the enrichment results, classify the alert:
  - True Positive (TP): Create an incident, assign severity, and begin incident response procedures.
  - False Positive (FP): Close the alert, document the rationale, and consider detection tuning.
  - Benign True Positive (BTP): Close the alert, document the expected behavior, and consider a suppression rule.
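The tree can also be encoded as a function, for example in a SOAR playbook or a training aid. This is a sketch under the assumption that each yes/no answer has already been established by the analyst. The names `TriageFacts` and `triage` are hypothetical. Step 2 is a manual review and appears only as a comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TriageFacts:
    """Hypothetical answers an analyst records while walking the tree."""
    duplicate_or_known_fp: bool               # Step 1
    sanctioned_tool: bool                     # Step 3
    in_cmdb: bool                             # Step 3, CMDB verification
    threat_intel_match: bool                  # Step 4
    enrichment_verdict: Optional[str] = None  # Step 6: "TP", "FP" or "BTP"


STEP6_OUTCOMES = {
    "TP": "Create incident, assign severity, begin IR",
    "FP": "Close as FP, consider detection tuning",
    "BTP": "Close as BTP, consider suppression rule",
}


def triage(f: TriageFacts) -> str:
    if f.duplicate_or_known_fp:                   # Step 1
        return "Close as Duplicate/Known FP"
    # Step 2 (reviewing MDE context) is manual and informs the answers below.
    if f.sanctioned_tool:                         # Step 3
        if f.in_cmdb:
            return "Close as BTP, consider suppression rule"
        return "Escalate to senior analyst or tool owner"
    if f.threat_intel_match:                      # Step 4
        return "Escalate as TP, create incident for IR"
    if f.enrichment_verdict is None:              # Step 5 not done yet
        return "Run enrichment (Section 6)"
    return STEP6_OUTCOMES[f.enrichment_verdict]   # Step 6; unknown verdicts raise KeyError
```

The early returns mirror the tree: once a branch closes or escalates the alert, later steps are never evaluated.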
4 Initial Investigation KQL Queries
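The following KQL queries support the triage workflow. Run them in Advanced Hunting in Microsoft Defender XDR (formerly Microsoft 365 Defender) to gather context around an alert entity. Replace the placeholder values in each query's let statements first.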
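The queries below hard-code placeholder values in their let statements. When those values come from a ticket or a script, quote them rather than pasting them in by hand, so that an entity name containing a quote cannot break or alter the query. The following is a minimal sketch. The helper names `kql_string` and `with_parameters` are hypothetical, and only strings, timezone-aware datetimes and timedeltas are handled. Each query's body, without its let lines, is passed in unchanged.

```python
from datetime import datetime, timedelta, timezone


def kql_string(value: str) -> str:
    """Quote text as a double-quoted KQL string literal."""
    if "\n" in value or "\r" in value:
        raise ValueError("entity values must be single-line")
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def with_parameters(body: str, **params) -> str:
    """Prepend one 'let name = literal;' line per keyword argument to a query body."""
    lines = []
    for name, value in params.items():
        if not name.isidentifier():
            raise ValueError(f"bad parameter name {name!r}")
        if isinstance(value, datetime):
            if value.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware")
            stamp = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
            literal = f"datetime({stamp})"
        elif isinstance(value, timedelta):
            literal = f"{int(value.total_seconds())}s"  # whole seconds only
        elif isinstance(value, str):
            literal = kql_string(value)
        else:
            raise TypeError(f"unsupported type for {name}: {type(value).__name__}")
        lines.append(f"let {name} = {literal};")
    return "\n".join(lines + [body])
```

For the Device Timeline query, for example, `with_parameters(body, alertTime=..., deviceName="WORKSTATION-01", timeWindow=timedelta(minutes=30))` reproduces its three let lines with the supplied values.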
Device Timeline Around Alert
Retrieves all process events within a configurable time window around the alert timestamp on the target device. Provides immediate context for what was executing before, during, and after the alert fired.
```kql
let alertTime = datetime(2024-01-15T10:30:00Z);
let deviceName = "WORKSTATION-01";
let timeWindow = 30m;
DeviceProcessEvents
| where Timestamp between ((alertTime - timeWindow) .. (alertTime + timeWindow))
// DeviceName holds the FQDN, so also match the short host name
| where DeviceName =~ deviceName or DeviceName startswith strcat(deviceName, ".")
| project Timestamp, FileName, ProcessCommandLine, InitiatingProcessFileName, AccountName
| sort by Timestamp asc
```
Expected Output
Returns all process events within ±30 minutes of the alert. Columns include timestamp, file name, command line, parent process, and the user account that launched the process.
When to Use
First step in any alert investigation. Establishes the timeline of activity surrounding the alert event to identify related suspicious processes.
Alert Process Tree Analysis
Joins the alert's process evidence with DeviceProcessEvents to recover the SHA256 hash, parent process, and parent command line for each process entity in the alert. The result exposes the parent-child relationships and command line arguments behind the detection.
```kql
AlertEvidence
| where AlertId == "your-alert-id"
| where EntityType == "Process"
// AlertEvidence has no InitiatingProcess* columns; parent details come from the join below
| distinct DeviceId, FileName, ProcessCommandLine
| join kind=leftouter (
    DeviceProcessEvents
    | project DeviceId, FileName, ProcessCommandLine, SHA256,
              ParentProcess = InitiatingProcessFileName,
              ParentCommandLine = InitiatingProcessCommandLine
    ) on DeviceId, FileName
| project-away DeviceId1, FileName1
```
Expected Output
One row per matching execution of each process in the alert evidence, with file name, command line, SHA256 hash, and the immediate parent process and its command line.
When to Use
Understanding the attack chain. Use this query when you need to trace how a suspicious process was launched and what it spawned.
Lateral Movement Check for Alert Entity
Identifies all devices a suspected compromised account has authenticated to within the last 24 hours. Summarizes logon types and device spread by hourly buckets to surface lateral movement patterns.
```kql
let suspiciousAccount = "compromised_user";
let timeRange = 24h;
DeviceLogonEvents
| where Timestamp > ago(timeRange)
| where AccountName =~ suspiciousAccount      // Windows account names are case-insensitive
| where ActionType == "LogonSuccess"           // count successful authentications only
| where LogonType in ("RemoteInteractive", "Network", "Batch")
| summarize LogonCount = count(),
            Devices = make_set(DeviceName),
            LogonTypes = make_set(LogonType)
    by AccountName, bin(Timestamp, 1h)
| sort by Timestamp desc
```
Expected Output
Shows all devices the account authenticated to, grouped by hour. Includes logon count, distinct device names, and logon types used in each time bucket.
When to Use
When account compromise is suspected. Run this query to determine the blast radius of a potentially compromised identity and identify lateral movement across the environment.
File Hash Prevalence Check
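Checks how many devices in the environment recorded file events (such as creation or modification) for a specific SHA256 hash. DeviceFileEvents records file activity only, so a file that already existed and was not written during the query window is not counted. Low-prevalence files are more likely to be targeted or malicious, while high prevalence suggests a legitimate application.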
```kql
let targetHash = "abc123...";
DeviceFileEvents
| where SHA256 == targetHash
| summarize DeviceCount = dcount(DeviceName),
            Devices = make_set(DeviceName),
            FirstSeen = min(Timestamp),
            LastSeen = max(Timestamp)
| extend PrevalenceAssessment = case(DeviceCount < 3, "Rare - Investigate",
                                     DeviceCount < 10, "Uncommon - Review",
                                     "Common - Likely Legitimate")
```
Expected Output
How many devices have the file, a list of those device names, first and last seen timestamps, and a prevalence assessment label (Rare, Uncommon, or Common).
When to Use
Determining if a file is targeted or widespread. A file present on only 1-2 devices is far more suspicious than one present on hundreds.
Network IOC Fleet-Wide Check
Searches the entire fleet for connections to a suspicious IP address or domain over the last 7 days. Used to scope the breadth of impact when a network-based indicator of compromise is identified.
```kql
let suspiciousIP = "203.0.113.50";
let suspiciousDomain = "malicious-domain.com";
DeviceNetworkEvents
| where Timestamp > ago(7d)
| where RemoteIP == suspiciousIP or RemoteUrl has suspiciousDomain
| summarize ConnectionCount = count(),
            Devices = make_set(DeviceName),
            Ports = make_set(RemotePort),
            FirstConnection = min(Timestamp),
            LastConnection = max(Timestamp)
    by RemoteIP, RemoteUrl
| sort by ConnectionCount desc
```
Expected Output
All devices connecting to the suspicious IP or domain, including connection count, distinct devices, ports used, and the time window of communication.
When to Use
Scoping impact of network-based IOCs. Run this when a C2 domain or malicious IP is identified to determine how many endpoints have been communicating with the threat infrastructure.
Parent Process Chain Analysis
Traces the parent and grandparent of a suspicious process on a specific device, along with any child processes it spawned. The execution origin this reveals is critical for determining whether the process was launched through a legitimate mechanism or through an exploit chain.
```kql
let targetDevice = "WORKSTATION-01";
let targetProcess = "suspicious.exe";
DeviceProcessEvents
// DeviceName holds the FQDN, so also match the short host name
| where DeviceName =~ targetDevice or DeviceName startswith strcat(targetDevice, ".")
| where FileName =~ targetProcess or InitiatingProcessFileName =~ targetProcess
| project Timestamp, FileName, ProcessCommandLine,
          Parent = InitiatingProcessFileName,
          ParentCmd = InitiatingProcessCommandLine,
          GrandParent = InitiatingProcessParentFileName
| sort by Timestamp asc
```
Expected Output
Timestamps, file names, and command lines for the process and its parent, plus the file name of the grandparent process. The grandparent's command line is not available in DeviceProcessEvents.
When to Use
Understanding how the suspicious process was launched. For example, if powershell.exe was spawned by winword.exe, that strongly suggests a malicious document execution chain.
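When the query results are exported (for example to CSV and loaded as dictionaries), the Office-spawns-interpreter pattern described above can be flagged automatically. This is a sketch. The parent and child name lists are illustrative rather than exhaustive, and the function name `suspicious_lineage` is hypothetical. The row keys match the columns projected by the Parent Process Chain Analysis query.

```python
from typing import Iterable, Iterator, Tuple

# Illustrative lists only; extend them for your environment.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "pwsh.exe", "cmd.exe", "wscript.exe",
                   "cscript.exe", "mshta.exe", "rundll32.exe"}


def suspicious_lineage(rows: Iterable[dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (Timestamp, Parent, FileName) where an Office app spawned a script host."""
    for row in rows:
        parent = (row.get("Parent") or "").lower()   # file names are case-insensitive
        child = (row.get("FileName") or "").lower()
        if parent in OFFICE_PARENTS and child in SCRIPT_CHILDREN:
            yield row["Timestamp"], row["Parent"], row["FileName"]
```

A hit is a lead for the analyst, not a verdict. Some legitimate add-ins and macros do launch these binaries.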
Prior Alerts for Entity
Queries the last 30 days of alert history for a specific entity (device, user, or file). Identifies repeat offenders and can reveal ongoing campaigns or persistent threats targeting the same assets.
```kql
let entityName = "WORKSTATION-01";
AlertInfo
| where Timestamp > ago(30d)
| join kind=inner (
    AlertEvidence
    | where EntityType in ("Machine", "User", "File")
    | where DeviceName =~ entityName or DeviceName startswith strcat(entityName, ".")
        or AccountName =~ entityName or FileName =~ entityName
    ) on AlertId
| summarize AlertCount = dcount(AlertId),    // one alert can carry several evidence rows
            AlertTitles = make_set(Title),
            Severities = make_set(Severity),
            Categories = make_set(Category)
    by DeviceName, AccountName
| sort by AlertCount desc
```
Expected Output
Shows the alert history for the entity over the past 30 days, including total alert count, distinct alert titles, severity levels, and categories grouped by device and account.
When to Use
Identifying repeat offenders or ongoing campaigns. A device with multiple high-severity alerts over a short period likely indicates an active compromise rather than isolated events.
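The "multiple high-severity alerts over a short period" heuristic can be made explicit with a sliding window over the entity's alert history. In this sketch, the defaults of three High alerts within 24 hours are hypothetical, because the playbook does not define "multiple" or "short period". The function name `high_severity_burst` is also hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def high_severity_burst(alerts: Iterable[Tuple[datetime, str]],
                        window: timedelta = timedelta(hours=24),
                        threshold: int = 3) -> bool:
    """True if at least `threshold` High alerts fall within any `window`-long span.

    alerts: (timestamp, severity) pairs, e.g. from AlertInfo.Timestamp and Severity.
    """
    times = sorted(ts for ts, severity in alerts if severity == "High")
    start = 0
    for end, ts in enumerate(times):
        # Advance the left edge until the span [times[start], ts] fits in the window.
        while ts - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

The two-pointer loop examines each alert once after sorting, so it stays fast even for a month of history.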
5 Evidence Collection Procedures
When an alert is confirmed as a True Positive or requires further investigation, collect and preserve the relevant evidence (for example, the device timeline and the MDE investigation package) before taking any containment action.
All collected artifacts must follow the standard naming format: [IncidentID]_[DeviceName]_[ArtifactType]_[YYYYMMDD_HHMMSS]
Examples: INC-2024-0142_WKS01_Timeline_20240115_103000 or INC-2024-0142_WKS01_InvestigationPackage_20240115_110000
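A small helper can build and check artifact names so that they follow the convention above. This is a sketch under one stated assumption: underscores appear only as field separators, so incident IDs, device names and artifact types must not contain them. The playbook does not say whether the timestamp is UTC, so the sketch formats whatever time the datetime carries. The function names are hypothetical.

```python
import re
from datetime import datetime

# Three underscore-free fields followed by a YYYYMMDD_HHMMSS timestamp.
_ARTIFACT_RE = re.compile(
    r"(?P<incident>[^_]+)_(?P<device>[^_]+)_(?P<artifact>[^_]+)_(?P<ts>\d{8}_\d{6})"
)


def artifact_name(incident_id: str, device_name: str, artifact_type: str,
                  collected_at: datetime) -> str:
    """Build [IncidentID]_[DeviceName]_[ArtifactType]_[YYYYMMDD_HHMMSS]."""
    for field in (incident_id, device_name, artifact_type):
        if not field or "_" in field:
            raise ValueError(f"{field!r} must be non-empty and contain no underscores")
    return f"{incident_id}_{device_name}_{artifact_type}_{collected_at:%Y%m%d_%H%M%S}"


def is_valid_artifact_name(name: str) -> bool:
    """Check the layout and that the timestamp is a real calendar date and time."""
    match = _ARTIFACT_RE.fullmatch(name)
    if match is None:
        return False
    try:
        datetime.strptime(match.group("ts"), "%Y%m%d_%H%M%S")
    except ValueError:
        return False
    return True
```

For example, `artifact_name("INC-2024-0142", "WKS01", "Timeline", datetime(2024, 1, 15, 10, 30))` yields the first example name above.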
6 Enrichment Workflows
Enrichment adds external and internal context to alert entities (files, IPs, domains, users). Follow this procedure for every alert that progresses past Step 4 in the Triage Decision Tree.
Enrichment Procedure Steps
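- Check the file hash on VirusTotal (hash search or API query). Avoid uploading the file itself, because uploaded samples are shared with VirusTotal's partners and premium users. Record the detection ratio, first-seen date, and any sandbox behavioral reports.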
- Check domain/IP reputation on VirusTotal, AbuseIPDB, and Shodan. Record abuse confidence score, geolocation, ASN, and any associated malware campaigns.
- Perform WHOIS lookup for suspicious domains. Note registration date, registrar, nameservers, and registrant information. Recently registered domains (<30 days) are high risk.
- Query internal threat intelligence database/platform. Cross-reference IOCs with any active threat hunts, previous incidents, or known adversary infrastructure tracked by the team.
- Check Microsoft Threat Intelligence for the entity. Review the MDE file profile, URL reputation, and any associated global alert prevalence data within the Microsoft ecosystem.
- Verify device compliance status in MDE. Check whether the affected device has missing patches, disabled security features, or policy violations that may have contributed to the alert.
- Check the user risk score in Microsoft Entra ID Protection (formerly Azure AD Identity Protection). Review sign-in risk events, impossible travel detections, and any existing risk flags on the user account associated with the alert.
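Two of the data points above can be recorded consistently with a few helpers. The first is the VirusTotal detection ratio and first-seen date, taken from the JSON that the VirusTotal v3 `GET /api/v3/files/{hash}` endpoint returns (with the API key in the `x-apikey` header). The second is the under-30-days domain age rule. This is a sketch. The HTTP call is omitted, and the function names are hypothetical. The denominator counts every bucket in `last_analysis_stats`, which may differ slightly from the total shown in the VirusTotal web UI.

```python
from datetime import datetime, timezone


def vt_detection_ratio(file_report: dict) -> str:
    """'malicious/total' from a VirusTotal v3 file report."""
    stats = file_report["data"]["attributes"]["last_analysis_stats"]
    return f"{stats.get('malicious', 0)}/{sum(stats.values())}"


def vt_first_seen(file_report: dict) -> datetime:
    """first_submission_date is a Unix timestamp in seconds (UTC)."""
    epoch = file_report["data"]["attributes"]["first_submission_date"]
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def is_recently_registered(created: datetime, now: datetime, max_age_days: int = 30) -> bool:
    """Playbook rule: domains registered less than 30 days ago are high risk."""
    return (now - created).days < max_age_days
```

Both `created` (the WHOIS creation date) and `now` should be timezone-aware, or both naive, so that the subtraction is valid.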
Enrichment Decision Matrix
| Entity Type | Primary Source | Secondary Sources | Key Data Points |
|---|---|---|---|
| File Hash | VirusTotal | MDE File Profile, Any.Run | Detection ratio, first/last seen, sandbox behavior |
| IP Address | AbuseIPDB | VirusTotal, Shodan, GreyNoise | Abuse confidence, geolocation, open ports, ISP |
| Domain | VirusTotal | WHOIS, URLhaus, MDE TI | Registration date, registrar, DNS history, reputation |
| URL | VirusTotal | URLhaus, Google Safe Browsing | Redirection chain, hosted content, blocklist status |
| User Account | Microsoft Entra ID (Azure AD) | MDE User Profile, UEBA | Risk level, recent activity, sign-in anomalies |
7 Alert Closure Documentation
Every alert must be closed with proper documentation to maintain audit trails, enable metrics reporting, and support detection tuning. The following fields are required or recommended for each alert closure.
Required Closure Fields
| Field | Required | Description |
|---|---|---|
| Classification | Yes | TP (True Positive), FP (False Positive), or BTP (Benign True Positive) |
| Determination | Yes | Malware, SecurityTesting, LineOfBusinessApplication, UnwantedSoftware, or Other |
| Analyst Notes | Yes | Minimum 2 sentences explaining the classification rationale, including key evidence reviewed and conclusions drawn |
| Actions Taken | Yes | List of all response actions performed (e.g., isolated device, blocked hash, disabled account, no action required) |
| Tuning Action | If FP | Recommended detection tuning or suppression rule to prevent recurrence of the false positive |
| Linked Incident | If TP | Incident ID for the created incident. All TP alerts must be associated with an incident for tracking and reporting |
| Enrichment Summary | Recommended | Key findings from enrichment sources (VirusTotal scores, reputation data, TI matches) that supported the classification |
| Time Spent | Recommended | Minutes spent on triage for workload metrics and SLA reporting. Helps identify detection rules that consume disproportionate analyst time |
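The required and conditional rows of the table can be checked before an alert is closed. The sketch below assumes closure data is held as a dictionary with hypothetical snake_case keys (`classification`, `analyst_notes`, `tuning_action`, and so on). The two-sentence check is a rough heuristic that counts terminal punctuation, so abbreviations such as "e.g." can inflate it.

```python
import re

REQUIRED_FIELDS = ("classification", "determination", "analyst_notes", "actions_taken")
CLASSIFICATIONS = {"TP", "FP", "BTP"}
DETERMINATIONS = {"Malware", "SecurityTesting", "LineOfBusinessApplication",
                  "UnwantedSoftware", "Other"}


def closure_problems(record: dict) -> list:
    """Return a list of problems; an empty list means the closure meets the table."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]

    classification = record.get("classification")
    if classification and classification not in CLASSIFICATIONS:
        problems.append(f"unknown classification {classification!r}")
    determination = record.get("determination")
    if determination and determination not in DETERMINATIONS:
        problems.append(f"unknown determination {determination!r}")

    # Rough sentence count: terminal punctuation followed by whitespace or end of text.
    notes = record.get("analyst_notes") or ""
    if notes and len(re.findall(r"[.!?](?:\s|$)", notes)) < 2:
        problems.append("analyst_notes has fewer than 2 sentences")

    # Conditional fields from the "If FP" and "If TP" rows.
    if classification == "FP" and not record.get("tuning_action"):
        problems.append("FP requires tuning_action")
    if classification == "TP" and not record.get("linked_incident"):
        problems.append("TP requires linked_incident")
    return problems
```

Recommended fields (Enrichment Summary, Time Spent) are deliberately not enforced, matching the table.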
All P1/P2 alert closures must be reviewed by a senior analyst within 24 hours. FP classifications exceeding 5 per week for the same detection rule trigger an automatic tuning review.
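The weekly FP threshold can be evaluated from closure records. This is a sketch that assumes "per week" means an ISO calendar week. A rolling seven-day window would be an equally valid reading, and the playbook does not specify which. The function name and tuple layout are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple


def rules_needing_tuning(closures: Iterable[Tuple[str, str, date]],
                         limit: int = 5) -> List[str]:
    """Return detection rules with more than `limit` FP closures in any ISO week.

    closures: (rule_name, classification, closed_on) tuples.
    """
    fp_per_week = Counter()
    for rule, classification, closed_on in closures:
        if classification == "FP":
            iso_year, iso_week, _ = closed_on.isocalendar()
            fp_per_week[(rule, iso_year, iso_week)] += 1
    return sorted({rule for (rule, _, _), count in fp_per_week.items() if count > limit})
```

"Exceeding 5" is implemented as strictly more than five, so a rule with exactly five FPs in a week is not flagged.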