Incident Management
Covers incident classification, lifecycle management, communication templates, escalation matrices, and the RACI framework, aligned with ITIL and NIST guidance.
1 Incident Classification
Incidents are classified using a priority matrix based on impact and urgency. This classification drives response SLAs, update cadences, and escalation requirements. All SOC analysts must apply this matrix consistently at the point of incident declaration.
| Priority | Impact | Urgency | Response SLA | Update Frequency | Examples |
|---|---|---|---|---|---|
| P1 - Critical | Organization-wide | Immediate | 15 min | Every 30 min | Active ransomware, confirmed data breach, nation-state compromise |
| P2 - High | Multiple departments | High | 1 hour | Every 2 hours | Lateral movement detected, C2 confirmed, privilege escalation |
| P3 - Medium | Single department | Medium | 4 hours | Every 8 hours | Malware on single endpoint, phishing with credential harvest, policy violation |
| P4 - Low | Single user | Low | 24 hours | Daily | Adware, PUP detection, low-confidence alerts, informational findings |
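As an illustration, the matrix can be encoded as a lookup table so that the SLA and update cadence follow mechanically from the classification. This is a minimal sketch, not an official tool: the names `PriorityRule`, `PRIORITY_MATRIX`, and `classify` are hypothetical, and because the matrix only lists four impact/urgency pairings, the sketch assumes any other combination needs a manual decision and raises an error rather than guessing.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class PriorityRule:
    priority: str
    response_sla: timedelta
    update_every: timedelta


# Keyed by (impact, urgency), lower-cased, one entry per row of the matrix.
PRIORITY_MATRIX = {
    ("organization-wide", "immediate"): PriorityRule(
        "P1", timedelta(minutes=15), timedelta(minutes=30)),
    ("multiple departments", "high"): PriorityRule(
        "P2", timedelta(hours=1), timedelta(hours=2)),
    ("single department", "medium"): PriorityRule(
        "P3", timedelta(hours=4), timedelta(hours=8)),
    ("single user", "low"): PriorityRule(
        "P4", timedelta(hours=24), timedelta(hours=24)),
}


def classify(impact: str, urgency: str) -> PriorityRule:
    """Return the matrix row for an impact/urgency pair.

    Assumption: pairings the matrix does not list are not auto-classified.
    """
    key = (impact.strip().lower(), urgency.strip().lower())
    if key not in PRIORITY_MATRIX:
        raise ValueError(
            f"no matrix row for impact={impact!r}, urgency={urgency!r}; "
            "classify manually")
    return PRIORITY_MATRIX[key]
```

For example, `classify("Multiple departments", "High")` returns the P2 rule with a 1-hour response SLA and a 2-hour update interval.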
Category Taxonomy
Each incident must be assigned a category and sub-category at the point of declaration. This taxonomy supports reporting, trend analysis, and playbook routing.
| Category | Sub-Categories | Typical Priority |
|---|---|---|
| Malware | Ransomware, Trojan, Worm, Dropper, RAT, Cryptominer | P1 – P3 |
| Unauthorized Access | Credential Theft, Privilege Escalation, Account Compromise | P1 – P2 |
| Data Breach | Exfiltration, Exposure, Loss, Unauthorized Disclosure | P1 – P2 |
| Denial of Service | DDoS, Resource Exhaustion, Service Disruption | P2 – P3 |
| Policy Violation | Acceptable Use, Data Handling, Shadow IT | P3 – P4 |
| Insider Threat | Malicious Insider, Negligent Insider, Compromised Insider | P1 – P3 |
| Supply Chain | Compromised Vendor, Malicious Update, Third-Party Breach | P1 – P2 |
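One way to use the "Typical Priority" column is as a sanity check at declaration time: a priority outside the typical range is not forbidden, but it may deserve a second look. The sketch below assumes that interpretation; `TYPICAL_PRIORITY` and `is_typical` are hypothetical names.

```python
# Typical priority range per category, copied from the taxonomy table
# as (highest priority, lowest priority).
TYPICAL_PRIORITY = {
    "Malware": ("P1", "P3"),
    "Unauthorized Access": ("P1", "P2"),
    "Data Breach": ("P1", "P2"),
    "Denial of Service": ("P2", "P3"),
    "Policy Violation": ("P3", "P4"),
    "Insider Threat": ("P1", "P3"),
    "Supply Chain": ("P1", "P2"),
}


def is_typical(category: str, priority: str) -> bool:
    """True if the priority falls inside the category's typical range."""
    highest, lowest = TYPICAL_PRIORITY[category]
    # "P1" -> 1, "P4" -> 4; a smaller number means a higher priority.
    return int(highest[1]) <= int(priority[1]) <= int(lowest[1])
```

A `False` result is only a prompt for review, for instance a Policy Violation declared as P1.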
2 Incident Lifecycle
The incident lifecycle follows a structured nine-phase model aligned with NIST SP 800-61 and ITIL incident management best practices. Each phase has defined entry criteria, activities, and exit criteria.
| From → To | Trigger Criteria |
|---|---|
| Detection → Triage | Alert assigned to analyst |
| Triage → Declaration | Confirmed true positive, meets incident threshold |
| Declaration → Investigation | Incident commander assigned, war room established |
| Investigation → Containment | Threat scope identified, containment plan approved |
| Containment → Eradication | Spread halted, all affected assets identified |
| Eradication → Recovery | All malicious artifacts removed, root cause addressed |
| Recovery → Closure | All systems restored, no recurrence during 24–48 hours of monitoring |
| Closure → PIR | Incident closed, PIR scheduled within 5 business days |
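The transition table can be read as a linear state machine in which each phase has exactly one successor and one trigger. The following is a minimal sketch under that assumption; `Phase`, `TRANSITIONS`, and `advance` are hypothetical names, and whether the trigger is met is left to the caller to decide.

```python
from enum import Enum


class Phase(Enum):
    DETECTION = "Detection"
    TRIAGE = "Triage"
    DECLARATION = "Declaration"
    INVESTIGATION = "Investigation"
    CONTAINMENT = "Containment"
    ERADICATION = "Eradication"
    RECOVERY = "Recovery"
    CLOSURE = "Closure"
    PIR = "PIR"


# Current phase -> (next phase, trigger criteria from the table).
TRANSITIONS = {
    Phase.DETECTION: (Phase.TRIAGE, "Alert assigned to analyst"),
    Phase.TRIAGE: (Phase.DECLARATION,
                   "Confirmed true positive, meets incident threshold"),
    Phase.DECLARATION: (Phase.INVESTIGATION,
                        "Incident commander assigned, war room established"),
    Phase.INVESTIGATION: (Phase.CONTAINMENT,
                          "Threat scope identified, containment plan approved"),
    Phase.CONTAINMENT: (Phase.ERADICATION,
                        "Spread halted, all affected assets identified"),
    Phase.ERADICATION: (Phase.RECOVERY,
                        "All malicious artifacts removed, root cause addressed"),
    Phase.RECOVERY: (Phase.CLOSURE,
                     "All systems restored, no recurrence in monitoring"),
    Phase.CLOSURE: (Phase.PIR, "Incident closed, PIR scheduled"),
}


def advance(current: Phase, trigger_met: bool) -> Phase:
    """Move to the next phase only if the transition trigger is satisfied."""
    if current not in TRANSITIONS:
        raise ValueError(f"{current.value} is the final phase")
    next_phase, trigger = TRANSITIONS[current]
    if not trigger_met:
        raise ValueError(f"cannot leave {current.value}: {trigger}")
    return next_phase
```

Refusing the transition when the trigger is not met keeps phases from being skipped, for example jumping from Triage to Containment without a declaration.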
3 Communication Templates
Standardized communication templates ensure consistent, timely, and professional notifications throughout the incident lifecycle. Use these templates as starting points and customize based on the specific incident context.
Initial Incident Notification
Management Brief (CISO/CTO)
Stakeholder Update
External/Regulatory Notification
Incident Closure Notice
4 Escalation Matrix
The escalation matrix defines when, to whom, and how incidents must be escalated. Timely and accurate escalation is critical for minimizing business impact and ensuring appropriate leadership visibility.
| Condition | Escalate To | Method | Timeframe |
|---|---|---|---|
| P1 incident declared | SOC Manager + Incident Commander | Phone + Teams | Immediate |
| P1 > 30 min unresolved | CISO | Phone | Within 30 min |
| P1 > 2 hours unresolved | CTO + Legal | Phone + Email | Within 2 hours |
| Data breach suspected | DPO + Legal | Phone + Email | Within 1 hour |
| Nation-state indicators | CISO + External IR | Phone | Within 15 min |
| P2 incident declared | SOC Manager | Teams + Ticket | Within 15 min |
| P2 > 4 hours unresolved | SOC Manager for re-assessment | Phone | Within 4 hours |
| Regulatory notification required | Legal + Compliance | Email + Phone | Within 1 hour |
| P3 incident declared | Shift Lead | Ticket | Within 1 hour |
| P3 > 24 hours unresolved | SOC Manager | Ticket | Within 24 hours |
| P4 incident declared | Assignment queue | Ticket | Next business day |
When in doubt, escalate. It is always better to escalate unnecessarily than to miss a critical escalation window. Document all escalation attempts and outcomes.
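The time-based rows of the matrix lend themselves to a simple rule evaluation: given a priority and how long the incident has been unresolved, list every escalation that should already have happened. This sketch covers only the priority and elapsed-time rows; condition-based rows (suspected data breach, nation-state indicators, regulatory notification) need analyst judgment and are left out. `Escalation`, `TIME_RULES`, and `due_escalations` are hypothetical names.

```python
from datetime import timedelta
from typing import NamedTuple


class Escalation(NamedTuple):
    escalate_to: str
    method: str


ON_DECLARATION = timedelta(0)

# (priority, unresolved-for threshold, escalation), from the matrix.
TIME_RULES = [
    ("P1", ON_DECLARATION,
     Escalation("SOC Manager + Incident Commander", "Phone + Teams")),
    ("P1", timedelta(minutes=30), Escalation("CISO", "Phone")),
    ("P1", timedelta(hours=2), Escalation("CTO + Legal", "Phone + Email")),
    ("P2", ON_DECLARATION, Escalation("SOC Manager", "Teams + Ticket")),
    ("P2", timedelta(hours=4),
     Escalation("SOC Manager for re-assessment", "Phone")),
    ("P3", ON_DECLARATION, Escalation("Shift Lead", "Ticket")),
    ("P3", timedelta(hours=24), Escalation("SOC Manager", "Ticket")),
    ("P4", ON_DECLARATION, Escalation("Assignment queue", "Ticket")),
]


def due_escalations(priority: str, unresolved_for: timedelta) -> list:
    """Return every escalation triggered so far, in matrix order."""
    return [
        escalation
        for rule_priority, threshold, escalation in TIME_RULES
        if rule_priority == priority
        # Declaration rules always apply; timed rules use "greater than",
        # matching the "> 30 min" wording in the matrix.
        and (threshold == ON_DECLARATION or unresolved_for > threshold)
    ]
```

A P1 that has been open for 45 minutes yields the declaration escalation plus the CISO escalation, but not yet the CTO + Legal one.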
5 RACI Matrix
The RACI matrix defines roles and responsibilities across the incident management lifecycle. Every activity must have exactly one Accountable party, with Responsible, Consulted, and Informed roles clearly assigned.
| Activity | SOC Analyst | SOC Manager | Incident Commander | CISO | IT Operations | Legal | HR | Communications |
|---|---|---|---|---|---|---|---|---|
| Alert Triage | R | I | | | | | | |
| Incident Declaration | C | R/A | I | I | | | | |
| Containment Actions | R | A | C | I | C | | | |
| Evidence Preservation | R | A | C | I | | | | |
| Investigation Lead | R | C | A | I | C | | | |
| Executive Communication | C | R | A | I | C | | | |
| External/Legal Notification | I | C | A | R | C | | | |
| Recovery Actions | C | A | C | I | R | | | |
| Post-Incident Review | R | A | C | I | C | | | |
| Lessons Learned Implementation | C | R | A | I | C | | | |
R = Responsible (does the work) | A = Accountable (owns the outcome) | C = Consulted (provides input) | I = Informed (kept in the loop)
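The "exactly one Accountable party" rule is easy to check automatically whenever the matrix is edited. The sketch below assumes the matrix is held as a mapping of activity to role codes, with combined codes written as `R/A`; `accountable_count` and `raci_violations` are hypothetical names, and the last sample row is invented purely to show a failing case.

```python
def accountable_count(assignments: dict) -> int:
    """Count roles whose code includes 'A' (handles combined codes like 'R/A')."""
    return sum("A" in code.split("/") for code in assignments.values())


def raci_violations(matrix: dict) -> list:
    """Return activities that do not have exactly one Accountable role."""
    return [
        activity
        for activity, assignments in matrix.items()
        if accountable_count(assignments) != 1
    ]


sample = {
    "Incident Declaration": {
        "SOC Analyst": "C", "SOC Manager": "R/A",
        "Incident Commander": "I", "CISO": "I",
    },
    "Containment Actions": {
        "SOC Analyst": "R", "SOC Manager": "A",
        "Incident Commander": "C", "CISO": "I", "IT Operations": "C",
    },
    # Hypothetical malformed row: two Accountable parties.
    "Example Activity (hypothetical)": {
        "SOC Manager": "A", "CISO": "A",
    },
}

violations = raci_violations(sample)
```

Here `violations` contains only the hypothetical row, since the other two activities each have a single Accountable role.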
6 Post-Incident Review
Post-Incident Reviews (PIRs) are mandatory for all P1 and P2 incidents and recommended for P3 incidents with noteworthy findings. The PIR process is blameless and focused on systemic improvement.
PIR Meeting Agenda
- Review incident timeline (present facts, not blame)
- Identify what went well (effective detections, fast response, good communication)
- Identify what needs improvement (gaps, delays, tool issues)
- Perform root cause analysis using 5-Whys method
- Define action items with owners and deadlines
- Discuss detection improvements and new KQL rules needed
- Review and update playbooks based on findings
- Schedule follow-up meeting to track action items
5-Whys Template
Metrics
Track the following metrics for every incident to identify trends, measure team performance, and drive continuous improvement.
| Metric | Description | Target | How to Calculate |
|---|---|---|---|
| MTTD (Mean Time to Detect) | Time from initial compromise to first detection | < 1 hour for P1 | Alert timestamp − estimated compromise time |
| MTTC (Mean Time to Contain) | Time from detection to successful containment | < 4 hours for P1 | Containment timestamp − detection timestamp |
| MTTR (Mean Time to Resolve) | Time from detection to full resolution | < 24 hours for P1 | Closure timestamp − detection timestamp |
| Blast Radius | Number of affected devices and users | Minimize | Count of unique affected entities |
| Data Impact | Volume/type of data potentially compromised | None | Assessment of accessed/exfiltrated data |
| Escalation Accuracy | Were escalations timely and appropriate | 100% | Audit of escalation decisions vs policy |
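The three time-based metrics share one calculation: average the difference between two timestamps across incidents. A minimal sketch, assuming timestamps are available as `datetime` pairs; `mean_interval` is a hypothetical helper and the sample timestamps are invented.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_interval(intervals: list) -> timedelta:
    """Average a list of (start, end) datetime pairs into one timedelta."""
    if not intervals:
        raise ValueError("need at least one incident")
    seconds = [(end - start).total_seconds() for start, end in intervals]
    return timedelta(seconds=mean(seconds))


# MTTC: (detection timestamp, containment timestamp) per P1 incident.
# Invented sample data for illustration only.
p1_containment = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),    # 2 hours
    (datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 18, 0)),   # 4 hours
]

mttc = mean_interval(p1_containment)
meets_p1_target = mttc < timedelta(hours=4)
```

MTTD and MTTR follow the same pattern by passing (estimated compromise time, alert timestamp) or (detection timestamp, closure timestamp) pairs instead.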
7 SLA Tracking
Service Level Agreements define the expected timeframes for each phase of incident handling. SLA compliance is tracked per incident and reported monthly to SOC leadership.
| Priority | Initial Response | Containment | Resolution | Update Cadence |
|---|---|---|---|---|
| P1 - Critical | 15 minutes | 4 hours | 24 hours | Every 30 minutes |
| P2 - High | 1 hour | 8 hours | 48 hours | Every 2 hours |
| P3 - Medium | 4 hours | 24 hours | 5 business days | Every 8 hours |
| P4 - Low | 24 hours | 72 hours | 10 business days | Daily |
The SLA clock starts when the incident is declared. The clock pauses only while (1) waiting on a documented third-party vendor response, (2) under an approved management hold, or (3) awaiting user availability for evidence collection. The clock does NOT pause for shift changes, weekends or holidays (P1/P2), or internal resource constraints.
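The clock rule amounts to: elapsed SLA time equals wall-clock time since declaration minus any approved pause windows. A minimal sketch, assuming pause windows are recorded as non-overlapping (start, end) pairs; `sla_elapsed` is a hypothetical name and the timestamps are invented.

```python
from datetime import datetime, timedelta


def sla_elapsed(declared: datetime, now: datetime, pauses: list) -> timedelta:
    """Time counted against the SLA, excluding approved pause windows.

    Assumes the pause windows do not overlap one another.
    """
    paused = timedelta(0)
    for start, end in pauses:
        # Clip each window to the period between declaration and now.
        clipped_start = max(start, declared)
        clipped_end = min(end, now)
        if clipped_end > clipped_start:
            paused += clipped_end - clipped_start
    return (now - declared) - paused


declared = datetime(2024, 5, 6, 10, 0)
now = datetime(2024, 5, 6, 16, 0)
# One documented vendor wait from 11:00 to 12:30.
vendor_wait = [(datetime(2024, 5, 6, 11, 0), datetime(2024, 5, 6, 12, 30))]

elapsed = sla_elapsed(declared, now, vendor_wait)
p1_containment_breached = elapsed > timedelta(hours=4)
```

In this example six hours of wall-clock time count as 4 hours 30 minutes against the SLA, which still exceeds the 4-hour P1 containment target.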
8 Ticketing Integration
All incidents must be tracked in the ticketing system with proper field mapping from Microsoft Defender for Endpoint. Consistent ticket creation enables accurate reporting, SLA tracking, and audit compliance.
Required Ticket Fields
| Field | MDE Mapping | Notes |
|---|---|---|
| Incident ID | AlertId / IncidentId | Auto-generated or mapped |
| Title | Alert Title | May be customized during triage |
| Priority | Severity (High/Medium/Low/Info) | Map to P1-P4 |
| Category | Category field | Use org taxonomy |
| Affected Device(s) | DeviceName | From AlertEvidence |
| Affected User(s) | AccountName | From AlertEvidence |
| Description | Alert description + analyst notes | Combine MDE context with analysis |
| Status | Investigation status | Sync with MDE investigation status |
| Assigned To | Analyst name | Based on shift/skill matrix |
| MDE Incident URL | Portal deep link | For quick reference |
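A field mapping like this is often implemented as a small transformation from alert data to a ticket payload. The sketch below is illustrative only: it assumes the alert and its evidence rows arrive as plain dictionaries keyed by the MDE column names in the table, it covers only the fields that have a named MDE column, and the one-to-one severity-to-priority mapping is an assumed default that triage would be expected to adjust. `SEVERITY_TO_PRIORITY` and `build_ticket` are hypothetical names, not part of any MDE or ticketing API.

```python
# Assumed default mapping; the table only says "Map to P1-P4".
SEVERITY_TO_PRIORITY = {
    "high": "P1",
    "medium": "P2",
    "low": "P3",
    "informational": "P4",
    "info": "P4",
}


def build_ticket(alert: dict, evidence: list) -> dict:
    """Build a ticket payload from one alert and its evidence rows."""
    # De-duplicate and sort so repeated evidence rows do not bloat the ticket.
    devices = sorted({row["DeviceName"] for row in evidence
                      if row.get("DeviceName")})
    users = sorted({row["AccountName"] for row in evidence
                    if row.get("AccountName")})
    return {
        "Incident ID": alert["AlertId"],
        "Title": alert["Title"],
        "Priority": SEVERITY_TO_PRIORITY[alert["Severity"].lower()],
        "Category": alert["Category"],
        "Affected Device(s)": devices,
        "Affected User(s)": users,
    }
```

Description, Status, Assigned To, and the portal link are left to the integration itself, since their source fields depend on how the ticketing system and shift matrix are set up.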