Active Directory Incident Response: Containment & Recovery
When Active Directory (AD) is compromised, the attacker isn’t just “in one server” — they often have the keys to your identity kingdom. The difference between a chaotic scramble and a controlled recovery is a clear, tested response playbook.
- What an AD compromise playbook is (and isn’t)
- Core principles for AD incident response
- Roles & decision rights (RACI-lite)
- Preparation phase (before you’re breached)
- Triggers & severity levels
- First 60 minutes: triage checklist
- Containment actions (without breaking the business)
- Eradication: remove persistence and regain trust
- Recovery: rebuild, restore, validate
- Post-incident hardening and lessons learned
- Copy/paste templates: runbook structure & comms
- FAQ
What an AD compromise playbook is (and isn’t)
A response playbook is a decision and action map for a specific scenario: “We believe AD is compromised.” It defines who decides, what to do, in what order, how to validate, and how to communicate.
It is not a generic incident response policy document. It’s the operational “muscle memory” your team follows under pressure.
Core principles for AD incident response
1) Preserve evidence while you contain
Don’t wipe the scene. Capture logs, volatile context, and a timeline. You can contain aggressively, but do it with a plan that keeps proof intact.
2) Assume identity infrastructure is “tainted”
In a real AD compromise, trust is the problem. Your playbook must define how you regain trust: clean admin workstations, known-good accounts, controlled password resets, and validated restores.
3) Minimize blast radius fast
Stop lateral movement, credential reuse, and replication abuse. Tighten control over privileged groups and permissions. (If you need a refresher on how AD permissions work, see Active Directory permissions explained.)
4) Prefer “known-good admin pathways”
Use dedicated admin workstations, separate accounts, and a break-glass process that’s pre-approved and practiced.
Roles & decision rights (RACI-lite)
Your playbook should name roles, not individuals (people go on leave). At minimum:
| Role | Owns | Key Decisions |
|---|---|---|
| Incident Commander | Orchestration, priorities, approvals | Severity level, containment scope, business downtime tradeoffs |
| AD Lead | Directory operations | Account disable/reset strategy, DC isolation, GPO emergency changes |
| Security Lead / SOC | Detection, investigation, threat intel | IOCs, scope confirmation, monitoring rules |
| Forensics (internal or vendor) | Evidence and timeline | What to image, what to collect, chain-of-custody |
| Comms / Legal / HR | Messaging and regulatory steps | Notifications, employee guidance, disclosure requirements |
| App/Infra Owners | Dependent systems | Service account rotation, outage coordination, validation |
Preparation phase (before you’re breached)
The best response playbook is 70% preparation. Here’s what to pre-stage so you can move fast without improvising.
Preparation checklist
- Asset & identity inventory: domain controllers, AD sites, trusts, Tier-0 assets, PKI/AD CS, ADFS, sync tools, admin workstations.
- Logging baseline: security logs retention, centralized forwarding, and time sync across DCs and critical servers.
- Detection tooling: confirm coverage for identity threat detections (for example, Microsoft Defender for Identity). Start with Microsoft Defender for Identity: overview, and then validate event collection for Defender for Identity.
- Break-glass plan: offline-stored credentials, MFA requirements (where applicable), and step-by-step access procedure.
- Privileged access model: separate admin accounts, tiering, and “no email/web browsing” from admin workstations.
- Backups: validated System State backups for DCs, plus documented restore steps and test results.
- Credential rotation map: what to rotate first (KRBTGT, DA accounts, service accounts, app secrets, certificates), and how; a rotation-order sketch follows this checklist.
- Communication templates: internal “do/do not” guidance (e.g., don’t reboot suspected systems; report suspicious prompts).
- Tabletop exercises: at least quarterly for Tier-0 compromise.
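The credential rotation map above is easiest to act on under pressure when the order is written down as data rather than tribal knowledge. Below is a minimal sketch of one way to capture it; the phase numbers, owners, and item lists are illustrative placeholders, not a prescribed standard.

```python
# rotation_map.py - illustrative ordering for a credential rotation map.
# Phase order, owners, and item lists are placeholders; adapt them to your environment.
from dataclasses import dataclass

@dataclass
class RotationPhase:
    order: int      # lower numbers rotate first
    name: str
    items: list     # what gets rotated in this phase
    owner: str      # a role from the RACI table, not a person
    notes: str = ""

ROTATION_MAP = [
    RotationPhase(1, "Tier-0 privileged accounts",
                  ["Domain Admins", "Enterprise Admins", "break-glass accounts"],
                  owner="AD Lead"),
    RotationPhase(2, "KRBTGT",
                  ["krbtgt (two-step reset, replication-aware)"],
                  owner="AD Lead",
                  notes="Follow the documented two-step procedure; do not rush the second reset."),
    RotationPhase(3, "Tier-0 service accounts and app secrets",
                  ["sync/federation service accounts", "application secrets"],
                  owner="App/Infra Owners"),
    RotationPhase(4, "Server local admins and certificates",
                  ["local administrator passwords", "certificates issued in the incident window"],
                  owner="App/Infra Owners"),
]

def print_rotation_plan(phases):
    """Print the rotation order so it can be pasted into the incident log."""
    for phase in sorted(phases, key=lambda p: p.order):
        print(f"Phase {phase.order}: {phase.name} (owner: {phase.owner})")
        for item in phase.items:
            print(f"  - {item}")
        if phase.notes:
            print(f"  note: {phase.notes}")

if __name__ == "__main__":
    print_rotation_plan(ROTATION_MAP)
```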
If your team needs foundational clarity on how authentication flows work during investigations, revisit NTLM and Kerberos Authentication Protocols Explained. It makes scoping issues like ticket abuse and credential replay much easier.
Triggers & severity levels
Define what “AD compromise” means in your environment. Otherwise you’ll debate severity while the attacker moves.
| Severity | Definition (examples) | Default Response |
|---|---|---|
| SEV-1 | Evidence of Domain Admin / Enterprise Admin theft, suspicious DC activity, replication abuse indicators, widespread lateral movement | Activate full playbook, isolate Tier-0, exec/legal comms, rotate critical secrets |
| SEV-2 | Compromised admin workstation, suspicious GPO change, credential dumping suspected, multiple failed logons with privilege context | Targeted containment, fast investigation, elevate if scope expands |
| SEV-3 | Single account compromise or localized host compromise without privileged indicators | Standard IR with AD-focused checks |
Include operational triggers too (e.g., a high-confidence Microsoft Defender for Identity alert, unexpected account lockouts at scale). For lockout investigations, see Account Lockout Event ID: how to find account lockouts.
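As one concrete way to check the "lockouts at scale" trigger, the sketch below counts recent Event ID 4740 (account lockout) events using the built-in wevtutil tool. It assumes it runs with rights to read the Security log on a domain controller (ideally the PDC emulator, which processes lockouts); the threshold and event count are illustrative.

```python
# lockout_trigger.py - rough check for "account lockouts at scale" (Event ID 4740).
# Assumes local read access to the Security log on a DC; the threshold is illustrative.
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

LOCKOUT_THRESHOLD = 10  # alert if more than this many lockouts in the queried window

def recent_lockouts(max_events=200):
    """Return a Counter of locked-out account names from recent 4740 events."""
    xml_out = subprocess.run(
        ["wevtutil", "qe", "Security",
         "/q:*[System[(EventID=4740)]]",
         f"/c:{max_events}", "/rd:true", "/f:xml"],
        capture_output=True, text=True, check=True,
    ).stdout
    # wevtutil emits a flat sequence of <Event> elements, so wrap them in a root node.
    root = ET.fromstring(f"<Events>{xml_out}</Events>")
    locked = Counter()
    for event in root:
        for data in event.iter("{*}Data"):
            if data.get("Name") == "TargetUserName" and data.text:
                locked[data.text] += 1
    return locked

if __name__ == "__main__":
    lockouts = recent_lockouts()
    total = sum(lockouts.values())
    print(f"{total} lockouts across {len(lockouts)} accounts")
    for account, count in lockouts.most_common(10):
        print(f"  {account}: {count}")
    if total > LOCKOUT_THRESHOLD:
        print("TRIGGER: lockouts at scale - escalate per the severity table")
```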
First 60 minutes: triage checklist
0–15 minutes: stabilize
- Declare incident severity and assign an Incident Commander.
- Start an incident log (time-stamped decisions, actions, and owners); a minimal logging sketch follows this list.
- Confirm safe communications channel (out-of-band if you suspect email compromise).
- Freeze non-essential changes (pause planned GPO/AD changes, deployments, and admin projects).
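For the incident log, an append-only, UTC-timestamped file is enough to start with; the point is to capture who decided what, and when. A minimal sketch (the file name and field names are illustrative, not a required schema):

```python
# incident_log.py - minimal append-only incident log (one JSON object per line).
# File name and field names are illustrative; adapt to your tooling.
import json
from datetime import datetime, timezone

LOG_PATH = "incident-log.jsonl"

def log_entry(action, owner, decision=None, notes=None):
    """Append a time-stamped entry: what was done, by which role, and why."""
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "owner": owner,          # role, not a person, per the RACI table
        "decision": decision,    # e.g. "IC approved Tier-0 isolation"
        "notes": notes,
    }
    with open(LOG_PATH, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")

# Example usage during triage:
# log_entry("Declared SEV-1", owner="Incident Commander",
#           decision="Activate full AD compromise playbook")
```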
15–30 minutes: scope and evidence
- Identify suspected entry points (phished admin, compromised endpoint, exposed service, vendor access).
- Pull high-value logs centrally (DC Security logs, workstation logs, VPN/proxy, EDR telemetry, identity detections).
- Preserve key hosts (likely compromised admin workstation(s), jump boxes, DCs if indicated).
- Check for privileged group changes: Domain Admins, Enterprise Admins, Schema Admins, Built-in Administrators.
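To make the privileged-group check above repeatable, the sketch below pulls the current members of the key groups over LDAP and diffs them against a baseline saved during preparation. It uses the third-party ldap3 package; the server name, credentials, and base DN are placeholders, and in production you would bind over LDAPS or a signed/sealed connection.

```python
# priv_group_check.py - snapshot privileged group members and diff against a baseline.
# Requires the third-party "ldap3" package; server, credentials, and DNs are placeholders.
import json
from ldap3 import Server, Connection, NTLM, SUBTREE

BASE_DN = "DC=example,DC=com"
GROUPS = ["Domain Admins", "Enterprise Admins", "Schema Admins", "Administrators"]
BASELINE_FILE = "priv-groups-baseline.json"   # saved earlier, during preparation

def current_members(conn):
    """Return {group_name: sorted list of member DNs} for the watched groups."""
    members = {}
    for group in GROUPS:
        conn.search(BASE_DN, f"(&(objectClass=group)(cn={group}))",
                    search_scope=SUBTREE, attributes=["member"])
        if conn.entries:
            attrs = conn.entries[0].entry_attributes_as_dict
            members[group] = sorted(attrs.get("member", []))
        else:
            members[group] = []
    return members

if __name__ == "__main__":
    server = Server("dc01.example.com")   # prefer LDAPS in production
    conn = Connection(server, user="EXAMPLE\\ir-reader", password="...",
                      authentication=NTLM, auto_bind=True)
    now = current_members(conn)
    with open(BASELINE_FILE, encoding="utf-8") as handle:
        baseline = json.load(handle)
    for group in GROUPS:
        for dn in sorted(set(now[group]) - set(baseline.get(group, []))):
            print(f"UNEXPECTED member of {group}: {dn}")
```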
30–60 minutes: initial containment (minimal disruption)
- Disable or isolate the suspected compromised admin workstation(s) from the network (EDR containment preferred).
- Disable obviously malicious accounts and revoke sessions where possible.
- Block known malicious IPs/domains at perimeter and proxy.
- Start a “watchlist” for suspicious logons on DCs and Tier-0 systems.
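The watchlist itself can start as a simple allowlist comparison: any interactive or remote-interactive logon to a Tier-0 host by an account outside the approved admin set gets flagged. Below is a minimal sketch over already-parsed logon records; the field names, host list, and allowlist are illustrative, and in practice the records would come from forwarded 4624 events or your SIEM rather than the hard-coded samples.

```python
# dc_logon_watchlist.py - flag logons to Tier-0 hosts by accounts not on the approved list.
# Field names, hosts, and the allowlist are illustrative; feed it parsed 4624 events.
from dataclasses import dataclass

APPROVED_ADMINS = {"EXAMPLE\\adm-alice", "EXAMPLE\\adm-bob"}   # hardened-workstation admins only
TIER0_HOSTS = {"DC01", "DC02", "ADFS01", "PKI01"}
WATCHED_LOGON_TYPES = {2, 10}   # 2 = interactive, 10 = remote interactive (RDP)

@dataclass
class LogonRecord:
    account: str        # e.g. TargetUserName from Event ID 4624
    host: str           # computer where the logon occurred
    logon_type: int     # LogonType field from the event
    source: str         # workstation/IP the logon came from

def flag_suspicious(records):
    """Yield logons to Tier-0 hosts by accounts outside the approved admin set."""
    for rec in records:
        if (rec.host in TIER0_HOSTS
                and rec.logon_type in WATCHED_LOGON_TYPES
                and rec.account not in APPROVED_ADMINS):
            yield rec

if __name__ == "__main__":
    sample = [
        LogonRecord("EXAMPLE\\adm-alice", "DC01", 10, "ADMIN-WKS-01"),
        LogonRecord("EXAMPLE\\svc-backup", "DC02", 2, "FILESRV-07"),   # should be flagged
    ]
    for rec in flag_suspicious(sample):
        print(f"WATCHLIST: {rec.account} logged on to {rec.host} "
              f"(type {rec.logon_type}) from {rec.source}")
```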
Containment actions (without breaking the business)
Containment is where playbooks win. You need pre-approved options that scale from “surgical” to “emergency shutdown.”
Surgical containment (preferred first)
- Disable or reset the specific suspected accounts (prioritize privileged identities); see the disable sketch after this list.
- Remove suspicious principals from privileged groups (document everything).
- Apply temporary conditional access / sign-in restrictions in hybrid environments (where applicable).
- Restrict admin logon paths (only from hardened admin workstations / jump hosts).
- Increase auditing and forward logs centrally (don’t rely on local retention).
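For the account-disable step at the top of this list, the sketch below flips the ACCOUNTDISABLE bit (0x2) in userAccountControl over LDAP, which is the same change "Disable Account" makes in the admin console. It uses the ldap3 package again; the DN, server, and credentials are placeholders, and every disable should also be written to the incident log.

```python
# disable_account.py - disable a suspected-compromised account by setting the
# ACCOUNTDISABLE bit (0x2) in userAccountControl. Server, credentials, and DNs
# are placeholders; record each action in the incident log.
from ldap3 import Server, Connection, NTLM, MODIFY_REPLACE, BASE

ACCOUNTDISABLE = 0x2

def disable_account(conn, user_dn):
    """Read the current userAccountControl, set the disable bit, and write it back."""
    conn.search(user_dn, "(objectClass=user)", search_scope=BASE,
                attributes=["userAccountControl"])
    if not conn.entries:
        raise ValueError(f"Account not found: {user_dn}")
    uac = int(conn.entries[0].userAccountControl.value)
    if uac & ACCOUNTDISABLE:
        print(f"Already disabled: {user_dn}")
        return
    conn.modify(user_dn, {"userAccountControl": [(MODIFY_REPLACE, [uac | ACCOUNTDISABLE])]})
    print(f"Disabled {user_dn}: {conn.result['description']}")

if __name__ == "__main__":
    server = Server("dc01.example.com")                 # prefer LDAPS in production
    conn = Connection(server, user="EXAMPLE\\ir-admin", password="...",
                      authentication=NTLM, auto_bind=True)
    disable_account(conn, "CN=Suspect User,OU=Staff,DC=example,DC=com")
```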
Emergency containment (if SEV-1 expands)
- Isolate Tier-0 subnets (DCs, ADFS, PKI, identity tooling) with firewall rules.
- Disable non-essential trusts or restrict access if a partner domain is involved.
- Implement “deny by default” for privileged logons except from approved admin jump boxes.
- Temporarily disable high-risk legacy protocols/services if they are being abused.
Containment guardrails (avoid self-inflicted outages)
- Track business-critical service accounts before mass resets; a blind identity outage can be worse than the intrusion itself (see the inventory sketch after this list).
- Don’t remove permissions or groups without recording: “what changed, by whom, and why.”
- Prefer staged password resets (tiered) rather than “reset everyone immediately,” unless the situation truly requires it.
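One quick way to build the service-account list from the first guardrail is to enumerate accounts that carry a servicePrincipalName, since those are the ones most likely to break applications if reset blindly. A sketch using ldap3 (server, credentials, and the base DN are placeholders; paged search is omitted for brevity):

```python
# service_account_inventory.py - list accounts with a servicePrincipalName before any
# mass reset, so application owners can be consulted first. Placeholders as before;
# large directories will need a paged search, omitted here for brevity.
from ldap3 import Server, Connection, NTLM, SUBTREE

BASE_DN = "DC=example,DC=com"

def service_accounts(conn):
    """Return (sAMAccountName, SPN list) for every user object carrying an SPN."""
    conn.search(BASE_DN,
                "(&(objectCategory=person)(objectClass=user)(servicePrincipalName=*))",
                search_scope=SUBTREE,
                attributes=["sAMAccountName", "servicePrincipalName"])
    return [(entry.sAMAccountName.value, list(entry.servicePrincipalName.values))
            for entry in conn.entries]

if __name__ == "__main__":
    conn = Connection(Server("dc01.example.com"), user="EXAMPLE\\ir-reader",
                      password="...", authentication=NTLM, auto_bind=True)
    for name, spns in service_accounts(conn):
        print(f"{name}: {', '.join(spns)}")
```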
Eradication: remove persistence and regain trust
Attackers who compromise AD often establish persistence through backdoor accounts, delegated rights, scheduled tasks, GPO abuse, credential material, and ticket-based persistence.
Eradication checklist
- Hunt persistence: new users/groups, unexpected admin delegations, suspicious ACL changes, modified GPOs, new scripts in SYSVOL (see the query sketch after this checklist).
- Validate DC integrity: compare configurations to known-good baselines; check for unauthorized services, drivers, tasks.
- Rotate privileged credentials: Domain Admins, Enterprise Admins, server local admin passwords, break-glass credentials (in a controlled order).
- KRBTGT reset plan: document prerequisites and do it carefully. Most orgs do a two-step reset, waiting for replication between the two resets, because Kerberos keeps the two most recent KRBTGT keys valid; a single reset leaves tickets issued with the old key usable.
- Service account rotation: prioritize Tier-0 service accounts and any accounts with broad delegation.
- Certificate/PKI review: if AD CS is present, validate templates, enrollment permissions, and recent certificate issuance.
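The first hunt in the checklist (new users and unexpected admin delegations) maps to a couple of simple LDAP queries: accounts created since the suspected intrusion window, and accounts still carrying adminCount=1 (current or past protected-group membership). A sketch with ldap3 follows; the cutoff date, server, and credentials are placeholders, and a hit is a lead to investigate, not proof of compromise.

```python
# persistence_hunt.py - look for accounts created in the intrusion window and accounts
# marked adminCount=1 (current or past protected-group membership). Placeholders as before.
from ldap3 import Server, Connection, NTLM, SUBTREE

BASE_DN = "DC=example,DC=com"
CREATED_SINCE = "20240501000000.0Z"   # generalized time; set to the suspected intrusion start

def hunt(conn):
    # 1) Users created since the cutoff - review each against legitimate change records.
    conn.search(BASE_DN,
                f"(&(objectClass=user)(whenCreated>={CREATED_SINCE}))",
                search_scope=SUBTREE, attributes=["sAMAccountName", "whenCreated"])
    for entry in conn.entries:
        print(f"NEW ACCOUNT: {entry.sAMAccountName.value} created {entry.whenCreated.value}")

    # 2) adminCount=1 users - compare against the expected privileged-account list.
    conn.search(BASE_DN,
                "(&(objectClass=user)(adminCount=1))",
                search_scope=SUBTREE, attributes=["sAMAccountName"])
    for entry in conn.entries:
        print(f"ADMINCOUNT=1: {entry.sAMAccountName.value}")

if __name__ == "__main__":
    conn = Connection(Server("dc01.example.com"), user="EXAMPLE\\ir-reader",
                      password="...", authentication=NTLM, auto_bind=True)
    hunt(conn)
```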
Recovery: rebuild, restore, validate
Recovery is not “systems are back online.” Recovery is “we re-established trust in identity and can prove it.”
Recovery steps
- Choose recovery strategy: clean-in-place vs restore from backup vs rebuild domain (worst case).
- Restore carefully: validate backup integrity, isolate restore environment if possible, and keep forensic copies.
- Reintroduce DCs safely: ensure patching, secure configuration, and monitored rejoin to production.
- Re-enable services in tiers: Tier-0 first, then Tier-1 (servers), then Tier-2 (endpoints/users).
- Validate authentication flows: normal logons, service tickets, replication health, time sync, DNS health (see the validation sketch after this list).
- Continuous monitoring: heightened detections for weeks (not days). Attackers often attempt re-entry.
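For the validation step, the built-in tools already cover replication, DC health, and time sync; the sketch below simply runs them in sequence and captures the output in one file that can be attached to the incident record. It assumes it runs on a DC where repadmin, dcdiag, and w32tm are available; the command list is illustrative, not exhaustive, and the exit-code status is only a rough heuristic, so read the captured output.

```python
# recovery_validation.py - run standard health checks after DCs are reintroduced
# and capture the output for the incident record. Assumes it runs on a domain
# controller with repadmin, dcdiag, and w32tm available; the check list is illustrative.
import subprocess
from datetime import datetime, timezone

CHECKS = [
    ("replication summary", ["repadmin", "/replsummary"]),
    ("dc diagnostics",      ["dcdiag", "/q"]),           # /q prints errors only
    ("time sync status",    ["w32tm", "/query", "/status"]),
]

def run_checks(report_path="recovery-validation.txt"):
    with open(report_path, "w", encoding="utf-8") as report:
        report.write(f"Validation run (UTC): {datetime.now(timezone.utc).isoformat()}\n")
        for name, cmd in CHECKS:
            result = subprocess.run(cmd, capture_output=True, text=True)
            # Exit code is a heuristic only; the captured output is what matters.
            status = "OK" if result.returncode == 0 else f"EXIT {result.returncode}"
            report.write(f"\n=== {name} [{status}] ===\n")
            report.write(result.stdout or "")
            report.write(result.stderr or "")
            print(f"{name}: {status}")

if __name__ == "__main__":
    run_checks()
```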
Post-incident hardening and lessons learned
This is where you reduce the chance of a repeat event and shorten the next response cycle.
Security improvements
- Implement or tighten privileged tiering and admin workstation standards.
- Reduce standing privilege (JIT/JEA where possible) and review delegation regularly.
- Harden GPO change controls and audit SYSVOL changes.
- Improve identity detections and centralize key events.
Process improvements
- Update the playbook with what actually happened (not what you wish happened).
- Add “decision timestamps” to remove ambiguity next time.
- Run a tabletop focused on the weakest points found during the incident.
- Store final artifacts: timeline, IOCs, actions taken, and hardening backlog.
Copy/paste templates: runbook structure & comms
Playbook structure (recommended headings)
Playbook: AD Compromise
1. Scope and assumptions
2. Severity definitions and triggers
3. Roles and decision rights
4. Evidence collection (what/where/how long)
5. Triage (first 60 minutes)
6. Containment (surgical → emergency)
7. Eradication (persistence removal + secret rotation plan)
8. Recovery (restore/rebuild + validation gates)
9. Communications (internal/external templates)
10. Post-incident actions (hardening + backlog)
11. Appendix: critical contacts, systems, scripts, and checklists
Internal “all-hands” message (short)
Subject: Security incident – temporary access changes
We are responding to a security incident affecting identity systems.
You may notice password prompts or access restrictions while we investigate.
Do not reboot devices if you see unusual login prompts.
Report suspicious emails, MFA prompts you did not initiate, or access issues to: [CHANNEL].
We will provide updates at: [CADENCE / STATUS PAGE].
Admin instruction (break-glass reminder)
Use only the approved admin workstation/jump host.
Do not sign in to DCs from standard endpoints.
Record every privileged action in the incident log (time, action, account, system).
If unsure, stop and escalate to the Incident Commander.
FAQ
How do I know it’s really an “AD compromise” and not a single endpoint issue?
Your triggers should focus on privileged identity impact: admin credential theft indicators, DC logon anomalies, privileged group changes, replication/identity detections, and widespread lateral movement. If you can’t confidently scope it within an hour, treat it as SEV-1 until proven otherwise.
Should we reset everyone’s password immediately?
Not by default. Mass resets can cause business outages and don’t guarantee the attacker is out. Prefer tiered resets: privileged accounts first, then critical service accounts, then broader user base once containment and monitoring are in place.
What logging should be “must have” before an incident?
DC security logs with sufficient retention, centralized forwarding, endpoint telemetry (EDR), and identity detections. If you’re using Defender for Identity, ensure event collection is correctly configured and validated.
How often should we test this playbook?
Run a tabletop at least quarterly for Tier-0 compromise. Also test the operational “hard parts” (break-glass access, evidence collection, restore validation) at least twice a year.
