
AD high-availability: RODCs and cross-site redundancy

Design for the worst day: local logons at branch speed, safe failover by intent—not accident.

Definition (snippet-ready): AD high availability with RODCs and cross-site redundancy is the practice of placing read-only domain controllers in low-trust or connectivity-constrained sites and engineering client failover—via site costs, replication schedules, and the try next closest site behavior—so authentication and directory access continue even when preferred paths fail.

Branch offices haven’t vanished; they’ve changed. You still need fast, local sign-ins, but you cannot assume perfect WAN links or safe server rooms. Read-Only Domain Controllers (RODCs) exist for that tension: they keep logons local while limiting the secrets stored onsite. When you pair RODCs with a clear site and site-link design—including replication costs and schedules—and enable deterministic client failover, you get continuity instead of chaos.
Key idea: Cross-site redundancy is engineered, not emergent. You shape client behavior with sites, costs, and policies, not ping times.

The surface view is incomplete

“Put an RODC in the branch” is only half the story. Three subtleties decide whether users sail or suffer:

  • Credential scope on RODCs—governed by Password Replication Policy (PRP)—decides who keeps working offline when the WAN drops.
  • Client failover follows site link cost and the try next closest site policy, not a random “nearest ping.”
  • Global Catalog needs: some apps require a writable GC and won’t accept an RODC-GC. Plan placements accordingly. See Global Catalog fundamentals.

First principles you can build on

To master design, reduce to irreducible truths. These six rules drive the expected—and surprising—behaviors you’ll see in production.

  1. RODCs are read-only. They host read-only directory partitions; writes are proxied to a writable DC (RWDC). Inbound replication only. Expect read speed, not write locality.
  2. Credentials are opt-in. RODCs cache only the accounts you allow via PRP (Allowed/Denied lists). If a credential isn’t cached, authentication is forwarded to an RWDC.
  3. Each RODC has its own krbtgt. A forged TGT’s blast radius is limited to that RODC. Never include that krbtgt in any allowed-to-cache group.
  4. Filter sensitive attributes. Use the RODC filtered attribute set for credential-like app secrets so they never replicate to branches.
  5. Client gravity is cost-based. DC Locator chooses by site and site-link cost; enable try next closest site for predictable failover.
  6. Let KCC/ISTG automate. The Knowledge Consistency Checker and Intersite Topology Generator build the graph. Steer with costs; override sparingly.
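
A quick way to see principles 1 and 3 in your own forest is to list the RODCs and the dedicated krbtgt account each one is linked to. The sketch below is a minimal read-only check, assuming the RSAT ActiveDirectory module; names and output shape are illustrative.

    # List read-only DCs and the per-RODC krbtgt account linked to each one
    Import-Module ActiveDirectory
    Get-ADDomainController -Filter { IsReadOnly -eq $true } | ForEach-Object {
        $krbtgt = (Get-ADObject -Identity $_.ComputerObjectDN -Properties 'msDS-KrbTgtLink').'msDS-KrbTgtLink'
        [pscustomobject]@{ RODC = $_.Name; Site = $_.Site; KrbtgtAccount = $krbtgt }
    }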

Concrete scenarios

Examples make the rules tangible.

  • Retail branch with weak security and flaky WAN: Place an RODC + DNS at the store. Allow only Store Users and POS Devices to cache. Executives remain denied, even offline. See RODC guide.
  • Regional office between two hubs: Cost the primary hub lower than the secondary; enable try next closest site. Clients fail over sensibly instead of crossing the globe.
  • App requiring writable GC: Keep a reachable writable GC for directory-aware apps and workflows. Review GC design early.

Implementation hero section: build a resilient RODC + cross-site design

This is the straight-through, production-grade path. Adjust naming, costs, and schedules to your WAN realities. Cross-reference fundamentals at: RODC planning, site topology, and DC Locator.

1) Model sites, subnets, and site-link costs

  1. Create sites & map subnets to the correct site so DC Locator isn’t guessing. See Sites & Services overview.
    New-ADReplicationSite -Name "BRANCH01"
    New-ADReplicationSubnet -Name "10.20.10.0/24" -Site "BRANCH01"
  2. Define site links with explicit costs and intervals. Lower cost = preferred path. Keep replication frequency sane (e.g., 180 minutes for low-change branches).
    New-ADReplicationSiteLink -Name "BRANCH01-HUB1" -SitesIncluded BRANCH01,HUB1 -Cost 50 -ReplicationFrequencyInMinutes 180
    New-ADReplicationSiteLink -Name "BRANCH01-HUB2" -SitesIncluded BRANCH01,HUB2 -Cost 80 -ReplicationFrequencyInMinutes 180
  3. Prefer simple graphs. Use default transitive bridging unless WAN policy demands otherwise. Complex meshes create surprising failover paths.
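
Once the links exist, sanity-check what the KCC will actually consume. A minimal read-only verification, assuming the RSAT ActiveDirectory module and the example names above:

    # Review cost, interval, and membership for every site link before trusting failover math
    Get-ADReplicationSiteLink -Filter * |
      Select-Object Name, Cost, ReplicationFrequencyInMinutes, SitesIncluded
    # Confirm the branch subnet is mapped to the intended site
    Get-ADReplicationSubnet -Filter 'Name -eq "10.20.10.0/24"' | Select-Object Name, Site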

2) Enable deterministic client failover

  1. Turn on “Try Next Closest Site.” GPO path:
    Computer Configuration\Policies\Administrative Templates\System\Net Logon\DC Locator DNS Records\Try Next Closest Site = Enabled.
  2. Validate mapping with:
    nltest /dsgetsite
    nltest /dclist:corp.example.com
    nltest /dsgetdc:corp.example.com /force /try_next_closest_site
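
On a client, you can also confirm the policy actually landed by checking the Netlogon policy key in the registry. The value name below reflects the standard ADMX template; treat it as an assumption to verify against your ADMX version.

    # Check whether Try Next Closest Site reached this client via Group Policy
    Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Netlogon\Parameters' `
      -Name 'TryNextClosestSite' -ErrorAction SilentlyContinue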

3) Stage and promote the RODC with role separation

  1. Precreate the RODC account (optional) to delegate build to local techs without domain-wide rights (a staging sketch follows step 3).
  2. Assign a local branch admin via Administrator Role Separation (run ntdsutil against the RODC once it is online):
    ntdsutil
    local roles
    connections
    connect to server <RODC-FQDN>
    quit
    add <corp\branchtech> administrators
    quit
    quit
  3. Promote as RODC (+ DNS if hosting):
    Install-ADDSDomainController `
      -DomainName "corp.example.com" `
      -SiteName "BRANCH01" `
      -ReadOnlyReplica:$true `
      -InstallDns:$true `
      -Credential (Get-Credential)
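
Step 1 above mentions pre-creating the RODC account; a minimal staging sketch follows, assuming the ADDSDeployment module and this guide's example names (BR01-RODC, BRANCH01, corp\branchtech). The branch tech can then attach the server with Install-ADDSDomainController -UseExistingAccount.

    # Pre-create (stage) the RODC computer account from a hub DC and delegate the branch tech
    Import-Module ADDSDeployment
    Add-ADDSReadOnlyDomainControllerAccount `
      -DomainControllerAccountName "BR01-RODC" `
      -DomainName "corp.example.com" `
      -SiteName "BRANCH01" `
      -DelegatedAdministratorAccountName "corp\branchtech" `
      -Credential (Get-Credential)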

4) Engineer the Password Replication Policy (PRP)

PRP is your blast-radius dial. Only cache identities the site needs when isolated.

  1. Deny privileged groups permanently. Keep Denied RODC Password Replication Group intact (Domain Admins, Enterprise Admins, etc.).
  2. Create per-site Allowed groups (e.g., RODC-BRANCH01-Allowed), add branch users, machines, and supported service accounts.
  3. Apply PRP and verify:
    Add-ADDomainControllerPasswordReplicationPolicy -Identity BR01-RODC -AllowedList "RODC-BRANCH01-Allowed"
    Get-ADDomainControllerPasswordReplicationPolicy -Identity BR01-RODC -Allowed
  4. Prepopulate cache before cutover to avoid day-1 WAN pain:
    repadmin /rodcpwdrepl BR01-RODC <HubRWDC> "CN=Alice Smith,OU=Branch01,DC=corp,DC=example,DC=com"
  5. Audit usage to catch surprises:
    Get-ADDomainControllerPasswordReplicationPolicyUsage -Identity BR01-RODC -RevealedAccounts
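
Step 2's per-site Allowed group is plain AD group work; a minimal sketch, assuming the example group, OU, and domain names above:

    # Create the branch-scoped Allowed group and add the branch roster (example names)
    New-ADGroup -Name "RODC-BRANCH01-Allowed" -GroupScope Global -GroupCategory Security `
      -Path "OU=Branch01,DC=corp,DC=example,DC=com"
    Add-ADGroupMember -Identity "RODC-BRANCH01-Allowed" `
      -Members (Get-ADUser -Filter * -SearchBase "OU=Branch01,DC=corp,DC=example,DC=com")
    # Repeat with Get-ADComputer for branch machines that must authenticate while isolated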

Background reading: RODC planning & PRP, RODC deployment.

5) DNS on the RODC (keep reads local, writes upstream)

  • Install DNS on the RODC; point branch clients to it as primary. Use a hub RWDC/DNS as secondary.
  • Understand behavior: AD-integrated zones replicate to the RODC DNS as read-only; dynamic updates flow to a writable peer. Ensure the RODC’s preferred DNS includes a writable DC.
  • Verify registrations by checking SRV records and event logs on the writable DNS server.
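
A quick way to verify the last bullet from a branch client is an SRV lookup against the RODC's DNS; the record name follows the standard _msdcs layout with this guide's example site and domain, and the server address is a placeholder for your RODC.

    # Ask the branch RODC's DNS for the site-specific DC SRV record (replace 10.20.10.10 with the RODC's IP)
    Resolve-DnsName -Name "_ldap._tcp.BRANCH01._sites.dc._msdcs.corp.example.com" -Type SRV -Server 10.20.10.10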

6) Global Catalog placement & caveats

  • You can mark an RODC as GC for faster forest-wide reads, but some apps require a writable GC. Keep at least one reachable writable GC for those apps.
  • Use GC fundamentals to decide placements early.
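
Before deciding placements, take a quick inventory of which DCs already advertise the GC role and which are read-only; a minimal check with the RSAT ActiveDirectory module:

    # List every DC with its site, GC status, and RODC status
    Get-ADDomainController -Filter * |
      Select-Object Name, Site, IsGlobalCatalog, IsReadOnly |
      Sort-Object Site, Name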

7) Observability and troubleshooting runbook

  • Trace DC Locator: enable Netlogon debug logging temporarily (a sketch follows this list).
  • Validate site mapping:
    nltest /dsgetsite
    nltest /dsgetdc:corp.example.com /force
  • Replication status:
    repadmin /replsummary
    repadmin /showrepl BR01-RODC
  • Health overview: monitor Directory Services, DNS Server, and Kerberos logs; alert on repeated RODC forwards to RWDC during WAN incidents.
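
For the first bullet, Netlogon debug logging is typically toggled with nltest's /dbflag switch; 0x2080ffff is the commonly documented verbose mask, and the log lands in %windir%\debug\netlogon.log. Confirm the flag against your OS build and turn logging off once you have what you need.

    # Enable verbose Netlogon debug logging, reproduce the DC-location issue, then disable it
    nltest /dbflag:0x2080ffff
    # ...review %windir%\debug\netlogon.log...
    nltest /dbflag:0x0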

Implications and inherent tendencies

  • Security posture improves by default when privileged credentials are denied from caching and sensitive attributes are filtered. Mis-permissioning can silently widen cache scope—audit PRP usage.
  • Latency is policy-driven. Replication intervals and schedules trade WAN spend for staleness. Be explicit and document the tradeoffs in change records.
  • Failover follows costs. Poor site-link math yields “mystery DCs.” Keep the graph simple enough to reason about in a crisis.
  • RODCs reduce, not remove, risk. They still host data. Treat them as higher-risk assets compared to hub RWDCs and monitor accordingly.
  • Apps may surprise you. Some need writable GCs or follow specific referral patterns. Test before you place them behind an RODC.

Expert mental models

  • Blast-radius budgeting: Every RODC is a risk envelope. PRP defines how much credential exposure you accept if the box is stolen.
  • Cost-driven gravity: Clients “fall” toward DCs along the lowest-cost paths. You’re shaping gravity with costs and policies.
  • Write scarcity at the edge: Edge DCs answer reads and KDC tickets from cache. True writes take a designed path to RWDCs—or are deferred.
  • Automate, then pin: Let KCC/ISTG build 90% of the topology. Pin only the connections you absolutely must control.

Misunderstandings, risks, and correctives

  • Executive passwords cached at branches. Keep privileged identities in Denied RODC Password Replication Group. Review quarterly.
  • No prepopulation before go-live. Day-1 logons crawl over the WAN. Prepopulate cache for the branch roster the night before cutover (a bulk sketch follows this list).
  • Random failover paths. Without try next closest site and sane costs, clients pick far DCs. Enable the policy; simplify costs.
  • Treating an RODC like a full DC. Apps needing writes fail oddly. Keep a reachable writable GC and test workloads.
  • Overriding KCC everywhere. Hand-built meshes get brittle. Prefer defaults; override only with a documented reason.
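
For the prepopulation pitfall above, you can push the whole branch roster through repadmin in one pass rather than one account at a time. A minimal sketch, assuming the example group name used earlier and a placeholder hub DC name (HUB1-DC01):

    # Prepopulate the RODC's credential cache for every member of the branch Allowed group
    $members = Get-ADGroupMember -Identity "RODC-BRANCH01-Allowed" -Recursive
    foreach ($m in $members) {
      repadmin /rodcpwdrepl BR01-RODC HUB1-DC01 $m.distinguishedName
    }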

Expert essentials checklist

  • Map every subnet to the right site.
  • Keep site-link costs/schedules simple and intentional.
  • Enable Try Next Closest Site via GPO.
  • Deploy RODC with Administrator Role Separation.
  • Lock down PRP: deny privileged, allow only what’s needed.
  • Prepopulate caches; then audit who actually cached.
  • Place at least one reachable writable GC for app compatibility.
  • Host DNS on the RODC; point to a writable peer as secondary.
  • Monitor Netlogon/KDC and replication with nltest/repadmin.

Applications, consequences, and the road ahead

Branch UX: With the right PRP and DNS, staff logons feel local—even during WAN wobble. Group Policy and LDAP SRV lookups resolve locally; only non-cached auths traverse the WAN.

Incident containment: If a branch is compromised, the attacker cannot mint forest-wide TGTs from that RODC’s krbtgt. You can rebuild the single RODC without immediate domain-wide krbtgt rotation.

Hybrid identity alignment: Designs that lean on DNS and cost-based failover align well with modern client behavior. Keep Netlogon/DC-Locator policies documented as part of your DR runbooks.

Future direction: Expect more policy knobs for deterministic DC location and smaller credential footprints on the edge. The long-term pattern is clear: minimize write surfaces, keep blast radii tiny, and make failover math obvious.

Key takeaways & wrap-up

  • RODCs deliver availability with guardrails. Local authentication without spraying secrets everywhere.
  • Redundancy is designed. Sites, costs, schedules, and policies—not accidents—determine who fails over where.
  • Design for the worst day. Cache what must work offline; keep a writable GC reachable; and let automation handle the rest.

Make it real: Pilot one branch. Use the checklist, enable Try Next Closest Site, prepopulate ten users, and run a planned failover test next week.

Want the ready-to-use runbook, PRP worksheet, and DC-Locator validation script? Subscribe and we’ll send the Branch RODC Design Checklist (PDF, PowerShell snippets) to your inbox.
