How to detect circular group nesting and resolve token bloat

Arun Kumar

3 months ago


Group nesting is one of Active Directory’s most powerful features: it lets you express roles, aggregate access, and scale delegation without touching every user object. It’s also one of the easiest ways to accidentally create circular membership (loops) and quietly inflate a user’s logon token until things start failing in weird, expensive-to-debug ways.

This guide gives you an operational, “audit-and-fix” approach: how to detect circular nesting, how to quantify token bloat, what symptoms to look for, and how to redesign group structures so the problem stays solved.

The mental model: you are building a graph, not a list

Treat your directory’s nested groups as a directed graph:

Nodes = groups
Edges = “Group A contains Group B” (A → B)

A circular nesting issue is simply a cycle in this graph: A → B → C → A. Cycles are dangerous because:

Recursive membership evaluation can become slow, inconsistent, or fail depending on the tool/path used.
They encourage “spaghetti nesting” that accumulates memberships into huge transitive sets.
They hide privilege escalation paths (“I didn’t know that group ultimately included Domain Admins”).

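Token bloat is the other side of the same graph: every security group SID that becomes “effective” for a user contributes to their authorization data. If you let nesting grow without bounds, token size grows with it until it runs into hard limits: the Kerberos MaxTokenSize buffer and the roughly 1,015 SIDs that a Windows access token can hold.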
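
To make the graph framing concrete, here is a minimal sketch that expands one principal's effective groups from a small in-memory map. The group names, the `$memberOfMap` table, and the `Get-EffectiveGroups` helper are all made up for illustration; in a real directory the edges would come from the member/memberOf attributes. The visited set is what keeps the walk finite even though the sample data contains a loop.

```powershell
# Hypothetical sample data: principal or group -> groups it is a direct member of.
$memberOfMap = @{
  'jdoe'           = @('ROLE_Finance')
  'ROLE_Finance'   = @('RES_Ledger_RW', 'ROLE_AllStaff')
  'ROLE_AllStaff'  = @('RES_Intranet_R')
  'RES_Intranet_R' = @('ROLE_AllStaff')   # deliberate loop for the demo
}

function Get-EffectiveGroups {
  param(
    [Parameter(Mandatory)] [string]$Principal,
    [Parameter(Mandatory)] [hashtable]$Map
  )

  $seen  = New-Object 'System.Collections.Generic.HashSet[string]'
  $queue = New-Object 'System.Collections.Generic.Queue[string]'
  $queue.Enqueue($Principal)

  while ($queue.Count -gt 0) {
    $current = $queue.Dequeue()
    # A missing key yields $null, and foreach over $null runs zero times.
    foreach ($parentGroup in $Map[$current]) {
      # Add() returns $false for a group already seen, so loops terminate.
      if ($seen.Add($parentGroup)) { $queue.Enqueue($parentGroup) }
    }
  }

  $seen | Sort-Object
}

$effective = Get-EffectiveGroups -Principal 'jdoe' -Map $memberOfMap
$effective
```

Note that the loop between ROLE_AllStaff and RES_Intranet_R does not add any extra entries to the result; it only makes the path harder to explain.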

What token bloat actually is (and why it breaks things)

When a user logs on and authenticates, Windows builds an authorization context that includes:

the user SID
group SIDs (direct and transitive)
SIDHistory (if present)
other authorization data (depending on scenario)

In Kerberos, this data is embedded in tickets (the PAC). In NTLM or local token construction paths, a similar concept applies: the resulting token is bigger and must be carried, processed, and sometimes transmitted.

Symptoms of token bloat vary by workload, but commonly include:

Intermittent access failures to web apps (classic example: IIS returning HTTP 400 “Bad Request” because the Kerberos Authorization header is too large, or 401/403 responses that don’t match permissions).
Logon delays (especially first logon after membership changes) due to group expansion and ticket building.
Kerberos falling back to alternative behaviors (larger responses, different ticket paths), or apps failing to accept the auth payload.
“Works on some machines” patterns when client/server settings differ.

The key point: token bloat is almost never a single “bad group.” It’s usually the emergent result of many “reasonable” nestings that compound over time.

Why circular nesting and token bloat tend to show up together

Circular nesting doesn’t always directly increase token size by itself (a cycle doesn’t magically create new unique SIDs), but it is a strong indicator that group design has lost its boundaries. Once boundaries are gone, you typically see:

excessive transitive expansion (roles include roles include roles)
resource groups containing other resource groups
people added to “shortcuts” instead of the correct role group
uncontrolled use of universal groups across domains
SIDHistory not cleaned up after migrations

In other words: cycles are the canary; token bloat is the collapse.

Detection strategy overview

You want two complementary audits:

Cycle detection: find any membership loops and output the exact cycle path.
Token risk detection: identify users (and groups) with unusually large effective security group sets.

Then you remediate in this order:

Break cycles (they poison every other analysis and create ambiguity).
Reduce unnecessary effective memberships (fix the “shape” of the graph).
Only if needed, apply tactical configuration changes (temporary safety valves, not the real fix).

Detecting circular group nesting with PowerShell (graph-based)

Most “quick” scripts fail here because they either:

use naive recursion and hit cycles or depth issues, or
only look one level deep and miss transitive loops

What you want is a standard graph cycle detection pattern (DFS with a recursion stack). The script below builds a group-to-group adjacency list, then walks it to find cycles and prints the cycle path.

Script: find circular nesting (clear and readable)

# Requires RSAT ActiveDirectory module
Import-Module ActiveDirectory

# 1) Collect groups and build adjacency list
Write-Host "Building group nesting graph..."
$groups = Get-ADGroup -Filter * -Properties member

# adjacency: groupDN -> array of nested groupDNs
# Pass 1: register every group DN as a node.
$adj = @{}
foreach ($g in $groups) {
  $adj[$g.DistinguishedName] = @()
}

# Pass 2: keep only members that are themselves groups returned above.
# Checking against the node set avoids one Get-ADObject round trip per
# membership link; users, computers, and groups outside the queried
# domain are skipped.
foreach ($g in $groups) {
  $gdn = $g.DistinguishedName
  foreach ($m in $g.member) {
    if ($adj.ContainsKey($m)) {
      $adj[$gdn] += $m
    }
  }
}

# 2) DFS cycle detection
$visited = New-Object "System.Collections.Generic.HashSet[string]"
$inStack = New-Object "System.Collections.Generic.HashSet[string]"
$parent  = @{}  # childDN -> parentDN (for reconstructing cycle)
$cycles  = New-Object "System.Collections.Generic.List[object]"

function Record-Cycle {
  param([string]$start, [string]$end)

  # $end is an ancestor of $start on the current DFS path, and the back-edge
  # is start -> end. Walk parent pointers from $start up to $end, then reverse
  # so the path reads in "contains" order: end -> ... -> start -> end.
  $path = New-Object "System.Collections.Generic.List[string]"
  $path.Add($end) | Out-Null
  $cur = $start

  while ($cur -and $cur -ne $end) {
    $path.Add($cur) | Out-Null
    $cur = $parent[$cur]
  }
  $path.Add($end) | Out-Null
  # Reverse the list in place; reversing a ToArray() copy leaves $path unchanged.
  $path.Reverse()

  $cycles.Add(($path -join "  ->  ")) | Out-Null
}

function Dfs-Visit {
  param([Parameter(Mandatory)] [string]$node)

  $visited.Add($node) | Out-Null
  $inStack.Add($node) | Out-Null

  foreach ($nbr in $adj[$node]) {
    if (-not $visited.Contains($nbr)) {
      $parent[$nbr] = $node
      Dfs-Visit -node $nbr
    }
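    elseif ($inStack.Contains($nbr)) {
      # Found a back-edge => cycle. Do not overwrite $parent[$nbr] here:
      # it must keep pointing at its DFS tree parent, otherwise later
      # path reconstruction can loop forever.
      Record-Cycle -start $node -end $nbr
    }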
  }

  $inStack.Remove($node) | Out-Null
}

Write-Host "Scanning for cycles..."
foreach ($node in $adj.Keys) {
  if (-not $visited.Contains($node)) {
    Dfs-Visit -node $node
  }
}

if ($cycles.Count -eq 0) {
  Write-Host "No circular nesting detected."
} else {
  Write-Host "Circular nesting detected:`n"
  $cycles | Sort-Object -Unique | ForEach-Object { $_ }
}

How to interpret output: each printed line is a cycle path. Your job is not to “delete groups” but to decide which edge is invalid. Usually, the wrong link is a “resource group includes another resource group” shortcut or a “catch-all” group stuffed into roles.
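
If you want to try the detection logic without touching a directory, or you are worried that a recursive walk could hit PowerShell's call depth limit on very deep nesting, the same idea can be written with an explicit stack. The sketch below is an illustration only: the four-node sample graph and the `Find-NestingCycles` helper are hypothetical, and it reports one path per back-edge rather than every possible cycle.

```powershell
# Hypothetical sample graph: group -> groups it directly contains.
$sampleGraph = @{
  'A' = @('B')
  'B' = @('C')
  'C' = @('A', 'D')   # C -> A closes the loop A -> B -> C -> A
  'D' = @()
}

function Find-NestingCycles {
  param([Parameter(Mandatory)] [hashtable]$Graph)

  $state = @{}   # node -> 'active' (on the current path) or 'done'
  $found = New-Object 'System.Collections.Generic.List[string]'

  foreach ($root in @($Graph.Keys | Sort-Object)) {
    if ($state.ContainsKey($root)) { continue }

    $path  = New-Object 'System.Collections.Generic.List[string]'   # current DFS path
    $stack = New-Object 'System.Collections.Generic.Stack[object]'  # frames: node + next child index
    $state[$root] = 'active'
    $path.Add($root)
    $stack.Push(@{ Node = $root; Next = 0 })

    while ($stack.Count -gt 0) {
      $frame = $stack.Peek()
      $children = @()
      if ($Graph.ContainsKey($frame.Node)) { $children = @($Graph[$frame.Node]) }

      if ($frame.Next -ge $children.Count) {
        # All children handled: leave the node and drop it from the path.
        $state[$frame.Node] = 'done'
        $path.RemoveAt($path.Count - 1)
        [void]$stack.Pop()
        continue
      }

      $child = $children[$frame.Next]
      $frame.Next = $frame.Next + 1

      if (-not $state.ContainsKey($child)) {
        $state[$child] = 'active'
        $path.Add($child)
        $stack.Push(@{ Node = $child; Next = 0 })
      }
      elseif ($state[$child] -eq 'active') {
        # Back-edge to a node on the current path: slice the path from that node.
        $start = $path.IndexOf($child)
        $cycle = @($path.GetRange($start, $path.Count - $start)) + $child
        $found.Add($cycle -join ' -> ')
      }
    }
  }

  $found
}

$sampleCycles = @(Find-NestingCycles -Graph $sampleGraph)
$sampleCycles
```

The 'active' state plays the same role as the recursion stack in the recursive version: only an edge back to a node that is still on the current path counts as a cycle.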

Operational tip: export the results

In real environments, you’ll want evidence. Wrap the cycle output into a CSV (cycle string + involved DNs), attach it to a ticket, and make changes in a controlled window.
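
One way to do that export is sketched below. It assumes the cycle strings use the "dn  ->  dn" format that the scan prints; the sample DNs, the column names, and the output location are placeholders to adapt to your own ticketing process.

```powershell
# Hypothetical input: cycle strings in the format printed by the cycle scan.
$cycleStrings = @(
  'CN=RES_A,OU=Groups,DC=example,DC=com  ->  CN=ROLE_B,OU=Groups,DC=example,DC=com  ->  CN=RES_A,OU=Groups,DC=example,DC=com'
)

$rows = foreach ($c in $cycleStrings) {
  # Split on the arrow separator; the last element repeats the first to close the loop.
  $dns = @($c -split '\s+->\s+')

  [pscustomobject]@{
    DetectedOn  = (Get-Date).ToString('yyyy-MM-dd')
    GroupCount  = $dns.Count - 1
    InvolvedDNs = ($dns | Select-Object -Unique) -join '; '
    CyclePath   = $c
  }
}

# The output path is an assumption; point it at your evidence share or ticket folder.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'group-nesting-cycles.csv'
$rows | Export-Csv -Path $csvPath -NoTypeInformation -Encoding UTF8
```

Keeping the involved DNs in their own column makes it easier to filter the evidence by group later, for example when several cycles share the same hub.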

Detecting token bloat risk: identify heavy tokens and their drivers

You can’t manage what you can’t measure. There are two practical measurements:

Effective security groups count (a strong proxy for token size)
Membership shape (depth, fanout, and use of universal groups / SIDHistory)

Quick proxy: count effective token groups for users

The tokenGroups constructed attribute gives the set of SIDs that would land in a user’s token (security-enabled groups, expanded transitively). It’s a practical way to flag accounts that are likely to hit size limits.

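Import-Module ActiveDirectory

# Adjust scope as needed (OU filter, user filter, etc.)
$users = Get-ADUser -Filter *

# tokenGroups is a constructed attribute that AD only returns for
# base-scope reads, so it is fetched per user via -Identity rather
# than in the bulk -Filter query above.
$report = foreach ($u in $users) {
  $count = 0
  try {
    $full = Get-ADUser -Identity $u.DistinguishedName -Properties tokenGroups -ErrorAction Stop
    $count = @($full.tokenGroups).Count
  } catch {
    Write-Warning "Could not read tokenGroups for $($u.SamAccountName): $_"
  }

  [pscustomobject]@{
    SamAccountName = $u.SamAccountName
    TokenGroupSidCount = $count
  }
}

# Flag "large" accounts (threshold depends on your environment)
$report |
  Sort-Object TokenGroupSidCount -Descending |
  Select-Object -First 50 |
  Format-Table -AutoSize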

This does not directly output “bytes,” but it quickly surfaces the accounts most likely to fail under Kerberos / app constraints.
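
If you need a rough byte figure, Microsoft's published estimate is TokenSize = 1200 + 40d + 8s, where d counts domain local groups, universal groups from other domains, and SIDHistory entries, and s counts global groups plus universal groups in the user's own domain. The sketch below simply applies that formula; the `Get-EstimatedTokenSize` helper and the sample counts are hypothetical, and the result is an estimate rather than a measurement.

```powershell
function Get-EstimatedTokenSize {
  param(
    # "d": domain local groups + universal groups from other domains + SIDHistory SIDs
    [int]$DCount = 0,
    # "s": global groups + universal groups in the account domain
    [int]$SCount = 0
  )

  # 1200 bytes of estimated overhead, 40 bytes per "d" SID, 8 bytes per "s" SID.
  1200 + (40 * $DCount) + (8 * $SCount)
}

# Hypothetical counts for one user.
$bytes = Get-EstimatedTokenSize -DCount 150 -SCount 300
"Estimated token size: $bytes bytes"   # 1200 + 6000 + 2400 = 9600
```

The weighting explains why domain local groups and SIDHistory hurt more than global groups: each one costs roughly five times as much in the estimate.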

Drill-down: list the groups that contribute

Once you have a suspect user, you need the “why.” Start with direct membership, then map transitive membership and find the biggest aggregators.

Import-Module ActiveDirectory

$user = Get-ADUser -Identity "jdoe" -Properties memberOf, distinguishedName
"Direct groups:"
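$user.memberOf

"All groups (transitive, via the LDAP_MATCHING_RULE_IN_CHAIN matching rule):"
# Get-ADPrincipalGroupMembership returns direct memberships only, so the
# transitive set is queried with matching rule OID 1.2.840.113556.1.4.1941.
# Membership that exists only through the primary group is not covered, and
# DNs containing LDAP filter special characters need escaping first.
Get-ADGroup -LDAPFilter "(member:1.2.840.113556.1.4.1941:=$($user.DistinguishedName))" |
  Where-Object { $_.GroupCategory -eq "Security" } |
  Sort-Object Name |
  Select-Object Name, GroupScope, DistinguishedName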

Your remediation usually targets the top aggregators (groups that include many other groups) and misplaced membership (users directly added to resource groups or “everyone” groups).

Common root causes (the patterns that create bloat)

1) Resource groups nested into role groups (inversion)

A clean model separates “who you are” (roles) from “what you can touch” (resources). The intended direction is role groups nested into resource groups; when resource groups get nested into role groups instead, the direction inverts and your graph becomes cycle-prone and hard to reason about.

2) Universal groups used as a convenience layer

Universal groups are valuable in multi-domain designs, but they are also often used as “global catch-alls.” If universal groups become your default aggregation mechanism, you’ll inflate transitive membership across contexts and make troubleshooting miserable.

3) Legacy migrations leaving SIDHistory everywhere

SIDHistory is useful for transitional access, but it’s a long-term tax: it adds authorization data and can mask that access is “still working” for the wrong reason. If you don’t have a SIDHistory retirement plan, your tokens will only grow.

4) “Shortcut” adds that bypass the intended layer

Someone needs access to a share “right now,” so they’re added directly to a deep resource group. That feels harmless. Over a year, it becomes a systemic bypass of your access model and a direct contributor to token growth.

Remediation: break cycles safely

When you find a cycle, do not randomly remove links. Use a controlled approach:

Identify the business meaning of each group in the cycle (role? resource? administrative? exception?).
Pick the edge that violates the intended model (often a “role contains resource” inversion or a shortcut).
Stage the change (document current membership, expected access, rollback plan).
Test access paths for a few representative users before and after.

If you have a ticketing process, attach:

cycle path output
the single membership link you plan to remove
the replacement design (which group should contain which)

Remediation: reduce token bloat by fixing the group design

Adopt a stable layering pattern (and enforce it)

The most common approach for Windows resource authorization is a layered model such as AGDLP (Accounts, Global groups, Domain Local groups, Permissions):

Accounts (users/computers)
Role groups (job function, department roles)
Resource groups (per share/app/resource permission sets)
Permissions (ACLs on folders/apps assigned to resource groups)

The important part is not the acronym; it’s the invariant: roles should not depend on resources, and resource groups should be the only layer that touches ACLs.

Stop direct user adds to resource groups

Make it a policy: users go into role groups; role groups go into resource groups. If you need exceptions, create explicit exception role groups (and name them as exceptions).

Reduce nesting depth (not just total groups)

Two environments can have the same number of groups but very different operational behavior. Deep nesting creates long transitive chains, slow evaluation, and makes cycles more likely. Prefer a shallower design with clearer “aggregation boundaries.”
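
Depth is easy to measure once cycles are gone. The following sketch computes, for each group, the longest chain of nested groups beneath it, using a memoized walk over an in-memory adjacency list. The sample graph, `$depthCache`, and `Get-NestingDepth` are illustrative names, and the code assumes the graph is already acyclic (it would recurse forever on a loop).

```powershell
# Hypothetical acyclic sample: group -> groups it directly contains.
$nesting = @{
  'RES_Share_RW'      = @('ROLE_Finance', 'ROLE_Audit')
  'ROLE_Finance'      = @('ROLE_Finance_EMEA')
  'ROLE_Finance_EMEA' = @()
  'ROLE_Audit'        = @()
}

$depthCache = @{}

function Get-NestingDepth {
  param([Parameter(Mandatory)] [string]$Group)

  # Reuse a previously computed depth so shared subtrees are walked once.
  if ($depthCache.ContainsKey($Group)) { return $depthCache[$Group] }

  $max = 0
  foreach ($child in $nesting[$Group]) {
    $d = 1 + (Get-NestingDepth -Group $child)
    if ($d -gt $max) { $max = $d }
  }

  $depthCache[$Group] = $max
  return $max
}

$depthReport = foreach ($g in @($nesting.Keys)) {
  [pscustomobject]@{ Group = $g; Depth = (Get-NestingDepth -Group $g) }
}

$depthReport | Sort-Object Depth -Descending
```

Groups at the top of this report are the ones to look at first when you want a shallower design.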

Rationalize “big bucket” groups

Large “everyone-who-might-need-it” groups often become the hub that inflates every token. Break these into:

clear roles (who)
clear resources (what)
clear exceptions (why)

Tactical safety valves (use carefully)

Sometimes you need breathing room while you fix the model. There are settings that affect token size handling, but treat them as temporary mitigations, not the end state.

MaxTokenSize (last resort, not a design strategy)

You may see guidance to increase MaxTokenSize via registry/GPO on clients and servers. This can reduce immediate failures for some workloads, but it can also:

mask the underlying design problem
push failures to different apps or boundary points
increase the size and processing cost of auth data across the environment

If you change it, do it with change control, document the scope, and still prioritize reducing group explosion.
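
Before changing anything, it helps to know what a machine is using today. The read-only sketch below checks the Kerberos parameters key where MaxTokenSize is normally configured. To my knowledge the default is 12,000 bytes on releases before Windows 8 / Windows Server 2012 and 48,000 bytes from those releases on; verify both against current Microsoft guidance for your OS versions.

```powershell
# Read-only check of the local MaxTokenSize setting (Windows only).
$keyPath = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters'
$setting = Get-ItemProperty -Path $keyPath -Name 'MaxTokenSize' -ErrorAction SilentlyContinue

if ($null -ne $setting) {
  "MaxTokenSize is explicitly set to $($setting.MaxTokenSize) bytes"
} else {
  "MaxTokenSize is not set in the registry; the OS default applies"
}
```

Running this on a handful of clients and servers is a quick way to confirm or rule out the “works on some machines” pattern caused by mismatched settings.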

Application-specific constraints

Web apps, proxies, and middleware often have their own header/token limits. Token bloat is frequently discovered “as an IIS problem,” but the root cause is directory membership design.

Prevention: make circular nesting hard to create

You don’t want to rely on hero debugging. Build guardrails:

Naming conventions that encode intent (ROLE_, RES_, APP_, ACL_, EXC_).
Delegation boundaries: only a small group can nest groups; broader teams can add users to role groups.
Automated audits: scheduled cycle detection + top TokenGroupSidCount report.
Change review for any membership change involving “group inside group” operations.
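
Naming conventions become much more useful when something checks them. As a sketch of such a guardrail, the code below flags group-in-group links whose direction contradicts the roles-into-resources model, based only on the ROLE_ and RES_ prefixes. The sample edges, the `Get-GroupLayer` helper, and the exact policy (no resource group nested into a role or another resource group) are assumptions for illustration.

```powershell
# Hypothetical nesting edges: Parent contains Child.
$edges = @(
  [pscustomobject]@{ Parent = 'RES_Ledger_RW'; Child = 'ROLE_Finance' }    # allowed: role nested into resource
  [pscustomobject]@{ Parent = 'ROLE_Finance';  Child = 'RES_Intranet_R' }  # inversion: resource nested into role
  [pscustomobject]@{ Parent = 'RES_Ledger_RW'; Child = 'RES_Archive_R' }   # resource nested into resource
)

function Get-GroupLayer {
  param([Parameter(Mandatory)] [string]$Name)

  # Classify a group purely by its naming prefix.
  switch -Regex ($Name) {
    '^ROLE_' { 'Role'; break }
    '^RES_'  { 'Resource'; break }
    default  { 'Other' }
  }
}

$violations = foreach ($e in $edges) {
  $parentLayer = Get-GroupLayer -Name $e.Parent
  $childLayer  = Get-GroupLayer -Name $e.Child

  # Assumed policy: a resource group must never be a member of a role
  # group or of another resource group.
  if ($childLayer -eq 'Resource' -and $parentLayer -in @('Role', 'Resource')) {
    [pscustomobject]@{
      Parent = $e.Parent
      Child  = $e.Child
      Rule   = "$childLayer nested into $parentLayer"
    }
  }
}

$violations | Format-Table -AutoSize
```

In a scheduled audit the edges would come from the same adjacency list used for cycle detection, so both checks can share one directory query.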

Audit cadence that actually works

Weekly: top 50 users by tokenGroups count
Weekly: new group-in-group changes (especially privileged groups)
Monthly: full cycle scan + export
Quarterly: access model review (are people bypassing roles?)

Practical troubleshooting workflow

Confirm the symptom: which app fails, what user(s), what’s the error pattern?
Check effective group volume: does the user have an unusually high tokenGroups count?
Find aggregators: what “hub” groups pull in hundreds of memberships?
Scan for cycles: cycles indicate a design breach and can confuse “why is this included?” analysis.
Fix the model: remove the wrong nesting links and re-express access via roles → resources.
Re-test: validate both access and that membership volume reduced meaningfully.

FAQ

Does circular nesting always cause outages?

Not always immediately. Some tooling paths will tolerate it, others will behave unpredictably. The bigger issue is what cycles reveal: a loss of structure that almost always correlates with access sprawl and token growth.

If I remove a link to break a cycle, how do I avoid breaking access?

Replace the invalid relationship with a valid one. Usually this means: move users into role groups, keep resource groups tied to ACLs, and nest roles into resources (not the other way around). Validate with a small test set of users who represent the access pattern.

Is “fewer groups” the only answer?

No. The goal is fewer effective security SIDs per user and a cleaner graph shape. Many environments can have lots of groups and still be stable if nesting boundaries are enforced and depth is controlled.