Post-War Analysis: How a Forest Trust Broke Federated Authentication

This is the journey of troubleshooting a complex, niche problem that, surprisingly, had a very simple solution. I ignored many common-sense principles and instead dove deep into StackOverflow rabbit holes in desperate search of the complicated technical answer I thought I needed, not realizing I was chasing a series of red herrings that disguised what frankly should have been an obvious answer. I'd like to blame a lack of sleep, or my determination to overcome a difficult challenge, but the whole time I was simply ignoring the basics.

And, as it often goes, the saving grace that smacked some sense back into me came from a 13-year-old article on a WordPress blog I'd never heard of.

Setting the Stage

I have a .NET web application hosted in IIS; let's call it WebApp. WebApp allows users to authenticate with their Domain accounts via Basic Authentication or Windows Authentication. It also supports federation via WS-Fed and SAML 2.0, utilizing ASP.NET Impersonation behind the scenes. When using ADFS for federation in an interesting AD environment, I encountered the following issue, which was completely novel to me.

A Wild AD Environment Appears

This environment has 2 forests, example.com and company.com, in a 2-way transitive trust. One UPN Suffix, acme.com, exists in both forests.

WebApp is installed in example.com, and configured with Basic Authentication. If I authenticate as a user with each UPN Suffix across both forests using their UserPrincipalName as the username, here is what happens.

When Jacob authenticates as Jacob@acme.com, it fails for this web app. Although I didn't realize this at the time, I now know this is due to Microsoft’s Collision Detection when routing name suffixes across forests. If I authenticate to example.com using my UPN, and my UPN Suffix exists in multiple forests in the trust, Microsoft will not cross the trust to find my account. It will only search in example.com.

However, if I instead authenticate with the username in the legacy form of NetBiosName\SamAccountName, then every user can successfully authenticate.

| User's Domain | User | Result |
| example.com | example\Bob | Success |
| example.com | example\Alice | Success |
| example.com | example\Sally | Success |
| company.com | company\Joe | Success |
| company.com | company\John | Success |
| company.com | company\Jacob | Success |
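The collision behavior described above can be pictured with a toy model. This is only a sketch of the rule as I understand it, not how Active Directory actually implements name suffix routing: the SuffixRouting class, its method names, and the sample suffix lists are all hypothetical, and the only suffix known from this environment to exist in both forests is acme.com.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixRouting
{
    // Returns the names of the forests that claim the given UPN suffix.
    public static List<string> ForestsClaiming(
        string suffix, IDictionary<string, string[]> suffixesByForest)
    {
        return suffixesByForest
            .Where(kv => kv.Value.Contains(suffix, StringComparer.OrdinalIgnoreCase))
            .Select(kv => kv.Key)
            .ToList();
    }

    // Toy rule (an assumption, simplified): a UPN can be routed across the
    // trust only when exactly one forest claims its suffix. A suffix claimed
    // by more than one forest is a collision and is never routed.
    public static bool IsRoutable(
        string upn, IDictionary<string, string[]> suffixesByForest)
    {
        // Everything after the last '@' is treated as the UPN suffix.
        string suffix = upn.Substring(upn.LastIndexOf('@') + 1);
        return ForestsClaiming(suffix, suffixesByForest).Count == 1;
    }
}
```

With sample data where each forest lists its own name plus acme.com, IsRoutable("jacob@acme.com", ...) is false because both forests claim acme.com, while a UPN ending in @company.com is routable because only one forest claims that suffix.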

Federation Uses Confusion. It's a Critical Hit.

After this, I switched to Federated Authentication using ADFS. As expected, when authenticating to ADFS, I experienced the same results. company\Jacob succeeds, jacob@acme.com fails. However, once Jacob authenticates to ADFS and tries to federate to WebApp, WebApp fails to authenticate.

WebApp: Authentication Failed.

So, what's going on here? Again, I didn't know about forest trusts sharing UPN Suffixes across multiple forests, the collision detection that triggers, or how authentication with different username formats would apply. After my initial testing and research, I came to this conclusion:

Authentication with UPN fails. Authentication with NetBiosName\SamAccountName works. The code uses UPN. Let's change that.

WebApp is Confused... It hit itself in its confusion!

Now let's examine what's happening in the code. Well, some abstract pseudocode, at least.

using System;
using System.Security.Principal;

// Search for user in the current forest & trusted forests
// (findUserByClaims stands in for the app's own directory lookup)
var user = findUserByClaims(claimAttribute, claimValue);

// Throw error if no user was found
if (user == null)
{
    throw new Exception("User could not be identified from the provided claims.");
}
// Else, create WindowsIdentity object from UPN and impersonate
else
{
    // Error is here
    WindowsIdentity userIdentity = new WindowsIdentity(user.upn);

    userIdentity.Impersonate();
}

So when a user federates to WebApp, I first search for that user's Active Directory account in the current domain and all trusted domains and forests based on the provided claims. If I find an account, I then try to instantiate a WindowsIdentity object from the user's UPN value, and then impersonate that user.

The problem doesn't occur on the .Impersonate() method as I thought it might, but when trying to create the WindowsIdentity from the UPN value. This works perfectly fine for users in example.com and users in company.com who have a routable UPN Suffix, but not for users in company.com who have a non-routable UPN Suffix, i.e. acme.com which exists in both forests.

Okay, easy fix. I'll just use NetBiosName\SamAccountName instead of userPrincipalName. Unfortunately, that throws an error.

An unhandled exception of type 'System.Security.SecurityException' occurred in System.Security.Principal.Windows.dll: 'The name provided is not a properly formed account name.'

Okay, what else can I try? Let's read the docs for the WindowsIdentity class.

Why Documentation is Sometimes the Enemy

Here's where the aquatic assault began as red herrings attacked me from all angles. Per the docs, I am using this constructor, which just requires the user's UPN as an argument. Assuming the service account has the right permissions, creating a WindowsIdentity object with this constructor will automatically retrieve a token for that user via the LsaLogonUser function, utilizing the KERB_S4U_LOGON structure. This is how WebApp can then impersonate the user without knowing their password. Per the docs:

...The UPN identified in sUserPrincipalName is used to retrieve a token for that user through the Windows API LsaLogonUser function. In turn that token is used to identify the user. An exception might be returned due to the inability to log on using the supplied UPN.

...this constructor uses the KERB_S4U_LOGON structure, which was first introduced in Windows Server 2003

It's clear as day, right there in the remarks! LsaLogonUser may fail due to the UPN value I'm supplying. So what I really should be doing is...calling LsaLogonUser directly with my own arguments and using a different WindowsIdentity constructor that accepts the token it returns...right?

I Was Not Right

And here begins a furious, relentless, desperate session of intense Google-fu, hunting for any relevant sample code to copy/paste, sprinkled with very reassuring StackOverflow threads confirming that I'm wading in waters that have never been waded in before, experiencing a unique issue that no one but me seems to understand.

...In conclusion

There is only one use case were[sic] LsaLogonUser is better than LogonUser. But since WindowsIdentity provides a constructor for that (S4U), I don't see why one would ever use LsaLogonUser in a .Net application.

No, ixe013, Security Engineer at Google, there are TWO use cases! I've outsmarted you! Now please point me to sample code that my (at this point) fully-rotted-husk of a brain can steal!

Of course, ixe013 was right. The more I looked, the more the realization struck that even if I had to use LsaLogonUser myself, it's...uhh...very complicated. Daunting. The thought alone gives me stress-induced cold sweats and shivers.

I started looking elsewhere, digging into Kerberos Authentication, the details of Forest Trust Relationships, "how the f*** does IIS Basic Auth work". Finally, I looped back to a simple question.

What even IS a UPN?

The Lightbulb/Eureka/Got'Em Moment...It's Super Effective

Buried deep in the comments section of yet another long-abandoned StackOverflow thread, I encountered a link to this 2010 series of articles from the Jorge's Quest for Knowledge WordPress site. Jorge's Knowledge reminded me of a basic fact about UPNs, which had been so irrelevant to me for so many years that my hollow husk-brain never once considered it as the obvious answer it was.

Users have 2 UPNs.

  1. Explicit UPN (eUPN). This is the value that shows in the user's userPrincipalName attribute, and is configurable on every user account. This value contains my particularly frustrating UPN Suffix that breaks the rules of a forest trust.
    jacob@acme.com

  2. Implicit UPN (iUPN). ALL users have this, whether you define an eUPN or not, but it is not stored in any AD Attribute. The iUPN is always available in the format sAMAccountName@FQDN, where FQDN is the DNS name of the user's domain.
    jacob@company.com

Yeah, it was that simple the whole time. Instead of passing the user's userPrincipalName attribute, I can pass the iUPN: new WindowsIdentity("jacob@company.com"). It doesn't matter that Jacob's UPN attribute is jacob@acme.com. I have the domain, and I have a user object with a bunch of AD Attributes, including sAMAccountName. Sure enough, this solves the issue with an incredibly simple code change. No tricky LsaLogonUser implementation, no major code rewrite. Just send a different value for UPN that uniquely identifies the user across all trusted forests.
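For anyone who wants to see the shape of the change, here is a minimal sketch, assuming the user lookup already gives you the sAMAccountName and the DNS name of the user's domain. The ImplicitUpn class and its method names are hypothetical and not WebApp's real code. The only framework call is the WindowsIdentity(string) constructor discussed above, which works on Windows only and still needs a service account with the right permissions.

```csharp
using System;
using System.Security.Principal;

public static class ImplicitUpn
{
    // Builds the iUPN: sAMAccountName@<DNS name of the user's domain>.
    public static string Build(string samAccountName, string domainDnsName)
    {
        if (string.IsNullOrWhiteSpace(samAccountName))
            throw new ArgumentException("sAMAccountName is required.", nameof(samAccountName));
        if (string.IsNullOrWhiteSpace(domainDnsName))
            throw new ArgumentException("Domain DNS name is required.", nameof(domainDnsName));

        return samAccountName.Trim() + "@" + domainDnsName.Trim();
    }

    // Windows only: creates the identity from the iUPN instead of the
    // userPrincipalName attribute, so a colliding eUPN suffix never matters.
    public static WindowsIdentity CreateIdentity(string samAccountName, string domainDnsName)
    {
        return new WindowsIdentity(Build(samAccountName, domainDnsName));
    }
}
```

For Jacob, ImplicitUpn.Build("Jacob", "company.com") produces Jacob@company.com regardless of what his userPrincipalName attribute says.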

As a side note, if you're also thinking "well, he isn't authenticating to ADFS with jacob@acme.com either", you're correct. But, this is federation. It could be ADFS or any other SAML or WS-Fed provider, and I'm not necessarily getting the value entered in the username field, or even the UPN as a claim. It might be something else like objectGUID or email. Either way, I'm still searching for his AD account across all trusted forests/domains, and attempting to impersonate him with a UPN value.

You Defeated Trainer "Annoying Problem". You Gained ¥100.

Here are my final takeaways.

  1. Breaking down the problem into simple, testable scenarios is always a great first step.

  2. Occam's Razor is often the reason I can't sleep.

  3. Documentation doesn't lie, but it often omits relevant details for niche situations.

  4. The StackOverflow user you think is wrong probably isn't.

  5. I don't know how to write .NET apps.

  6. Heading sections of your blog post with Pokémon references is always a great first step.