The Virtual STS Pattern for Multi-domain SSO

(Or, what if you have more Authentication Authorities than you really need?)

Implementing a Web-based Single Sign-On solution is a very common requirement for many enterprise security architects.  The good news for us Security Architect geeks is that this is essentially a solved problem.  There are a number of off-the-shelf commercial and open source solutions available, and so it’s mostly a question of choosing one, planning your deployment, and then driving adoption.  The right solution for any given situation will depend upon factors such as your organization’s operational requirements, and your deployment constraints, and so on.   For organizations that are thinking about moving to cloud, I’d suggest that the Cloud Foundry UAA and Login Server are the right place to start.  For organizations that are not yet moving to cloud but require a traditional inside-the-perimeter, Web-based SSO, the Central Authentication Service (CAS) from Jasig (a.k.a. Sakai) can be a good choice.

Now, by definition, SSO is provided by having a single Trusted Third Party, a single Authentication Authority, for all of the applications within a given Trust Domain.  Each of the applications within that trust domain will delegate the user authentication use case to the authentication authority.  Ideally, you should have just one authentication authority for your entire organization, and the classic concept of the “trust domain” corresponds to all of the applications within the perimeter of your enterprise or campus network.

But, what if you have more than one authentication authority on your network? By definition you will have more than one trust domain.  Logging into the authentication provider in one trust domain does not provide an SSO experience to a second application in another trust domain.  Why?  Because the second application trusts a different authentication authority (i.e. perhaps a different CAS server, or a different UAA Login Server).  The session established in the first trust domain is simply not valid in the second trust domain. The users’ SSO experience is lost, even though each of the applications individually did the right thing by delegating to their designated authentication authority.

In an ideal world, this situation should not happen within a single organization.  You would just need to make user all the applications trust your preferred authentication service, and decommission all the others.  But, of course, in the real world we have global heterogeneous organizations, and things like turf battles, legacy installed base, mergers and acquisitions, and so on.  In the end, these and other factors conspire to produce an environment in which you likely have more authentication authorities than you really need, and your end users’ SSO experience is spoiled.

Achieving SSO across different trust domains is what we call Identity Federation.  The usual solution is to use yet another trusted third party, called a Security Token Service.  This is where standards like OASIS SAML and WS-Trust come in.  But using these heavy-weight Identity Federation solutions within a single organization seems unnecessarily complex and expensive.  And, as we well know, cost and complexity are the enemies of good security.  There has to be a better way!

The Virtual STS Pattern

I call this security architecture solution the “Virtual STS Pattern,” because it achieves much of the same federation capability as a solution involving an STS, but without having to actually deploy an STS.  By using this pattern we are able to we achieve (or restore) the end users’ SSO experience within a single organization, even when there are multiple authentication authorities present.  And remember, we do it using only the services you already have, and without resorting to the use of complex WS-* standards, or a dedicated STS.

Essentially, the key insight for the pattern is this: we can designate any one of the authentication authorities as the main login end-point, and then configure any of the other authentication authorities as application services within that main trust domain.  So, let’s call the main authentication authority Login Service “A”, and assume that that login service protects all applications within Trust Domain “A”. From the point of view of Login Service “A” (the main authentication authority), all of the other authentication authorities, say, “B”, “C”, and “D”, and so on, can all be considered applications.  Introducing these additional trust relationships makes each of those secondary authentication authorities a member of trust domain “A”.

Of course, before you access any application in any trust domain, you must authenticate to the designated authentication authority.  If you attempt to access an application, say, application “b”, within trust domain “B”, you will be redirected to authenticate to Login Server “B.”  With the additional configuration described, Login Service “B” will, in turn, redirect the user’s browser to login at its designated authentication authority, which is Login Service “A”, in the main trust domain.  Remember, we’re talking about Web-based SSO here, and so thanks to the use of status code 302 and how the browsers handle the HTTP(S) redirects, this all works seamlessly.   From the point of view of the end user this just looks like an additional redirect in the browser.  Once the user has successfully authenticated at Login Service “A”, they are redirected back to the application from which they came.  In this case, that application happens to be Login Service “B”.  As an application in the main trust domain, Login Service “B”, has been configured so that it happily accepts the authentication token issued from Login Service “A”.  From this point, things proceed normally, and because Login Service “B” now knows the identity of the end user (courtesy of Login Service “A”) it can just issue the user a token for application “b”, without needing to present it’s own login page, or otherwise soliciting any additional raw credentials from the end user.  Voilà!  The end user SSO experience has been restored.  Note that there were no changes required to the individual applications, only to the configuration of the trust relationships between the two authentication servers.

Good security architects are, by their very nature, a skeptical bunch.  That’s what makes them good security architects.  So, I can understand if some readers think this sounds like it comes from the too-good-to-be-true department.  But, rest assured, this pattern does indeed work in practice.  About 2 years ago, I had the opportunity to work with a customer that had (over time, for various reasons), deployed a number of CAS servers, and then needed to re-establish the end users’ SSO experience.  The Virtual STS solution worked just as intended, and subsequently became the topic of a “lightning talk” paper that I presented at the Fall 2011 Jasig “Un-Conference” event, held at the UMASS Online campus in Shrewsbury, MA, in November 2011.  As this same problem has recurred a number of times over the past 2 years, and then just recently came up yet again, I decided to dedicate this blog post to reprising this important topic.  In hindsight, I realize that the original multi-domain use case was not a “one-off” situation.  The Virtual STS idea is both a valuable, and fundamental, security architecture pattern.  It is not dependent upon any particular message formats, or protocol bindings. And although it happens to work particularly well within an HTTPS environment, it could also be made to work for other protocol bindings. The idea is not only clever, but also very practical, since the budget constraints faced by many IT security teams make the prospect of using a real STS a non-starter.  This is a capability that an experienced security architect needs to have handy in their toolkit.

Finally, it’s worth noting that this pattern can also be applied symmetrically, and we can allow the redirect flows to work both ways.  That is, we could also have made Login Server “A” trust Login Server “B”.  In which case, a user surfing in the other direction would get the equivalent SSO experience.  The pattern is flexible enough that one can envision composing a system with a hierarchical tree of trust relationships, or a directed graph of trust relationships, or whatever arbitrary trust relationships might be required.  Just be careful of cycles, and incurring any undesired transitive trust relationships.  Before you decide to establish trust in Login Server “X”, make sure you understand all the other authentication authorities that service is configured to trust!

If you would like more information on how to apply this pattern in the case of two CAS Servers, then check out the PDF version of my un-conference paper.   There is a bit more detail there, as well as some nice figures that help to illustrate the deployment.

Back to the Future: Configuring J2EE Container-based Security

I’ve spent most of the last few years working on security architectures for REST-ful Web Services, using technologies like JAX-RS or Spring MVC.  However, just recently I’ve had a need to go  Back to the Future…I’ve been asked to run some POC experiments with J2EE container-based security.  It’s been both a trip down memory lane, and an opportunity to see some exiting new technologies in action.

For those of you who may not remember the good old days of building EJBs, the basic idea is to architect an application in a classic three-tier deployment model, where the middle or business tier is a collection of Enterprise Java Beans hosted in a J2EE container.  In principal, the business application developer has an easier time of it, since she can delegate all the non-functional overhead of managing transactions, and security, and persistence, etc. to the container….and just focus on coding the business logic.

While there’s still a lot not to like about developing EJBs, they have in fact come along way since I first worked with the technology, way back in 2000.  Remember coding ejbCreate(),  ejbActivate(), ejbPassivate(), and ejbRemove(), all by hand?  Well, EJB 3.0 makes things a whole lot easier.  Really.  Instead of bending code for Mandatory-To-Implement (MTI) interfaces, you can just put the annotation @Stateful or @Stateless on your Java class, and you’re done.

Of course a modern specification like J2EE6 is only really useful when there is an actual container that implements that standard, so it wasn’t long before I needed to choose an actual J2EE server with which to run my tests.  I know a lot of teams who use JBoss successfully, and of course many enterprises use IBM WAS, or Oracle, but I was curious to check out the latest release of Apache Geronimo.  In short, I have been favorably impressed with the 3.0 release of Geronimo.  I found it to be relatively easy to work with, in spite of the limited documentation.  And, after all, it’s open source, so when you are wrestling with a particularly vexing configuration problem, you can just read the source code to see what it is really doing 🙂

Finally, any full application security architecture solution requires that we have a way to manage users, groups, and their corresponding roles, or permissions.  The usual technology of choice for this aspect is Microsoft Active Directory or OpenLDAP.  While searching for an OpenLDAP download for my test environment platform, I happened across IAM Fortress, which is a very cool implementation of ANSI RBAC, built on an OpenLDAP core.  I’ve only just begun to scratch the surface of IAM Fortress, but I’ve been impressed by what I’ve seen so far, and I look forward to blogging more about this little gem in a future post.

So, for anyone who wants to continue this journey down memory lane, and at the same time explore some of the latest in new technology — J2EE, done 2013-style  —  I’ve posted the source code for this POC to my GitHub repository.

Enjoy!   And feel free to issue a pull request 😉