The Virtual STS Pattern for Multi-domain SSO

(Or, what if you have more Authentication Authorities than you really need?)

Implementing a Web-based Single Sign-On solution is a very common requirement for many enterprise security architects.  The good news for us Security Architect geeks is that this is essentially a solved problem.  There are a number of off-the-shelf commercial and open source solutions available, and so it’s mostly a question of choosing one, planning your deployment, and then driving adoption.  The right solution for any given situation will depend upon factors such as your organization’s operational requirements, and your deployment constraints, and so on.   For organizations that are thinking about moving to cloud, I’d suggest that the Cloud Foundry UAA and Login Server are the right place to start.  For organizations that are not yet moving to cloud but require a traditional inside-the-perimeter, Web-based SSO, the Central Authentication Service (CAS) from Jasig (a.k.a. Sakai) can be a good choice.

Now, by definition, SSO is provided by having a single Trusted Third Party, a single Authentication Authority, for all of the applications within a given Trust Domain.  Each of the applications within that trust domain will delegate the user authentication use case to the authentication authority.  Ideally, you should have just one authentication authority for your entire organization, and the classic concept of the “trust domain” corresponds to all of the applications within the perimeter of your enterprise or campus network.

But, what if you have more than one authentication authority on your network? By definition you will have more than one trust domain.  Logging into the authentication provider in one trust domain does not provide an SSO experience to a second application in another trust domain.  Why?  Because the second application trusts a different authentication authority (i.e. perhaps a different CAS server, or a different UAA Login Server).  The session established in the first trust domain is simply not valid in the second trust domain. The users’ SSO experience is lost, even though each of the applications individually did the right thing by delegating to their designated authentication authority.

In an ideal world, this situation should not happen within a single organization.  You would just need to make user all the applications trust your preferred authentication service, and decommission all the others.  But, of course, in the real world we have global heterogeneous organizations, and things like turf battles, legacy installed base, mergers and acquisitions, and so on.  In the end, these and other factors conspire to produce an environment in which you likely have more authentication authorities than you really need, and your end users’ SSO experience is spoiled.

Achieving SSO across different trust domains is what we call Identity Federation.  The usual solution is to use yet another trusted third party, called a Security Token Service.  This is where standards like OASIS SAML and WS-Trust come in.  But using these heavy-weight Identity Federation solutions within a single organization seems unnecessarily complex and expensive.  And, as we well know, cost and complexity are the enemies of good security.  There has to be a better way!

The Virtual STS Pattern

I call this security architecture solution the “Virtual STS Pattern,” because it achieves much of the same federation capability as a solution involving an STS, but without having to actually deploy an STS.  By using this pattern we are able to we achieve (or restore) the end users’ SSO experience within a single organization, even when there are multiple authentication authorities present.  And remember, we do it using only the services you already have, and without resorting to the use of complex WS-* standards, or a dedicated STS.

Essentially, the key insight for the pattern is this: we can designate any one of the authentication authorities as the main login end-point, and then configure any of the other authentication authorities as application services within that main trust domain.  So, let’s call the main authentication authority Login Service “A”, and assume that that login service protects all applications within Trust Domain “A”. From the point of view of Login Service “A” (the main authentication authority), all of the other authentication authorities, say, “B”, “C”, and “D”, and so on, can all be considered applications.  Introducing these additional trust relationships makes each of those secondary authentication authorities a member of trust domain “A”.

Of course, before you access any application in any trust domain, you must authenticate to the designated authentication authority.  If you attempt to access an application, say, application “b”, within trust domain “B”, you will be redirected to authenticate to Login Server “B.”  With the additional configuration described, Login Service “B” will, in turn, redirect the user’s browser to login at its designated authentication authority, which is Login Service “A”, in the main trust domain.  Remember, we’re talking about Web-based SSO here, and so thanks to the use of status code 302 and how the browsers handle the HTTP(S) redirects, this all works seamlessly.   From the point of view of the end user this just looks like an additional redirect in the browser.  Once the user has successfully authenticated at Login Service “A”, they are redirected back to the application from which they came.  In this case, that application happens to be Login Service “B”.  As an application in the main trust domain, Login Service “B”, has been configured so that it happily accepts the authentication token issued from Login Service “A”.  From this point, things proceed normally, and because Login Service “B” now knows the identity of the end user (courtesy of Login Service “A”) it can just issue the user a token for application “b”, without needing to present it’s own login page, or otherwise soliciting any additional raw credentials from the end user.  Voilà!  The end user SSO experience has been restored.  Note that there were no changes required to the individual applications, only to the configuration of the trust relationships between the two authentication servers.

Good security architects are, by their very nature, a skeptical bunch.  That’s what makes them good security architects.  So, I can understand if some readers think this sounds like it comes from the too-good-to-be-true department.  But, rest assured, this pattern does indeed work in practice.  About 2 years ago, I had the opportunity to work with a customer that had (over time, for various reasons), deployed a number of CAS servers, and then needed to re-establish the end users’ SSO experience.  The Virtual STS solution worked just as intended, and subsequently became the topic of a “lightning talk” paper that I presented at the Fall 2011 Jasig “Un-Conference” event, held at the UMASS Online campus in Shrewsbury, MA, in November 2011.  As this same problem has recurred a number of times over the past 2 years, and then just recently came up yet again, I decided to dedicate this blog post to reprising this important topic.  In hindsight, I realize that the original multi-domain use case was not a “one-off” situation.  The Virtual STS idea is both a valuable, and fundamental, security architecture pattern.  It is not dependent upon any particular message formats, or protocol bindings. And although it happens to work particularly well within an HTTPS environment, it could also be made to work for other protocol bindings. The idea is not only clever, but also very practical, since the budget constraints faced by many IT security teams make the prospect of using a real STS a non-starter.  This is a capability that an experienced security architect needs to have handy in their toolkit.

Finally, it’s worth noting that this pattern can also be applied symmetrically, and we can allow the redirect flows to work both ways.  That is, we could also have made Login Server “A” trust Login Server “B”.  In which case, a user surfing in the other direction would get the equivalent SSO experience.  The pattern is flexible enough that one can envision composing a system with a hierarchical tree of trust relationships, or a directed graph of trust relationships, or whatever arbitrary trust relationships might be required.  Just be careful of cycles, and incurring any undesired transitive trust relationships.  Before you decide to establish trust in Login Server “X”, make sure you understand all the other authentication authorities that service is configured to trust!

If you would like more information on how to apply this pattern in the case of two CAS Servers, then check out the PDF version of my un-conference paper.   There is a bit more detail there, as well as some nice figures that help to illustrate the deployment.

Cloud Foundry SAML Login via the Shibboleth IdP

Over the last decade or so, many large and mid-sized enterprise organizations have invested in identity federation and cross-domain single sign-on solutions that are based on the SAML2 standard from OASIS.   With the trend towards cloud computing, and the emergence of PAAS offerings like Cloud Foundry, it only makes sense that we’d want to be able to leverage that investment as a way to authenticate to the cloud.   In this post, I describe the configuration steps needed to enable a user to log in into a Cloud Foundry  environment, using a SAML assertion issued from a Shibboleth Identity Provider (IdP).

As background, the Cloud Foundry side of the equation basically consists of three services:  the (original) login-server, the new saml-login-server, and the uaa.  In this post we focus only on the configuration needed for the saml-login-server.  Each of these can be deployed as a WAR file in your favorite servlet container —  Cloud Foundry uses Tomcat.  On the Identity Provider side of the equation, we have the Shibboleth IdP service, which is also deployed as a WAR file.

There are basically five key items that need to be configured.

  1. Configure the saml-login-server via editing the login.yml as needed.
  2. Update the idp.xml file in the saml-login-server, and do a build/deploy.
  3. Start the saml-login-server, and use your browser to generate the required Service Provider (SP) metadata.
  4. Move the generated SP metadata over to the IdP.
  5. If needed, update the IdP configuration to release a suitable NameID attribute (e.g. corresponding to what was configured in login.yml, above).

We’ll go through each of these steps in a bit more detail below.

1. Configuring login.yml

First, we’ll need to edit the login.yml file to customize it to your local environment.   Of course, this will include updates to the hostname and port numbers from their default values, as appropriate (I won’t detail all those routine steps here.) In addition, you should be sure to set a value for the login:entityBaseUrl.  This will be the servlet context, e.g. the place where the Cloud Foundry saml-login-server is running.  You’ll also need a value for the login:entityID key. This will be the unique name of this saml-login-server instance, within your SAML-based SSO environment.  I chose the unimaginative designations shown below:

login:
...<snip>...
  # The entity base url is the location of this application
  # (The host and port of the application that will accept assertions) 
  entityBaseURL: http://<hostname>:8080/saml-login-server
  # The entityID of this SP
  entityID: cloudfoundry-saml-login-server

Finally, you may want to set the value of the key login:saml:nameID.  The default setting is “unspecified” as shown below:

nameID: urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified

At run time, this setting means that the Cloud Foundry saml-login-server will accept whatever <NameID> attribute the IdP chooses to send in the SAML assertion.  Depending upon how your IdP is configured, this default configuration may be just fine.  My IdP was initially configured to release only a transientID, and while this basically worked, it meant that my session was not linked to my existing Cloud Foundry user id, which was previously registered as my email (e.g. for doing the straight HTML form-based login via my username/password, or for doing a vmc login or a cf login prior to doing a cf push command line operation).   For consistency in my configuration I changed this value to be emailAddress as shown below:

nameID: urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress

Now, the Cloud Foundry saml-login-server will specifically request that the <NameID> element contained in the SAML assertion returned from the IdP contain the user’s email address.  This can then be correlated to the user’s existing account in the Cloud Foundry UAA database, when later doing OAuth token grants for applications.

2. Copy your idp.xml to the Cloud Foundry SAML login server

Metadata files are essentially the “secret sauce” of federated SSO.  These files contain, among other things, the PEM-encoded X.509 certificates needed to establish digital trust in the messages exchanged via the SAML Web SSO protocol.   Of course this step is needed in order to tell the saml-login-server about the IdP, and establish the first leg of that digital trust.  You can move a copy of your local idp.xml file into the appropriate place in the saml-login-server deployment.  I simply replaced the sample idp.xml that was supplied in the /src/main/resources/security directory when I cloned the saml-login-server project from GitHub.

Once you’ve updated these items, do a maven build, and deploy the WAR to your servlet container.

3. Generate the Service Provider Metadata

The Cloud Foundry saml-login-server is playing the role of the Service Provider (a.k.a. the Relying Party).  We’ve already provided the saml-login-server with the metadata it needs in order to trust the IdP.  Now we’ll need to give the IdP a copy of the saml-login-server’s SAML metadata.  This step will make their digital trust relationship mutual.  Point your browser at the URL for the metadata generator.  In my deployment it looks like:

http://<hostname>:8080/saml-login-server/saml/metadata/alias/cloudfoundry-saml-login-server

Where the leaf part of the resource URL (the part after “alias”) corresponds to the unique name used for the login:entityID key within the login.yml file, above.  (BTW, this metadata generator resource is a nice convenience that is actually enabled via a bean provided by the underlying Spring Security SAML Extension).

Depending upon your browser, you may see the XML metadata rendered as a page or you may be prompted to save this file.  In any case, you’ll want to save the file to a meaningful name such as cf-login-server-sp-metadata.xml, as you’ll need to put copy of this file on the IdP machine next.

4.  Move the generated SP Metadata over to the IdP.

Using your favorite file transfer method, copy the SP Metadata file over to the IdP host.  In a typical installation it needs to be placed in the location <IDP_HOME>/metadata.

Make sure you update the <IDP_HOME>/conf/relying-party.xml file to include this new Service Provider and refer to the corresponding metadata file at the correct location.  My IdP happens to be running on a Windows server and the configuration looks like the following:

<metadata:MetadataProvider id="cloudfoundry-saml-login-server" xsi:type="metadata:FileBackedHTTPMetadataProvider"
     metadataURL="http://hostname:8080/saml-login-server/saml/metadata/alias/cloudfoundry-saml-login-server"
     backingFile="C:\shibboleth-identityprovider-2.3.5/metadata/cf-login-server-sp-metadata.xml">
</metadata:MetadataProvider>

5.  Release the appropriate NameID and attributes from the IdP

Finally, we need to tell the IdP that it is OK to release the users email as the Name Identitfier in the SAML assertion that it issues.  We’ll need to update two files in the IdP configuration.

First, you need to edit <IDP_HOME>/conf/attribute-resolver.xml and add the attribute definition for the users emailAddress as a nameID.  A snippet from my configuration is shown below:

    <!-- Name Identifier related attributes -->
    <resolver:AttributeDefinition id="transientId" xsi:type="ad:TransientId">
        <resolver:AttributeEncoder xsi:type="enc:SAML1StringNameIdentifier" nameFormat="urn:mace:shibboleth:1.0:nameIdentifier"/>
        <resolver:AttributeEncoder xsi:type="enc:SAML2StringNameID" nameFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:transient"/>
    </resolver:AttributeDefinition>

    <resolver:AttributeDefinition xsi:type="ad:Simple" id="nameID_for_cf" sourceAttributeID="mail">    
        <resolver:Dependency ref="myLDAP" />
        <resolver:AttributeEncoder xsi:type="SAML2StringNameID" xmlns="urn:mace:shibboleth:2.0:attribute:encoder" nameFormat="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress" />
    </resolver:AttributeDefinition>

Notice that here I’ve chosen to release two NameID attributes, both a transientId and also the emailAddress.  As noted above, we’ve chosen to configure the Cloud Foundry saml-login-server to request the emailAddress, so I’ve named that element appropriately, as per the XML attribute @id=”nameID_for_cf”.  Of course this is just an arbitrary name in the XML, and could have been be set to anything.  The ref=”myLDAP” must point to a corresponding DataConnector element, found in the same file, e.g.:

<resolver:DataConnector id="myLDAP" xsi:type="dc:LDAPDirectory"
   <...snip...>
</resolver:DataConnector>

And last, but not least, we need to authorize the release of the attribute with a permit rule in <IDP_HOME>/cong/attribute-filter.xml.  For the sake of simplicity in my PoC configuration, I allow the release of the emailAddress nameID to any SP.  In a real production configuration, you will likely want to be more restrictive, in accordance with your company privacy policy.

<!-- Release nameID_for_cf to enable SSO to Cloud Foundry -->

    <afp:AttributeFilterPolicy id="Cf_share_nameID">

        <afp:PolicyRequirementRule xsi:type="basic:ANY"/>

        <afp:AttributeRule attributeID="nameID_for_cf">
            <afp:PermitValueRule xsi:type="basic:ANY" />
        </afp:AttributeRule>

    </afp:AttributeFilterPolicy>

That’s It!

Now, go ahead and restart your IdP and the saml-login-server and check for a clean startup. If all goes well, you should now be able to log in using either your existing Cloud Foundry user name and password, or a SAML assertion from your IdP.

To test this you can choose the link on the saml-login-server login page that says “Sign in with your organization’s credentials.”  Clicking on that link should do an HTTP redirect, and after the dynamic discovery protocol completes, you’ll be looking at your Shibboleth IdP log in page.  After you fill in your credentials and post the Shibboleth log in form, you’ll be redirected back to the Cloud Foundry log in page, with an opportunity to view your account profile.  If you started the use case by initially visiting an application running within Cloud Foundry, you’ll be redirected back to the application page you requested, with your log in session established.  For my testing I found it convenient to use the example applications found in the UAA samples directory.

Oh, yeah, ….one last thing… A common problem that I’ve encountered in configuring SAML in other situations is that the time of day settings between the IdP and the SP must be synchronized to within a reasonable tolerance, e.g. a few seconds.  In production you are likely using NTP so this wouldn’t be a problem.  However, when doing this in a development environment don’t just assume that you machines have the correct time.   If you see an error indicating that your assertion has (immediately) expired, you’ll know why.