Schema Mapping Session at IIW
I led a session about schema mapping at IIW last week. The basic idea is this. Rather than trying to get the world to agree on a single schema for attributes (e.g. OpenID AX, ICF Schema Catalog, Plaxo Portable Contacts, etc.; you know the old saw that the great thing about standards is that there are so many of them, like 75!), we just let the natural authorities for attributes mint their own URIs.
And while we’re being lazy, we just sit back and watch as these schema-creators evangelize their particular schema as far and as wide as they wish to. Today the only way an IdP can talk to an RP is if both know how to speak a common schema. This is true regardless of protocol or transport. It is as true of SAML tokens as OpenID attributes.
It’s all a form of tight coupling. And tight coupling requires a lot of effort. You know what they say: “consensus is harder than code.” Experience shows that the richer the schema, the higher the cost of getting everyone on board, the longer the process takes, and the narrower the adoption. These economic realities drive the creation of more and newer schemas in each sub-ecosystem, even when common schemas could theoretically be agreed to.
But if we can’t all agree to the “one schema to rule them all” aren’t we doomed to a Tower of Babel?

Not entirely. There is another possible route to interoperability. Mapping. Instead of creating roughly N*N mappings, one between every pair of schemas, we create 2N mappings: one into and one out from a common, rich, granular, and horribly complicated schema (that nobody would use directly).
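To put rough numbers on that, here is a small sketch of the arithmetic. The function names are mine, and I count the pairwise case as N*(N-1) directed mappings (no schema needs a mapping to itself), which is the "N*N" above in round terms.

```python
def pairwise_mappings(n: int) -> int:
    """Directed mappings needed if every schema maps straight to every other one."""
    return n * (n - 1)


def hub_mappings(n: int) -> int:
    """Mappings needed with a shared intermediate schema: one in and one out per schema."""
    return 2 * n


for n in (3, 10, 75):
    print(n, pairwise_mappings(n), hub_mappings(n))
# 3 6 6
# 10 90 20
# 75 5550 150
```

With only three schemas the two approaches cost the same. The hub only starts to pay off beyond that, and it pays off quickly.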
We use a mechanical process (think web service, library, etc.) that maps an input schema into a rich, intermediate schema, and from there to an output schema. This schema mapping process, being both algorithmic and data driven, can live at the RP, in the cloud, or at the IdP, depending on the need.
I will now describe one way to do this schema mapping. I have a personal bias towards declarative approaches that involve rich data and simple algorithms. The mapping rules that I’m about to describe can themselves be described as data with embedded names of a few simple functions. So that’s the design approach. Here are the details.
Every input attribute must come from some known namespace (schema name). A set of mapping rules must have already been created; one for each attribute in the input schema. The rule for the specific input attribute is then looked up and applied to transform this input attribute into its equivalent attribute(s) in the internal, intermediate data model (schema). To create the output attribute(s) the process is reversed. The target namespace (schema name) must be known, and a set of mapping rules must have been created for it. The output process takes the attribute in the internal data model, looks up the mapping rule for it and uses this rule to generate the output attribute.
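Here is a minimal sketch of what that could look like. Everything in it is a made-up illustration: the schema names (`schemaA`, `schemaB`), the attribute names, the internal model, and the rule layout are my assumptions, not the Higgins rules or any real schema. For simplicity each rule produces exactly one target attribute. The point is only that the rules are plain data that name a few simple functions.

```python
# Hypothetical sketch: all names below are illustrative assumptions.

# The few simple functions that mapping rules are allowed to name.
FUNCTIONS = {
    "identity": lambda v: v,
    "lower": lambda v: v.lower(),
    # "Ada Lovelace" -> {"given": "Ada", "family": "Lovelace"}
    "split_name": lambda v: dict(zip(("given", "family"), v.split(" ", 1))),
    # Inverse of split_name; tolerates a missing part.
    "join_name": lambda v: " ".join(
        p for p in (v.get("given"), v.get("family")) if p
    ),
}

# Inbound rules: (namespace, input attribute) -> (internal attribute, function name)
INBOUND = {
    ("schemaA", "fullname"): ("person.name", "split_name"),
    ("schemaA", "email"): ("person.email", "lower"),
}

# Outbound rules: (namespace, internal attribute) -> (output attribute, function name)
OUTBOUND = {
    ("schemaB", "person.name"): ("displayName", "join_name"),
    ("schemaB", "person.email"): ("mail", "identity"),
}


def to_internal(namespace, attrs):
    """Look up the rule for each input attribute and apply it."""
    internal = {}
    for name, value in attrs.items():
        # A missing rule raises KeyError: every input attribute needs one.
        target, fn = INBOUND[(namespace, name)]
        internal[target] = FUNCTIONS[fn](value)
    return internal


def from_internal(namespace, internal):
    """Reverse step: generate output attributes for the target namespace."""
    out = {}
    for name, value in internal.items():
        rule = OUTBOUND.get((namespace, name))
        if rule is None:
            continue  # the target schema has no equivalent attribute
        target, fn = rule
        out[target] = FUNCTIONS[fn](value)
    return out


def map_attributes(source_ns, target_ns, attrs):
    """Input schema -> intermediate model -> output schema."""
    return from_internal(target_ns, to_internal(source_ns, attrs))


result = map_attributes(
    "schemaA", "schemaB",
    {"fullname": "Ada Lovelace", "email": "Ada@Example.org"},
)
print(result)
# {'displayName': 'Ada Lovelace', 'mail': 'ada@example.org'}
```

Because the rule tables are just data, adding a new schema means adding rows, not code, and the same tables can be shipped to wherever the mapping needs to run.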
This approach was discussed a lot on the second day of the recent Tao of Attributes workshop, and some similar thinking was discussed a couple of years ago regarding a Common Dictionary Service (CDS) on the IdentitySchemas.org list at Identity Commons.
The Higgins project is starting work on an open source Persona Data Model that could serve as a common internal schema. A schema that nobody would actually use per se, but useful to map into and out from. We’re also experimenting with declarative mapping rules.
A quick aside:
The straw that broke the camel’s back for me happened recently. In the ICF’s Schema Working Group, we created a super-lightweight, email-based process to simply list whatever attribute/claim URIs any party reasonably asked for. Here’s the catalog we created. When Equifax wanted an “I’m over 18” URI we swung into action and minted http://schemas.informationcard.net/@ics/age-18-or-over/2008-11. Cool.
Then the ICF and OpenID foundations started working together with the GSA and other parts of the Federal government. There was a need for a “Level of Assurance” 1 claim. No problem. We created http://schemas.informationcard.net/@ics/icam-assurance-level-1/2009-06. Trouble is, when the GSA’s profile for IMI Infocards was published the URI started with http://idmanagement.gov.
Why? Who knows. That’s what they wanted. And since (sadly) in SAML there are no sub-namespaces allowed within the URI namespace, one URI is as good as another, since all must be treated as opaque strings. So it’s hard to push back on the “customer” and tell them that the attribute should really start off http://schemas.informationcard.net… They think that the LOA 1 URI is theirs. Seeing a separate URI minted, and thus yet another schema defined, over such a trifling matter was all the convincing I needed to rethink things.
3 Comments »
I am disappointed that the idea expressed here that makes so much sense incurred no comments in the 90 days since Paul wrote this. We should just do this. I have a business that needs to interoperate with multiple identity protocols. How can we help start the ball rolling? JanRain? Ping? Paypal? What do you think?
Comment by Charles Andres — February 11, 2010 @ 9:09 pm
[...] call (e.g. getAttribute) may be in its own or any other vocabulary. As I’ve mentioned in my schema mapping post, we follow the philosophy of mapping into and out from a normalized internal schema. To encourage [...]
Pingback by In Context » Apps and Personal Data Stores — March 22, 2010 @ 5:09 pm