5th April 2010

Extending XML: Two birds with one stone?

First, some background
Scenario A: Sitemap
Scenario B: Feed Aggregation
Defining your extensions
Conclusion

I remember once sitting in a classroom being told XML is great because you can make up your own elements. But I didn't quite get it... Who or what would care about my made-up names?

Well, it turns out it's all rather simple. To me it also has the side effect of tying in nicely with a well-known programming principle:

DRY: Don't Repeat Yourself

(and don't repeat anyone else for that matter - use what's already there!)

When developing an XML/XSL-driven site such as cargowire.net you will be representing your own data as XML. However, the things you are trying to represent are rarely entirely unique. There may already be an XML format for the type of information you wish to portray, or one that exists for a similar purpose but has something added to, or missing from, what you need.

So why not bend those formats to your will rather than starting from scratch, and then use the same XML for multiple purposes? Your 'engine' can then save on transforms and reuse XML generation code (which can also mean reusing a single cache or file).

First, some background

The tried and tested example of namespacing and extending XML is that of HTML and XSL. Below is a snippet of HTML from cargowire.net that represents part of the header of the site.

          <html xmlns="http://www.w3.org/1999/xhtml">
            ...
            <div id="header">
                <h1 id="siteTitle">
                    <a accesskey="1" rel="home" title="Home" href="/">
                        <img width="334" height="110" alt="" src="/content/images/cargowire.png" id="logo" />
                        <span>Cargowire</span>
                    </a>
                </h1>
            </div>
            ...
          </html>

fig. 1.0

It is rather familiar HTML (or, to be more specific, with its well-formed structure, XHTML). The all-lower-case, angle-bracket element layout is a form of XML; in this case the elements sit within the 'default namespace' of XHTML.

The xmlns attribute on the parent html element identifies it and all of its child elements as part of XHTML as specified by the W3C (with the known elements and attributes that that entails - as described at the namespace URI). If you followed the link on namespace defaulting above you'll know that namespaces can be mixed. Further, if you're a programmer, this idea of namespacing to allow the use of different items with the same name will be very familiar to you.

To illustrate this, below is an excerpt from the cargowire.net XSL templates showing this kind of 'mixing':

            <?xml version="1.0" encoding="utf-8"?>
            <xsl:stylesheet version="2.0"
              xmlns="http://www.w3.org/1999/xhtml"
              xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
              xmlns:msxsl="urn:schemas-microsoft-com:xslt"
              xmlns:ms="urn:schemas-microsoft-com:xslt"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
                ...
                <!-- Elements within anything but the default namespace are prefixed -->
                <xsl:template match="cargowire">
                    ...
                    <div id="container">
                        <div id="content">
                            <h1 id="contentTitle">
                                <xsl:apply-templates select="page" mode="title"/>
                            </h1>
                            <xsl:apply-templates select="page" mode="content"/>
                        </div>
                    </div>
                </xsl:template>
            </xsl:stylesheet>

fig. 2.0

As you can see, the XHTML namespace is specified as the default (line 3). Further namespaces are declared via additional namespace attributes on the root stylesheet element (lines 4-7), with a colon separating 'xmlns' from the namespace prefix that will be used, e.g. 'xmlns:xsl'.

In this way parsers can be sure of uniqueness when processing a particular XML document. In the XSL example above the processor treats elements with the xsl prefix as transformation instructions to execute, whereas those in the default XHTML namespace are written to the output directly.
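It is worth spelling out that the processor goes by the namespace URI, not by the prefix itself. As a minimal sketch (the 't' prefix and the 'Hello' text are arbitrary choices for illustration, not taken from the cargowire.net templates), the stylesheet below should behave exactly as it would with the conventional 'xsl' prefix:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical stylesheet: "t" is an arbitrary prefix bound to the XSLT namespace URI -->
<t:stylesheet version="1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:t="http://www.w3.org/1999/XSL/Transform">

  <!-- t:template and t:value-of are in the XSLT namespace, so they are executed -->
  <t:template match="/">
    <!-- p has no prefix, so it is in the default (XHTML) namespace and is copied to the output -->
    <p><t:value-of select="'Hello'"/></p>
  </t:template>

</t:stylesheet>
```

Run against any input document, this writes a single XHTML p element containing 'Hello'. The prefix is only a local shorthand; the 'xsl' convention simply makes stylesheets easier for people to read.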

This idea of mixing one set of known XML elements/attributes with another can be expanded to our own XML design. Here are a couple of examples from cargowire:

Scenario A: Sitemap

One of the obvious items of information required by a website is a menu structure. This could be represented in any number of ways. But before jumping in and creating a new nomenclature it's worth stopping to think what other purposes this XML could serve.

You will want to use your 'page structure' XML to present a menu to users, but you may also want to present this information to other consumers. Sitemaps.org defines a standard for representing a website's sitemap that can be submitted to search engines. This can obviously also be used to represent your navigation. However, in the case of cargowire I wanted to add a 'subtitle' to each menu item.

There was no scope for doing this within the sitemaps protocol, which left me with two options. One was to create a bespoke XML schema for my site's purposes, with a transform to enable output in the known sitemap.xml format. The other was to keep the sitemap.xml format and 'extend' it with my own elements/attributes.

Below is the 'extended' version:

        <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
          xmlns:cargowire-sitemap="http://cargowire.net/cargowire-sitemap"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9  http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd  http://cargowire.net/cargowire-sitemap  http://cargowire.net/cargowire-sitemap.xsd"
          >
            <url>
                <loc>http://cargowire.net/articles</loc>
                <lastmod>2009-08-29</lastmod>
                <changefreq>weekly</changefreq>
                <priority>0.5</priority>
                <!-- My added content -->
                <cargowire-sitemap:pathandquery showinnav="1" accesskey="2">/articles</cargowire-sitemap:pathandquery>
                <cargowire-sitemap:title>Articles</cargowire-sitemap:title>
                <cargowire-sitemap:subtitle>Things I think about</cargowire-sitemap:subtitle>
            </url>
            ...
          </urlset>

fig. 3.0

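This can then be accessed in the XSL by declaring both namespaces on the root stylesheet element and using their prefixes when matching the particular elements. Note that the sitemap namespace, although it is the default in the XML, needs a prefix of its own in the stylesheet ('s' below), because an unprefixed name in an XPath 1.0 expression only matches elements in no namespace: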

      
          <xsl:stylesheet ...
            xmlns:s="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:cargowire-sitemap="http://cargowire.net/cargowire-sitemap">

          <xsl:template match="s:urlset" mode="nav">
            <dl id="pNav">
              <xsl:apply-templates select="./s:url[cargowire-sitemap:pathandquery/@showinnav = 1]" mode="nav"/>
            </dl>
          </xsl:template>

fig. 4.0

This saves an additional transform, while ensuring that only consumers looking for the cargowire namespace need to be aware of it.
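Fig. 4.0 hands each url element on to a template in the 'nav' mode that is not shown there. A minimal sketch of how such a template could read the extension elements follows, assuming the menu is a definition list with the title as the term and the subtitle as the description. The dt/dd markup and the root template are assumptions for illustration (the real cargowire.net template is not reproduced here), and the showinnav filter from fig. 4.0 is left out for brevity:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:s="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:cargowire-sitemap="http://cargowire.net/cargowire-sitemap"
  exclude-result-prefixes="s cargowire-sitemap">

  <!-- Hypothetical entry point: render every url in the sitemap as a menu item -->
  <xsl:template match="/">
    <dl id="pNav">
      <xsl:apply-templates select="s:urlset/s:url" mode="nav"/>
    </dl>
  </xsl:template>

  <!-- Hypothetical item template: standard sitemap element matched via "s",
       extension elements read via the "cargowire-sitemap" prefix -->
  <xsl:template match="s:url" mode="nav">
    <dt>
      <a href="{cargowire-sitemap:pathandquery}">
        <xsl:value-of select="cargowire-sitemap:title"/>
      </a>
    </dt>
    <dd>
      <xsl:value-of select="cargowire-sitemap:subtitle"/>
    </dd>
  </xsl:template>

</xsl:stylesheet>
```

For the url element in fig. 3.0 this would produce a dt linking to /articles with the text 'Articles', followed by a dd containing 'Things I think about'. The exclude-result-prefixes attribute keeps the two extra namespace declarations out of the generated XHTML.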

Scenario B: Feed Aggregation

TheBarn section of this site pulls in feed contents from a variety of sources (as discussed previously, using Yahoo Pipes). This content adheres to the RSS XML spec and contains all the information I need to display a listing except for one thing: on my barn page I also display a small avatar of the author.

Following the example above, an avatar element (pointing at the author's Gravatar image) can be added to the 'item' element of the RSS feed:

           <item>
              ...
              <cargowire:avatar url="http://www.gravatar.com/avatar.php?gravatar_id=f953df82b2a3f65e0acb9b3897908be0&amp;rating=G&amp;size=40" />
           </item>

fig. 5.0
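For the prefixed element in fig. 5.0 to be well formed, the 'cargowire' prefix has to be declared on the item or one of its ancestors, most naturally the root rss element. A minimal sketch of a complete feed with that declaration follows. The namespace URI, titles and links are placeholders chosen for illustration, since the real values are not shown above:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- The xmlns:cargowire value is a hypothetical placeholder URI, not the real one -->
<rss version="2.0"
  xmlns:cargowire="http://example.com/cargowire">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Placeholder channel for illustration</description>
    <item>
      <title>Example entry</title>
      <link>http://example.com/entry</link>
      <!-- Extension element: readers that do not know the namespace can skip it -->
      <cargowire:avatar url="http://example.com/avatar.png" />
    </item>
  </channel>
</rss>
```

A stylesheet that declares the same namespace URI could then read the image location with an XPath such as cargowire:avatar/@url, relative to the item.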

Doing this with Yahoo Pipes

Unfortunately, at the time of creation I was unable to make this addition within the Yahoo Pipes interface, i.e. specify a namespace in the root element of the output to allow the avatar element within the items to persist (see some forum posts regarding it here). So currently I, as the consumer of the Pipes content, have to add any custom information I want at my end.

Defining your extensions

If you are using a tool like Visual Studio you may be able to validate and/or receive IntelliSense (autocomplete) on your XML if you specify an XSD. An XSD can also be used to validate an XML document, ensuring that it will not cause any surprises for consuming code.

          <xs:schema id="cargowire-sitemap" xmlns:tns="http://cargowire.net/cargowire-sitemap" 
             attributeFormDefault="unqualified" elementFormDefault="qualified"
             targetNamespace="http://cargowire.net/cargowire-sitemap"
             xmlns:xs="http://www.w3.org/2001/XMLSchema">

            <xs:element name="pathandquery">
              <xs:complexType>
                <xs:simpleContent>
                  <xs:extension base="xs:string">
                    <xs:attribute name="rel" type="xs:string" use="optional" />
                    <xs:attribute name="showinnav" type="xs:boolean" use="optional" />
                    <xs:attribute name="accesskey" type="xs:string" use="optional"/>
                  </xs:extension>
                </xs:simpleContent>
              </xs:complexType>
            </xs:element>
            <xs:element name="title" type="xs:string" />
            <xs:element name="subtitle" type="xs:string" />

          </xs:schema>

fig. 6.0

A namespace URI will often lead to a page that describes the namespace and links to the schema definitions (see the XML Schema namespace URI itself for an example).

The schemaLocation attribute (as seen in the sitemaps.org example in fig. 3.0) can be used to identify the XSD location for a particular namespace. The attribute value is a whitespace-separated list of pairs, the first item of each pair being the namespace URI and the second the schema location. This applies to a default namespace too, as with the sitemap namespace in fig. 3.0; use xsi:noNamespaceSchemaLocation instead only when the elements are in no namespace at all.
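To make the pairing easier to see, here is a cut-down sketch of the sitemap from fig. 3.0 with the attribute value laid out one pair per line. The line breaks are only for readability and should make no difference to a validator, since whitespace inside the value acts purely as a separator:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Each line of xsi:schemaLocation is one pair: namespace URI first, then the location of its XSD -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:cargowire-sitemap="http://cargowire.net/cargowire-sitemap"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
    http://cargowire.net/cargowire-sitemap http://cargowire.net/cargowire-sitemap.xsd">
  <url>
    <loc>http://cargowire.net/articles</loc>
    <cargowire-sitemap:title>Articles</cargowire-sitemap:title>
  </url>
</urlset>
```

Reading left to right, an odd-numbered entry is always a namespace and the entry after it is where to find that namespace's schema.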

Although XSDs are not the focus of this article, it is worth knowing how to read these attributes when investigating XML schemas published by others.

Conclusion

Extending XML can be an extremely flexible way of addressing multiple problems within one XML document. Doing so acts as an extension of good programming practice - before starting, look to build on what already exists.