
Is XML the best we can do?

Jan 22, 2011 | Web development

I’ve been working on a web app lately that involves pulling in XML data via an API through PHP. I’ve come to the conclusion that XML is a pig to work with. First of all, the format sucks. It’s neither fish nor fowl: not particularly human-readable, and also difficult to parse with PHP. The whole concept of attributes is poorly thought out; whether a datum deserves its own tag or should be an attribute of a tag seems to be an entirely arbitrary decision. There are a number of ways to tackle the parsing job in PHP, including SimpleXML, XMLReader and some third-party classes. I ended up using the built-in DOMDocument class as the best of a bad lot.
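To make the attribute-versus-tag complaint concrete, here is a minimal sketch in Python (used purely for illustration instead of PHP). The `book` record and both encodings are made up for the example; the point is only that the same data needs different reading code depending on a choice the format leaves open.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same record (invented for this example).
AS_ATTRIBUTES = '<book id="42" title="Dune"/>'
AS_ELEMENTS = '<book><id>42</id><title>Dune</title></book>'


def read_attribute_style(xml_text):
    """Read the record when the values are stored as attributes."""
    node = ET.fromstring(xml_text)
    return {"id": node.get("id"), "title": node.get("title")}


def read_element_style(xml_text):
    """Read the record when the values are stored as child tags."""
    node = ET.fromstring(xml_text)
    return {"id": node.findtext("id"), "title": node.findtext("title")}
```

Both functions return the same dictionary, but neither works on the other's input, so the consumer has to know which convention the producer happened to pick.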

Then the fun started… DOMDocument seems to be very computationally expensive. My task involved reading in a small-ish number of XML pages via the API, stripping and sanitizing relevant data, and storing it in a database. This task would be performed daily with a cron job, to keep the database up-to-date. Despite optimising my code in every way I could think of, this simple script was taking an age to run. I’ve not profiled it yet, but I strongly suspect the DOM traversal is the bottleneck.
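For comparison, a read-sanitize-store job of this shape can be sketched with a streaming parser instead of a full DOM. This is not the code described above: it is Python, not PHP, the feed layout (`items`, `item`, `name`) and the table are assumptions invented for the example, and whether streaming would actually cure the slowness is untested here.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical feed layout, invented for this sketch.
SAMPLE_FEED = """<items>
  <item id="1"><name>  Widget </name></item>
  <item id="2"><name>Gadget</name></item>
</items>"""


def import_items(xml_text, conn):
    """Stream <item> elements into the database; return how many were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    count = 0
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "item":
            continue
        # "Sanitizing" here is just trimming whitespace; real rules would differ.
        name = (elem.findtext("name") or "").strip()
        conn.execute(
            "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)",
            (int(elem.get("id")), name),
        )
        elem.clear()  # drop the children of the element we have finished with
        count += 1
    conn.commit()
    return count
```

Calling `import_items(SAMPLE_FEED, sqlite3.connect(":memory:"))` stores two rows. Using `INSERT OR REPLACE` keyed on `id` means a daily cron run can re-import the same feed without creating duplicates.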

I had more problems with the API, to do with UTF-8 encoding and URL encoding, and debugging wasn’t made any simpler by the inability of my IDE (Eclipse PDT with Xdebug) to see inside the DOMDocument. I ended up having to use the class’s saveXML method just to look inside the data so I could debug the app.
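The serialize-it-to-see-it workaround is not unique to PHP. As a rough Python analogue (an illustration only, with a made-up `reply` document), a parsed node can be dumped back to text for printing or logging:

```python
import xml.etree.ElementTree as ET


def dump_tree(node):
    """Serialise a parsed node back to text so it can be printed or logged."""
    return ET.tostring(node, encoding="unicode")


# Hypothetical API reply, invented for this sketch.
root = ET.fromstring("<reply><status>ok</status></reply>")
```

`dump_tree(root)` returns the whole document as a string, and `dump_tree(root.find("status"))` returns just that sub-element, which is handy when the debugger only shows an opaque object.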

JSON looks a little better, and now that jQuery has made client-side JavaScript sexy and node.js has made the language a viable proposition on the server side, the format is growing in popularity. It’s still a nasty format though.
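As a small illustration of why JSON looks a little better, here is a made-up record parsed in Python (again, Python only for the sake of the sketch). The parser hands back native numbers, strings and lists, with no attribute-or-tag decision to make:

```python
import json

# Hypothetical record, invented for this sketch.
AS_JSON = '{"id": 42, "title": "Dune", "tags": ["sci-fi", "classic"]}'

# One call yields a plain dictionary with typed values.
record = json.loads(AS_JSON)
```

Here `record["id"]` is already an integer and `record["tags"]` is already a list, whereas an XML parser would hand back strings and nodes to convert by hand.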

Platform and language independence are hugely important for a universal data format. If you’re serving data via the web, you never know if your client will be a Linux host querying with Ruby or an AS/400 using COBOL to ask for data. Remembering the horror of interfacing with CORBA technologies in the past, I can understand why people find the simplicity of XML attractive. Still, I wonder why we can’t use a relational database querying system instead. In fact, this model is already in use, championed by the serial innovators at Yahoo. Their YQL query language presents a wide range of common web data sources ready to be queried.


Pic by Kassel1

What about human readability? Not a problem really. RSS readers today assemble a raw XML feed into a nice list of magazine articles, because they know the XML structure of RSS feeds. We already have the technology to abstract a relational querying language into a GUI tool that makes sense to humans, giving us the ability to construct a personalised superfeed aggregated from multiple sources, with filters to remove what we don’t want.
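A filtered superfeed is easy to picture once the articles sit in a relational table. The sketch below uses Python with an in-memory SQLite database as a stand-in; it is not YQL, and the `articles` table, the feed names and the rows are all invented for the example.

```python
import sqlite3

# Hypothetical articles from two feeds, already loaded into one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (source TEXT, title TEXT, topic TEXT, published TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        ("feed_a", "PHP 5.3 tips", "php", "2011-01-20"),
        ("feed_a", "Celebrity gossip", "showbiz", "2011-01-21"),
        ("feed_b", "Parsing XML faster", "xml", "2011-01-19"),
    ],
)


def superfeed(conn, unwanted_topic):
    """Merge all sources, newest first, leaving out one unwanted topic."""
    rows = conn.execute(
        "SELECT title FROM articles WHERE topic <> ? ORDER BY published DESC",
        (unwanted_topic,),
    )
    return [title for (title,) in rows]
```

`superfeed(conn, "showbiz")` returns the two remaining titles from both feeds, newest first. A GUI filter builder would only need to generate the `WHERE` clause.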

So I’m calling it: XML has served its purpose. We need a new universal data model, and it should be relational (and yes, I’m aware of the NoSQL movement).
