Hamilton should embrace an open, shared data platform that increases participation and transparency while delivering better, more usable information to the public.
By Ryan McGreal
Published June 05, 2009
It's time for the City of Hamilton to start publishing its publicly available data in an open, accessible format instead of today's hodgepodge of closed, clunky and idiosyncratic legacy formats that are hard to find and even harder to use.
Have you ever tried to find, well, anything on the City of Hamilton website? The design is so broken it almost seems as though it were intended specifically to discourage any citizen who dares to try and look something up.
Both the site search engine and the labyrinthine, deeply nested hierarchy of departments and offices are painful to use, but the search function warrants special mention for its sheer inability to deliver results that match what you're trying to find. No matter the search terms, it invariably dumps out an eyeball-searing block of links to committee meeting agendas instead of the official page that explains the policy, displays the results or links to the appropriate forms.
A Google site-specific search generally does a better job of finding relevant links than the city's own search engine, which brings me to my thesis: Hamilton should embrace an open, shared data platform that increases participation and transparency while delivering better, more usable information to the public.
Here's a case study to illustrate what I have in mind. Every month, a city employee sends out a summary report of monthly building permit activity to an email distribution list. The report includes a summary table of permits issued and total building value by category (residential, commercial, industrial, institutional and miscellaneous) and sub-category (e.g. "New one- and two-family dwellings", "Alteration to one- and two-family dwellings").
Significantly, it is published as a PDF, a format that is designed to preserve document design elements in print.
Unfortunately, PDF is awkward to access electronically. You generally need to fire up a separate PDF reader, like Foxit Reader for Windows or Evince for Linux, just to open the file. It's cumbersome to highlight and copy text (and many PDFs are next to impossible to copy) and very difficult to preserve tabular formatting.
If you want to create or modify a PDF, it's even more difficult. There are software applications (both proprietary and open source) to do this, but they require specialized, application-specific knowledge and lots of manual intervention.
So when I decided to create a web-based summary graph of building activity for the past five years, I had to download five years' worth of Building Activity reports (all PDF documents) one by one, open each one manually, scroll down to the data page and type the values into a separate document.
Then I uploaded my data table into a database (where this kind of data belongs) and wrote a quick web script to display the data dynamically in an HTML-based report.
I know this report isn't fancy, but it is accessible to anyone with a browser (and it updates automatically when I insert the next month's data). Even if you are visually impaired, a screen reader can read the numeric data table aloud.
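The database-plus-script setup described above can be sketched in a few lines of Python. This is a minimal sketch, not the script behind the report: the table name `building_activity`, its columns and the numbers in `SAMPLE_ROWS` are assumptions made up for illustration, and SQLite stands in for whatever database actually holds the data. Because the HTML is generated from a query on every run, inserting next month's row is all it takes to update the report.

```python
import sqlite3
from html import escape

# Hypothetical schema: one row per month and category, as typed out of the PDF reports.
SCHEMA = """
CREATE TABLE building_activity (
    year          INTEGER NOT NULL,
    month         INTEGER NOT NULL,
    category      TEXT    NOT NULL,
    permits       INTEGER NOT NULL,
    value_dollars INTEGER NOT NULL,
    PRIMARY KEY (year, month, category)
)
"""

# Placeholder numbers for illustration only; these are not real City of Hamilton figures.
SAMPLE_ROWS = [
    (2008, 1, "residential", 120, 15_000_000),
    (2008, 1, "commercial", 30, 8_000_000),
    (2009, 1, "residential", 100, 12_000_000),
]


def build_demo_db():
    """Create an in-memory database loaded with the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO building_activity VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def annual_totals(conn):
    """Sum permits and construction value per year, across all categories."""
    return conn.execute(
        "SELECT year, SUM(permits), SUM(value_dollars) FROM building_activity "
        "GROUP BY year ORDER BY year"
    ).fetchall()


def render_html(rows):
    """Render yearly totals as a plain HTML table that browsers, screen readers and scripts can all read."""
    out = ["<table>", "<tr><th>Year</th><th>Permits</th><th>Value ($)</th></tr>"]
    for year, permits, value in rows:
        cells = (str(year), str(permits), f"{value:,}")
        out.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in cells) + "</tr>")
    out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    print(render_html(annual_totals(build_demo_db())))
```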
More importantly, it is also available in a machine-accessible format. That means search engines can find and index it, and computer programs can parse the data automatically so third parties can easily create their own reports.
Now, this report is just a proof of concept, and it would certainly benefit from the ability to drill down to a single month to see the activities by category and subcategory. My point is that if the city made its public data available through a web-based application programming interface (API), I could spend my time building reports on the data rather than extracting it manually from inaccessible print formats.
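To make the drill-down idea concrete, here is a hedged sketch of a client for such an API. The City of Hamilton offers no such service; the URL in `API_URL` and the record fields (`category`, `subcategory`, `permits`, `value`) are hypothetical. The point is that once the data arrives as structured records, the month-by-category report is a few lines of grouping code rather than an afternoon of retyping.

```python
import json
from collections import defaultdict
from urllib.request import urlopen

# Hypothetical endpoint: the city offers no such API; the URL and field names are assumptions.
API_URL = "https://data.example.org/hamilton/building-permits?year={year}&month={month}"


def fetch_month(year, month):
    """Download one month of permit records (a JSON list of objects) from the hypothetical API."""
    with urlopen(API_URL.format(year=year, month=month)) as resp:
        return json.load(resp)


def drill_down(records):
    """Group records as {category: {subcategory: [permits, value]}}, plus a "TOTAL" entry per category."""
    report = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in records:
        # Each record counts toward its own sub-category and toward its category total.
        for sub in (r["subcategory"], "TOTAL"):
            cell = report[r["category"]][sub]
            cell[0] += r["permits"]
            cell[1] += r["value"]
    return {cat: dict(subs) for cat, subs in report.items()}
```

With a real endpoint, `drill_down(fetch_month(2009, 5))` would produce the nested report for one month; as written, the placeholder URL does not point at a real service.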
Incidentally, it took at least twice as long to get the building activity data for this report as it actually took to write the code that produces the report. Worse, someone at the city actually performed extra work to present the data in a format that required me to perform extra work to get the data back out again: a compounded diseconomy.
An additional problem is that manually copying data from one format to another introduces two opportunities for errors to creep into the report: typos and copying errors, and forgetting to do the manual update.
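Both kinds of error can at least be caught, if not prevented, with some cross-checking. The sketch below assumes each transcribed row records `year`, `month`, `category` and a sub-category `value`, and that the category totals printed in the same report were typed in separately; the function names are illustrative, not part of any existing tool. One check lists months with no data at all (the forgotten update), the other flags categories whose sub-category values do not add up to the printed total (a likely typo).

```python
from collections import defaultdict


def missing_months(rows, first, last):
    """Return (year, month) pairs from first to last, inclusive, that have no rows at all."""
    present = {(r["year"], r["month"]) for r in rows}
    missing = []
    year, month = first
    while (year, month) <= last:
        if (year, month) not in present:
            missing.append((year, month))
        # Step to the next calendar month, rolling December over to January.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return missing


def total_mismatches(rows, reported_totals):
    """Compare summed sub-category values with the category totals typed from the same report.

    reported_totals maps (year, month, category) -> total value as printed in the PDF.
    Returns {key: (computed_sum, reported_total)} for every key that disagrees.
    """
    sums = defaultdict(int)
    for r in rows:
        sums[(r["year"], r["month"], r["category"])] += r["value"]
    return {
        key: (sums[key], total)
        for key, total in reported_totals.items()
        if sums[key] != total
    }
```

These checks only catch errors after the fact, and they are yet more work that an API would make unnecessary.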
An open API would eliminate these problems, in addition to dramatically reducing the amount of time and effort involved in finding, analyzing and presenting city data.
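On the city's side, publishing the data can be as simple as answering HTTP requests with JSON taken straight from the database, with no PDF step in between. The following is a minimal sketch using only Python's standard library; the `/building-permits` path, the `year` and `month` query parameters and the placeholder `RECORDS` list are assumptions, not an existing city service.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Placeholder records standing in for the city's permit database (not real figures).
RECORDS = [
    {"year": 2008, "month": 1, "category": "residential", "permits": 120, "value": 15000000},
    {"year": 2009, "month": 1, "category": "residential", "permits": 100, "value": 12000000},
    {"year": 2009, "month": 2, "category": "commercial", "permits": 30, "value": 8000000},
]


def query(params):
    """Filter RECORDS by optional integer 'year' and 'month' parameters.

    params is the dict produced by urllib.parse.parse_qs, e.g. {"year": ["2009"]}.
    Raises ValueError if a parameter is not an integer.
    """
    result = RECORDS
    for key in ("year", "month"):
        if key in params:
            wanted = int(params[key][0])
            result = [r for r in result if r[key] == wanted]
    return result


class PermitAPI(BaseHTTPRequestHandler):
    """Serve GET /building-permits?year=YYYY&month=M as JSON (hypothetical path)."""

    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/building-permits":
            self.send_error(404)
            return
        try:
            records = query(parse_qs(url.query))
        except ValueError:
            self.send_error(400, "year and month must be integers")
            return
        body = json.dumps(records).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the API on localhost until interrupted."""
    ThreadingHTTPServer(("localhost", port), PermitAPI).serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/building-permits?year=2009` returns the matching records as a JSON array. A production service would read from the real database instead of a hard-coded list, but the shape of the interface could stay the same.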
The sample report I wrote above is presented in HTML, a simple, open standard for data presentation that any browser can read and display. Most programming languages have libraries for parsing HTML, so it's machine-readable as well as human-readable.
The problem with HTML is that its syntax was designed for documents (with headings, subheadings, paragraphs, lists and so on), not for other structured collections of data objects.
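As an illustration of that machine-readability, Python's standard library includes `html.parser.HTMLParser`, which is enough to pull the cells back out of a simple report table. The `TableExtractor` class and `parse_table` function below are illustrative names, and the sketch assumes a plain table without nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <th>/<td> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore text outside cells, such as whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html_text):
    """Return the table's rows as lists of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Note what comes back: rows of strings. The scraper has to know which column means what and convert text like `23,000,000` back into numbers itself, and it breaks if the page layout changes. That fragility is the weakness of HTML as a data format.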
It would be even more useful to third parties if it were in a more structured data format, like JSON or XML. I much prefer JSON to XML because it's a) far less verbose and, more important, b) human-readable as well as machine-readable.
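A quick way to see the verbosity difference is to encode the same record both ways. This sketch uses Python's standard `json` and `xml.etree.ElementTree` modules; the field names, the `permitSummary` root element and the numbers are placeholders, and the one-element-per-field XML layout is just one of many possible XML designs.

```python
import json
import xml.etree.ElementTree as ET

# One row of a monthly summary; placeholder numbers, not real figures.
record = {"year": 2009, "month": 5, "category": "residential", "permits": 100, "value": 12000000}


def to_json(rec):
    """Serialize the record as a JSON object."""
    return json.dumps(rec)


def to_xml(rec, root_tag="permitSummary"):
    """Serialize the record as XML, one child element per field (a hypothetical layout)."""
    root = ET.Element(root_tag)
    for key, val in rec.items():
        ET.SubElement(root, key).text = str(val)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    j, x = to_json(record), to_xml(record)
    print(len(j), j)
    print(len(x), x)
```

XML repeats every field name in an opening and a closing tag, so it comes out longer than the JSON. JSON also round-trips the numbers as numbers, while XML element text comes back as strings that the consumer has to convert.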
Like HTML, these formats are based on open standards and can be parsed by a variety of different programming languages. They don't lock analysts or report developers into a particular language or platform.
There are a lot of developers and hackers - i.e. programmers and computer hobbyists who like to tinker and who enjoy exploring computer systems and solving problems - in Hamilton and beyond. That community is going to be smarter and more creative at digging into the city's public data and finding innovative ways to combine and interact with it than any one person or formal organization.
Any individual contribution to such a collaborative community endeavour must necessarily be modest, if only because the collaborative platform itself allows for a distributed, iterative approach to application development that beggars the creative potential of its participants working in isolation.
Consider the Linux operating system and its vast ecology of distributions and free and open source applications, none of which would be possible if not for the fact that both the philosophy of open sharing and the collaborative platform of the internet are so effective at combining and aggregating the individual efforts - most of them quite modest on their own - of large groups of people.
Linux machines serve the majority of websites on the internet (and open source web servers, web programming languages and database servers manage most of its content). They are incorporated into computers ranging from large, industrial-strength servers right down to compact hand-held devices and everything in between.
We already have an open source success story close to home: Hamilton entrepreneur Bob Young, owner of the Ti-Cats, made the fortune he has been willing to sink into that franchise by giving away a free, community-developed distribution of Linux called Red Hat and selling customer support and quality assurance guarantees.
By making the city website into an open platform, the city could trigger a real renaissance in citizen engagement and increased transparency. Tim O'Reilly, the founder of O'Reilly Media and a long-standing proponent of free and open source software, recently made a presentation on Government-as-Platform, in which he described the characteristics of successful technology platforms.
This may be O'Reilly's most important observation:
If you're really building a platform, your customers and partners build new features before you do.
My own contribution to such an ecology of open government applications must necessarily be modest, not only because of the nature of open source development but also because there are a lot of programmers out there who are a lot smarter than me. :) Nevertheless, here's one obvious idea that shouldn't be hard to implement: a live HSR bus map.
The city is purchasing new GPS systems for all its buses and will have them installed by the end of the year. If the city provides real-time GPS data for its bus fleet in a web-based API, someone can create a Google Maps mashup that places the buses on the map in their actual current position and lets users click on a bus to see its identity, route and schedule.
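The server side of such a mashup could be little more than a format conversion. This sketch assumes a hypothetical feed in which each bus reports `bus_id`, `route`, `lat`, `lon` and a Unix `timestamp`; the two-minute staleness cutoff is likewise an assumption. It turns the feed into GeoJSON, a standard format for geographic features that web mapping libraries such as Leaflet, or the Data layer of the Google Maps JavaScript API, can draw directly.

```python
import json
import time

MAX_AGE_SECONDS = 120  # assumption: GPS fixes older than this are treated as stale


def to_geojson(vehicles, now=None, max_age=MAX_AGE_SECONDS):
    """Convert a (hypothetical) HSR GPS feed into a GeoJSON FeatureCollection of bus positions."""
    now = time.time() if now is None else now
    features = []
    for v in vehicles:
        if now - v["timestamp"] > max_age:
            continue  # skip buses whose last GPS fix is too old to count as live
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [v["lon"], v["lat"]]},
            # Properties shown when a user clicks the bus on the map.
            "properties": {"bus_id": v["bus_id"], "route": v["route"], "timestamp": v["timestamp"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    feed = [{"bus_id": "1001", "route": "1", "lat": 43.2557, "lon": -79.8711, "timestamp": time.time()}]
    print(json.dumps(to_geojson(feed)))
```

Note that GeoJSON lists longitude before latitude, which is an easy thing to get backwards when wiring up a map.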
Again, the city could hire a consultant to produce such a tool. It would probably end up costing a lot of money, not working very well and having extremely limited responsiveness to the feedback of the user base.
Why pay for second-rate software when the city can encourage its own citizens and residents to create and improve a similar application simply out of a joy for creation, sharing and participation in the public weal?
Open Government is an idea whose time has come.
The basket of related technologies is mature enough to support a robust government-as-platform API for public data. The open source community development model has already proven that it works. Switching to open source software and turning data analysis into a community project would actually reduce the city's expenses while providing better, easier-to-find information and greater transparency and accountability.
One of US President Barack Obama's first acts in office was to appoint a Chief Technology Officer to move the government to a working model in which the business of government is transparent, participatory and collaborative.
A major inspiration for the US government's commitment to openness is the Eight Principles of Open Government Data developed by the Open Government Working Group. I'll close this essay with the list of eight principles:
Government data shall be considered open if it is made public in a way that complies with the principles below:
- Complete: All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.
- Primary: Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.
- Timely: Data is made available as quickly as necessary to preserve the value of the data.
- Accessible: Data is available to the widest range of users for the widest range of purposes.
- Machine processable: Data is reasonably structured to allow automated processing.
- Non-discriminatory: Data is available to anyone, with no requirement of registration.
- Non-proprietary: Data is available in a format over which no entity has exclusive control.
- License-free: Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.