Discussion and debate over the opportunities and challenges of open public data have moved from the margins of political discourse in Hamilton toward the mainstream.
By Ryan McGreal
Published March 18, 2011
In a sign that the concept of open public data has moved from the margins toward the mainstream, local media pundits in Hamilton have started debating it on cable TV, social media and blogs.
Open public data refers to government data that has been defined as public information and made available in open, standardized formats that are findable, accessible and machine-readable so it can be processed by third-party software applications to create resources.
The purpose is to enable citizens to find, analyze, and do useful things with public information. Software written once to provide a service can then do its job over and over again, improving service delivery while saving the city resources.
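To make "machine-readable" concrete, here is a minimal sketch in Python of software written once and reused against an open data set. The data set is hypothetical: the trash collection schedule, its column names (ward, collection_day, material) and its rows are invented for illustration and are not real City data.

```python
import csv
import io

# Hypothetical sample of a machine-readable data set. The column names and
# rows are invented for illustration; this is not real City of Hamilton data.
SAMPLE_CSV = """ward,collection_day,material
1,Monday,garbage
1,Monday,recycling
2,Tuesday,garbage
2,Wednesday,recycling
"""


def collection_days(csv_text, ward):
    """Return a sorted list of (material, day) pairs for the given ward."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(
        (row["material"], row["collection_day"])
        for row in reader
        if row["ward"] == str(ward)
    )


if __name__ == "__main__":
    for material, day in collection_days(SAMPLE_CSV, 2):
        print(f"Ward 2 {material}: {day}")
```

Because the data arrives in a plain, structured format, the same few lines keep working every time the schedule is republished, with no retyping or manual conversion.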
Open public data is about making public data more accessible, not making private or confidential data public. Data that has privacy implications deserves to remain private.
Worldwide, a number of governments at the federal, provincial/state and municipal level have already made formal commitments to open public data and are at various stages of creating and expanding public data repositories.
The Government of Canada just announced yesterday that it has launched a federal open data pilot project with a catalogue of 782 data sets. According to the site, the goal of the project is "to improve the ability of the public to find, download and use Government of Canada data".
An early review of the federal open data project by David Eaves suggests that the project has real potential for growth and development but that the licence under which the data has been shared is far more restrictive than that of other governments.
At the municipal level, Hamilton City Manager Chris Murray recently sent a memo to council in which he called open data "part of a transformation agenda" that will require "strong leadership and a culture change" in how the city manages data and deploys resources.
According to Murray, open public data has the potential to increase transparency, improve service delivery and engage the public more actively.
Murray added, "Information is traditionally seen as power, and opening it up means sending a message within the City administration that this is the public's information and not ours to control."
In a March 8, 2011 discussion item on Cable 14's On The Record, host Larry Di Ianni raised some concerns about open public data (see video below, starting at 19:02), complaining that advocates want open data "for reasons not yet invented" and "cannot articulate a coherent reason for supporting the advocacy."
Di Ianni argued that open data can leave citizens "more confused" than informed, that "opening up data is expensive" and that some advocates "want to use the data as weapons to hold the system accountable".
He concluded, "a rationale for opening up the data is a necessary precursor to the data overload."
Opinionator Laura Babcock retorted, "That's total nonsense and completely insulting, Larry. I can't believe your bias has seeped in so far to your commentary," to laughter by fellow panelists Loren Lieberman and Danya Scime.
She called open data "a no-brainer" and then articulated a terse rationale: "This is information that informs us as a citizenry. I'm not going to get 'confused' by it. I can think of many applications: when students are doing research, when you're doing civic planning, when you're part of citizens' groups, when you just want to be informed about your democracy."
She concluded, "Sorry, Larry, I don't know who's been articulating it incorrectly to you, but your emphasis on the word 'activist' suggests your bias from the beginning."
Lieberman expressed concern that "some subversive media or otherwise might be spooning us incorrect findings based on a skewed look at how things are."
Scime argued, "We want the accountability, we want the transparency" and expressed hope that people use the data to get educated and informed.
The debate continued after the show ended. Babcock tweeted that she "kicked Larry Di Ianni's proverbial butt", triggering some back-and-forth on the matter. This week, she posted an entry on Powergroup's "As Seen On TV" blog, asking: "should Hamilton have open source data? Why or why not?"
On March 14, Cal DiFalco wrote a blog entry on OpenFile Hamilton in which he expressed support for open data but argued that Di Ianni and Babcock both raised valid points.
DiFalco warned, "Having open access to data and being able to interpret it correctly are two different things." He raised the prospect of "faulty analysis" that could "reach a large audience" and "do more harm than good".
He added: "in the hands of someone with more sinister intentions, data can be 'spun,' shaped or used selectively in such a way that it can, in fact, be used maliciously, or even as a political weapon".
In an email response to RTH, Di Ianni clarified, "I never expressed any concern with open data. In fact I said it was inevitable."
He repeated his three specific concerns: that advocates have not done a good enough job of articulating the case for open data; that some advocates want to use open data to go on the political attack; and that the data can be confusing to the public.
He added, "Accountability is very good, but political witch-hunting is not."
Laura Babcock acknowledged in an email to RTH that Di Ianni's argument "has some merit as it is true that all of the future uses of open data are unknown. However, that concern should not prevent open data (as I articulated on the show)."
She added that people have always tried to manipulate information for narrow political or partisan purposes, but that "politicians should prepare for a new level of transparency" in response.
She concluded, "Access to more information in a democracy can result in a more engaged, better informed electorate".
As an early proponent of open public data, I take Di Ianni's objections seriously and am working on a more formal argument in defence of open public data that addresses his concerns.
I will be making a public presentation on understanding open public data this coming Wednesday evening at a talk organized by the Canadian Information Processing Society - Golden Horseshoe. If you are interested in attending, please RSVP to the organizers as space is limited.
Over the past few years, I've heard from a number of City employees who are frustrated that they waste enormous amounts of time trying to get their hands on data from other parts of the city. There's very little sharing between departments, and each silo operates as a gatekeeper to the data it produces.
An open public data policy would save the city money, in at least three ways:
Employees will spend less time and money converting data from accessible formats to inaccessible formats.
Employees will spend less time and money trying to obtain data from other departments.
Employees will spend less time and money converting data from inaccessible formats back into accessible formats.
If the data is released in an open, accessible format from the beginning, we enjoy a triple boost in employee productivity as well as a boost in the capacity for citizens to participate meaningfully in the process of governance.
To be sure, some parts of the city are trying to move in the right direction. The Public Works Department has gotten a lot better at internal team-building and cross-departmental dialogue, and City Manager Chris Murray is trying to change the culture from the top down (hence his support for open public data). But it's still very much an uphill struggle.
In some cases there may be some upfront costs associated with making public data available in usable formats, but this should not be a barrier to embracing a commitment to open data. The development and release of public data in open formats can be a progressive, incremental undertaking - it doesn't all have to happen at once.
As always, there will be opportunities to pluck low-hanging fruit by starting with the data that is easiest to convert.
Remember: time invested once to make data accessible is amortized by delivering savings and economies over and over again in every subsequent use.
As for the worry that open data will be used for political attacks: the relative absence of open data doesn't seem to have stopped the attack-minded. I'd argue that such attacks are easier in the absence of data, because insinuation and accusation flow in to fill the void.
Good hard data should actually reduce the effectiveness of such attacks by countering emotion with information. Public discourse leavened by solid data wastes less time on uninformed opinion and on reasoning from fear and ignorance.
I come from a background in open source computer programming, which shares many of the same principles as open data. It operates through open public discourse informed by reproducible results - essentially, the kind of peer review that drives science forward and moves a community of learners away from theories and models that are provably wrong.
Di Ianni acknowledged in his email to me that his concern about the potential uses of open public data was partially in response to a statement I made in an interview with the Spectator's Bill Dunphy:
A programmer by occupation, McGreal says he's not sure what people would do with the kinds of data the city and its institutions could release (data on road conditions, transit location, trash collection schedules, public health, crime figures), "but I see untapped potential in it all. I'm a believer in the value of serendipity. I won’t know how to use the data until I can see it."
I realize that's a tough sell, so here is a concrete example of the use of open public data.
In 2007, I read a Globe and Mail article about development in Hamilton that made some suspicious-sounding claims about increased building permits after the city lowered business property taxes.
I tried to get my hands on the building permit data - building type, property valuation, location - and was told I couldn't have it. The closest I could get was a page of links to a stack of summary reports - in read-only PDF - that provide permit totals by facility type.
I had to spend several hours opening each PDF (one for each month) and manually copying the summary figures into a database, which I then spent several more hours converting back into a format that was accessible and sharable.
With this partial information, I was able to determine that most of the building permit value was residential and not industrial or commercial, suggesting that lower business property taxes were not responsible for the boom in permits.
Because I didn't want my efforts to be wasted on a one-shot report, I later released the data in an open, accessible format and have since maintained it as monthly building permit reports (still PDF) are released.
At the end of last year, we were able to determine easily that the city had exceeded $1 billion in overall building permits for 2010. We were also able to determine that more than half of the total is still residential, an important piece of context that casts the achievement in a more accurate light.
If we had more detailed, low-level data on building permits, we would be able to provide an even better understanding of what the data mean from an economic and policy perspective.
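Once the monthly figures are in an open format, the kind of analysis described above takes only a few lines. The following is a rough sketch in Python, not the actual process I used: the record layout, the category names and every number are placeholders invented for illustration, not the real Hamilton permit totals.

```python
from collections import defaultdict

# Placeholder figures in millions of dollars, invented for illustration only.
# They are NOT the actual Hamilton building permit totals, and the category
# names are an assumption about how the summary reports are broken down.
SAMPLE_PERMITS = [
    {"month": "2010-01", "category": "residential", "value": 40.0},
    {"month": "2010-01", "category": "commercial", "value": 15.0},
    {"month": "2010-01", "category": "industrial", "value": 5.0},
    {"month": "2010-02", "category": "residential", "value": 20.0},
    {"month": "2010-02", "category": "commercial", "value": 10.0},
    {"month": "2010-02", "category": "institutional", "value": 10.0},
]


def share_by_category(records):
    """Return each category's fraction of the total permit value."""
    totals = defaultdict(float)
    for record in records:
        totals[record["category"]] += record["value"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: value / grand_total for category, value in totals.items()}


if __name__ == "__main__":
    shares = share_by_category(SAMPLE_PERMITS)
    for category, share in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{category}: {share:.0%}")
```

With lower-level data (one record per permit rather than one summary per month), the same function could be pointed at location or property valuation instead of category.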
But the reason I keep coming back to "serendipity" is that open public data is ultimately as much about creating new resources as it is about improving existing ones.
Open public data opens up the imagination and unlocks creativity. When advocates say we don't know all the potential uses, that is not a strike against open public data. It is an acknowledgment that open public data makes transformative innovations possible.
Here is a closing case in point: the Japan Quake Map, which combines quake data feeds from the United States Geological Survey with Google Maps to create a time-lapse animation showing the location, magnitude and reach of the earthquakes that struck Japan starting last Friday, March 11.
Such creative visualizations can take large, unwieldy and perhaps "confusing" data sets and make them accessible and meaningful to everyone.
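For the curious, here is a minimal sketch in Python of the first step in a visualization like this: turning a raw quake feed into an ordered list of "frames" that a map could animate. It is not the Japan Quake Map's actual code. The sample feed is invented, and its structure assumes the GeoJSON layout used by USGS earthquake feeds (magnitude, place and a millisecond timestamp under "properties"; longitude, latitude and depth under "geometry"); the map itself may have used a different feed format.

```python
import json
from datetime import datetime, timezone

# Invented sample in the assumed GeoJSON feed layout. The places, magnitudes,
# times and coordinates are placeholders, not real earthquake records.
SAMPLE_FEED = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"mag": 5.2, "place": "sample location B", "time": 1300000600000},
     "geometry": {"type": "Point", "coordinates": [142.0, 38.0, 30.0]}},
    {"type": "Feature",
     "properties": {"mag": 6.4, "place": "sample location A", "time": 1300000000000},
     "geometry": {"type": "Point", "coordinates": [141.5, 37.5, 25.0]}},
    {"type": "Feature",
     "properties": {"mag": 4.8, "place": "sample location C", "time": 1300001200000},
     "geometry": {"type": "Point", "coordinates": [143.0, 39.0, 10.0]}}
  ]
}
"""


def quake_frames(feed_text, min_magnitude=0.0):
    """Parse a GeoJSON quake feed into map-ready frames, oldest first."""
    feed = json.loads(feed_text)
    frames = []
    for feature in feed["features"]:
        props = feature["properties"]
        # Skip events with no magnitude or below the requested threshold.
        if props["mag"] is None or props["mag"] < min_magnitude:
            continue
        lon, lat, _depth = feature["geometry"]["coordinates"][:3]
        frames.append({
            # Feed timestamps are milliseconds since the Unix epoch.
            "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
            "lat": lat,
            "lon": lon,
            "magnitude": props["mag"],
            "place": props["place"],
        })
    frames.sort(key=lambda frame: frame["time"])
    return frames


if __name__ == "__main__":
    for frame in quake_frames(SAMPLE_FEED):
        print(frame["time"].isoformat(), frame["magnitude"], frame["place"])
```

Each frame carries what a mapping library needs to drop a marker: where, when and how big. The time-lapse is then a matter of showing the frames in order.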