The Monday night meet-up was timed to coincide with the City of Hamilton's release of the municipal election candidates' financial disclosures. The objective: to lay the open data foundation for the 2010 candidate financial statements, which outline how much candidates spent on their campaigns and where the money came from.
Municipal candidate financial statement auditor's report
For about five hours, we hacked away to transform the PDF scans of the 9503P "Financial Statement - Auditor's Report" form pulled from the city's website into a more usable dataset.
I had my reservations about an "evening hackathon" type event, since we had just a few hours to put together something useful, but we managed!
In my humble opinion, the following three considerations made it a success:
Attendees at the Open Hamilton hackfest
The Open Hamilton team planned the hackfest at an easy-to-get-to location, right off the highway (and by Timmy's, where some of us fueled up). Not to mention it was a super cool place: Think|haus, home of Hamilton's "open data" and local hackerspace.
Our host Richard Degelder, whose beard rivals that of Richard Stallman, got us access to an open dedicated wifi network with plenty of bandwidth & all the necessary ports for FTP, SSH, Remote Desktop, and so on.
The whiteboard and projector helped organize our thoughts, and a shared Google Doc helped us aggregate links and coordinate our efforts. (In the past we've used wikis like PBworks, which work well, but it's nice to be able to edit the doc collaboratively. If you prefer the look and feel of Excel, you can use Office Live for free to edit simultaneously as well.)
Open Hamilton Election Financials - Total Contributions by Ward
It is tempting to aim really high and set lofty goals for a hackfest: "we want the most comprehensive dataset", or "we'll build the coolest app that does everything you can possibly imagine". Don't do it!
Set a modest goal to end up with a "basic dataset X" or a "basic functionality Y" in the app you're hacking. Then you'll be happy to wrap up your hackathon with something to show for it.
On Monday we decided to "scrape" only a small number of data fields (thirteen, to be exact) from the PDF forms, and to load the data into the OGDI catalogue that stores data for DataDOTgc.ca so that we could visualize it.
Hamilton ward boundaries
We not only knew what we wanted to do, but also had a pretty good idea of how we would do it. Prior to the hackfest, there was a planning thread to discuss the project and the tools we would need.
Going in, we knew we'd be extracting data into a simple format (CSV) and loading it into a data store. We also discussed the scope of the effort, and agreed that we'd eventually want to mash up some of the data with ward maps.
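As a rough illustration of that extract-to-CSV step, here is a minimal Python sketch. It is not the code we wrote that night: the field names are hypothetical stand-ins (the thirteen fields aren't listed in this post), and the sample record is made up.

```python
import csv
import io

# Hypothetical subset of fields; the real dataset had thirteen,
# and these names are assumptions for illustration only.
FIELDS = ["candidate", "ward_id", "total_contributions", "total_expenses"]


def records_to_csv(records):
    """Serialize a list of dicts to CSV text with a fixed header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # DictWriter raises ValueError if a record has a key not in FIELDS,
        # which catches typos made while transcribing from the PDF forms.
        writer.writerow(rec)
    return buf.getvalue()


# Made-up sample record, not real disclosure data.
csv_text = records_to_csv([
    {
        "candidate": "Candidate A",
        "ward_id": 1,
        "total_contributions": "1200.50",
        "total_expenses": "1100.00",
    },
])
print(csv_text)
```

Fixing the header up front means everyone transcribing forms produces rows in the same shape, which matters when half a dozen people are entering data in parallel.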
This meant we needed Hamilton Ward data in a format like KML, so Joey Coleman and Richard did some pre-work to get the Ward boundaries data prepared ahead of time.
We ended up with a clean version of the Hamilton Ward Boundaries KML. Because we planned for it ahead of time, version 1 of the spending dataset included information like the Ward ID, so it could easily be mashed up with the Ward KML layer. A bit of pre-planning saved us a bit of aggravation in the end.
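To show why carrying the Ward ID matters, here's a minimal sketch of that kind of mash-up in Python. Everything in it is an assumption for illustration: the CSV column names, the `WARD_ID` attribute inside the KML, and the sample values are made up and are not taken from the real datasets.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

KML_NS = {"k": "http://www.opengis.net/kml/2.2"}

# Made-up sample data; column names are hypothetical.
SAMPLE_CSV = """candidate,ward_id,total_contributions
Candidate A,1,1200.50
Candidate B,1,800.00
Candidate C,2,450.25
"""

# Made-up KML; storing the ward ID in ExtendedData is an assumption.
SAMPLE_KML = """
<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark><name>Ward 1</name>
    <ExtendedData><Data name="WARD_ID"><value>1</value></Data></ExtendedData>
  </Placemark>
  <Placemark><name>Ward 2</name>
    <ExtendedData><Data name="WARD_ID"><value>2</value></Data></ExtendedData>
  </Placemark>
</Document></kml>
"""


def totals_by_ward(csv_text):
    """Sum contributions per ward ID."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[int(row["ward_id"])] += float(row["total_contributions"])
    return dict(totals)


def ward_names(kml_text):
    """Map ward ID to the Placemark name found in the KML."""
    root = ET.fromstring(kml_text.strip())
    names = {}
    for pm in root.iterfind(".//k:Placemark", KML_NS):
        ward_id = pm.findtext(
            "k:ExtendedData/k:Data[@name='WARD_ID']/k:value",
            namespaces=KML_NS,
        )
        names[int(ward_id)] = pm.findtext("k:name", namespaces=KML_NS)
    return names


def join_on_ward(csv_text, kml_text):
    """Join the spending totals to the ward layer using the shared ward ID."""
    totals = totals_by_ward(csv_text)
    return {name: totals.get(wid, 0.0) for wid, name in ward_names(kml_text).items()}


by_ward = join_on_ward(SAMPLE_CSV, SAMPLE_KML)
print(by_ward)
```

Without a shared key like the Ward ID in both files, this join would need fuzzy matching on ward names or addresses, which is exactly the aggravation the pre-work avoided.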
Taking PDF data into a spreadsheet or database, and then into a cloud catalogue, doesn't seem like a big deal on its own.
But orchestrating half a dozen or more people with various skills and technology experience requires balancing the short-term objectives of the hackathon against the long-term goals of the local open data movement.
first published on Open Halton