Opening up Virginia campaign finance data with Saberva.
With the Virginia State Board of Elections starting to provide bulk campaign finance data, a whole new world of data has opened up, and I intend to make the most of it.
Although the esteemed Virginia Public Access Project has long provided this information (laboriously cleaned up and displayed in a user-friendly fashion), it’s useful only to end users. There’s no API, no bulk downloads, etc., so it’s not possible for that data to be incorporated into Richmond Sunlight, Virginia Decoded, iOS apps, etc. That’s not a knock on VPAP—their line of business is providing this information to end users, period.
My normal instinct is to create a website that gathers and displays this data and, by the way, provides bulk downloads and an API. (For example, see Richmond Sunlight’s API guide and downloads directory, or Virginia Decoded’s downloads directory; its API is in alpha testing now.) But the website is, in this instance, unnecessary. VPAP is doing a better job of that than I can.
Instead, I intend to provide the tools for others to use this data. To that end, I’m developing Saberva, currently hosted on GitHub, a parser that gathers the data from the SBE’s servers, cleans it up, and exports it all to a MySQL database. (“Saber” as in Spanish for “to know,” and “VA” as in Virginia.) At first it’ll just be a program that anybody can run to get a big, beautiful pile of data, but I intend to provide bulk downloads (as MySQL dumps and CSV files) and an API (probably just as JSON). Slowing things down somewhat is the fact that I’m writing this in Python, a programming language that I know well enough to muck around in other people’s code, but not nearly well enough to write something of my own from scratch. This seems like a good chance to learn it, and I think that Python is the right language for this project.
Awkwardly (for me), I’m learning this new language out in the open, on GitHub. GitHub, for the non-programmers, is a source code sharing website for folks who, like me, develop software collaboratively. Every change that I make (every new line of code, every mistake) is chronicled on the project’s GitHub page. The tradeoff is that others can contribute to my code, making improvements or correcting my errors. Open government hacker Derek Willis has already forked Saberva, replacing and improving my laborious CSV parsing processes with Christopher Groskopf’s excellent csvkit.
Right now, Saberva will download the data for a single month (April), clean it up a bit, save a new CSV file, and create a file to allow it to be imported into a MySQL database. I’ve got the framework for something useful, and now it remains to be made genuinely useful.
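To give a rough idea of what those steps look like, here is a minimal sketch of the clean-up, re-save, and MySQL-import-file stages using only Python's standard library. It is an illustration, not Saberva's actual code: the function names, the sample columns, and the choice to declare every column as TEXT are all assumptions of mine, and the download step is left out entirely.

```python
import csv
import io


def clean_rows(raw_csv_text):
    """Parse CSV text; return (header, rows) with stray whitespace trimmed."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    # Normalize header names into something usable as column names.
    header = [name.strip().lower().replace(" ", "_") for name in next(reader)]
    rows = []
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        rows.append([cell.strip() for cell in row])
    return header, rows


def to_csv(header, rows):
    """Serialize the cleaned data back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def create_table_sql(table, header):
    """Build a CREATE TABLE statement (every column TEXT, as a placeholder)."""
    columns = ",\n".join(f"  `{name}` TEXT" for name in header)
    return f"CREATE TABLE `{table}` (\n{columns}\n);"


if __name__ == "__main__":
    # Hypothetical sample data; the real SBE files have different columns.
    raw = "Committee Name, Amount \n Friends of Example , 100.00\n\n"
    header, rows = clean_rows(raw)
    print(to_csv(header, rows))
    print(create_table_sql("contributions", header))
```

A real version would want proper column types (dates, decimals) rather than TEXT everywhere, which is exactly the sort of thing that still needs doing.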
If you’re handy with Python, and you know your way around Git, I hope you’ll consider lending a hand, even just cleaning up a few lines of code or adding a bit more functionality. Lord knows I could use the help.