Creating an API for the commonwealth.

In perhaps a misguided effort, I started work a few months ago to create a site to round up all of the APIs for government data in Virginia. That inglorious website is Open Virginia. It’s easy to catalog all of them, because right now, I know of exactly two: Richmond Sunlight’s legislative API and my new API for state court decisions. If the state maintains a single API, I don’t know about it.

(Don’t understand this “API” business? Application programming interfaces (APIs) are how software talks to software, and they’re the underpinning of the modern web. For instance, when you post a comment here, WordPress submits your comment to Akismet, Akismet evaluates the comment to determine whether it looks like spam, and it returns a score that WordPress uses to decide whether your comment is published or held for me to review. And Richmond Sunlight’s API allows other websites to automatically retrieve information about legislators, the status of legislation, Photosynthesis portfolios, etc.)

So, about that API for state court decisions. Right now it does just one thing: when given a section of the state code, it returns a listing of all published Court of Appeals decisions since May 2, 1995. (Note that it’s liable to be particularly useful when combined with the Richmond Sunlight API call to retrieve a list of bills that affect a given section of the state code.) Soon enough that’ll include Supreme Court of Virginia rulings. There are all sorts of obvious additional functions that I intend to add soon enough, such as returning information on given decisions, returning a list of cases that match a given keyword, returning a list of cases for a given period of time, etc.
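
As a rough sketch of what calling an API like this might look like from Python: a client builds a URL containing the code section, fetches it, and parses the JSON that comes back. Everything specific here is an assumption made for illustration, not the real interface: the endpoint URL, the `section` parameter, the `case` and `date` field names, and the sample response are all placeholders.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the post does not give the API's real URL or parameters.
BASE_URL = "https://api.example.org/court-decisions"


def build_request_url(section: str) -> str:
    """Build the URL asking for decisions that cite one section of the state code."""
    return f"{BASE_URL}?{urlencode({'section': section})}"


def parse_decisions(body: str) -> list[dict]:
    """Turn a JSON response body into a list of decision records, newest first."""
    decisions = json.loads(body)
    # The 'date' field name is assumed; ISO dates sort correctly as strings.
    return sorted(decisions, key=lambda d: d["date"], reverse=True)


# A made-up response body, standing in for what the server might send back.
SAMPLE_BODY = json.dumps([
    {"case": "Placeholder A v. Placeholder B", "date": "2000-01-02"},
    {"case": "Placeholder C v. Placeholder D", "date": "2001-03-04"},
])

if __name__ == "__main__":
    print(build_request_url("20-107.3"))
    for decision in parse_decisions(SAMPLE_BODY):
        print(decision["date"], decision["case"])
```

In real use, the response body would come from fetching the URL (for example with `urllib.request.urlopen`) rather than from a canned string.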

What good does this do you right now? In all likelihood, none at all, unless you know what to do with and have some use for, say, JSON data for cases concerning § 20-107.3. It’s not until this API is pressed into service on other websites that it’ll be of value to virtually anybody. Maybe there’s no audience for this—maybe there are no programmers just waiting for some shiny new APIs, chock full of state government data. But we won’t know unless we create some APIs, will we?

Follow Open Virginia on Twitter to keep up with the opening up of Virginia through the magic of APIs.

Published by Waldo Jaquith

Waldo Jaquith (JAKE-with) is an open government technologist who lives near Charlottesville, VA, USA.

6 replies on “Creating an API for the commonwealth.”

  1. What we’re really talking about is 3rd party apps that can access state data and state databases – right?

    if not, can you further elaborate?

    If a 3rd party could write apps for, say, Commonwealth Datapoint or the database that the Auditor of Public Accounts uses to generate the large and complex spreadsheet of comparative data for local governments, it would be a boon to those of us who have to put up with their clunky interfaces. But if they did that, it would probably be viewed as a threat to those performing those jobs for the state.

    What forms of state data would you consider as candidates for APIs?

  2. You’ve got it, Larry. In practice, that means taking big datasets that can’t be accessed directly by software and loading them into a database that can be accessed via an API. It also means writing a script to update that database every time a new dataset is made available. I’d consider any state data a candidate for an API, although I’ve got my own concepts of what’s important: election data, campaign finance data, the state code, the granting of permits by all agencies, and the issuing of citations by all agencies all come to mind readily.

  3. Congrats on your AWARD!

    getting access to state maintained databases… how?

    FOIA won’t do it, right?

    would legislation be required?

    Of the two you have done – how did you gain access?

    Are there existing State or Federal precedents or MOAs?

  4. OK, I read Wiki on API and kind of understand APIs. Is there somewhere else that you might suggest I go to get a better clue? Whenever someone is fusing tech with politics, it really helps to understand what it is before I can understand why it’s important. I find that without that knowledge, someone who is against it can give me some sort of “fatal flaw” reason why it’s a very bad idea. That case is usually wildly overstated, but it stops the debate.

    For example: is it reasonable to claim that API’s help explore data that the public has a right to know and there is no reason that access could compromise security of those databases. “It’s the people’s data and we have already paid for it, so show us what you got”

  5. being the cynic I am – I’d not be surprised to hear state agencies make the case that if ANYTHING in the dataset cannot be released to the public that NONE of the dataset can be released because sooner or later some clever guy will figure out how to get it.

    Somehow Waldo was able to convince the General Assembly folks to allow him access to either real-time or near-real-time data and I suspect however they accomplished that would be a viable model for state agencies.

    Kind of hard for a state agency to argue that it can’t do what the GA or the courts did, eh?

  6. Is there somewhere else that you might suggest I go to get a better clue? Whenever someone is fusing tech with politics, it really helps to understand what it is before I can understand why it’s important. I find that without that knowledge, someone who is against it can give me some sort of “fatal flaw” reason why it’s a very bad idea. That case is usually wildly overstated, but it stops the debate.

    You might take a browse through Data.gov. That’s got something like 180,000 data sets there (if I recall correctly). Right now, very few of them are available in API format—that is, as live datasets that can be queried remotely by third party software applications—but I learned in D.C. last week that there is some serious work under way to change that. So most of the data sets on Data.gov right now are as bulk downloads—huge databases or spreadsheets or text files that have to be downloaded and examined before they can be used for anything by anybody.

    For example: is it reasonable to claim that API’s help explore data that the public has a right to know and there is no reason that access could compromise security of those databases. “It’s the people’s data and we have already paid for it, so show us what you got”

    Absolutely, yes.

    APIs can be wonderfully helpful for state agencies, because they obviate the need to ask them for data. The SBE used to have to deal with people coming into the offices and rifling through files in order to get campaign finance records. They needed a special room and the ability to accommodate people. Now they just have a website. Thanks to their bulk data downloads (though no API, unfortunately), VPAP exists to make that data more useful and accessible, which further enhances the SBE’s offerings and further reduces public need for them to provide greater services (at the public’s cost).

    Better than dealing with FOIA requests is just putting everything out there in the first place.

    I’d not be surprised to hear state agencies make the case that if ANYTHING in the dataset cannot be released to the public that NONE of the dataset can be released because sooner or later some clever guy will figure out how to get it.

    A fine solution for this, used by many outfits that provide APIs, is to fork their databases. So imagine that the DMV has a database of hazmat vehicles, by type of material that they’re permitted to carry and records of what sort of materials they’ve carried, along what route, and when. (I don’t even know if that’s a thing, but let’s just pretend. :) Of course, they’ve got their own in-house database for that, and it probably contains all kinds of information that should be private. Solution: Keep that database on the state government’s network, and have a script that runs hourly or daily that exports only the public data to a public-facing database.

    That said, there are lots of companies (e.g., Google, Microsoft, Amazon, Flickr) that successfully manage to partition off private data from public data within a database that’s accessible publicly via an API. Technologically, this is nearly identical to having a part of a website with public data (Amazon’s product inventory) and a part with private data (Amazon’s record of your credit card number, purchase history, etc.).

    Somehow Waldo was able to convince the General Assembly folks to allow him access to either real-time or near-real-time data and I suspect however they accomplished that would be a viable model for state agencies.

    To their great credit, the Division of Legislative Automated Systems already provided nearly everything that I needed. Unfortunately, they don’t have an API, but they have bulk downloads, which I used to create an API for the legislature. They update their data hourly. The exception to this was voting data, but I simply requested that they start providing bulk downloads of that information, paid them to cover their costs (as per FOIA), and within weeks, that data was available, too. It’s updated hourly.

    Those guys are seriously committed to open data. They lack a legislature that’s willing to turn them loose, and they lack the funding to get some of it done, but they really seem all about getting as much legislative data as public as possible.
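
The database "fork" described in the last reply (keep the full database in-house, and have a scheduled script copy only the public data to a public-facing database) can be sketched roughly as follows. This is a minimal illustration using SQLite, not anyone's actual setup: the `hazmat_trips` table, its column names, and the choice of which columns count as public are all hypothetical, following the pretend DMV example.

```python
import sqlite3

# Columns considered safe to publish. Table and column names are hypothetical.
PUBLIC_COLUMNS = ("permit_id", "material_type", "route", "carried_on")


def export_public(internal: sqlite3.Connection, public: sqlite3.Connection) -> int:
    """Refresh the public table from the internal one, copying only public columns."""
    cols = ", ".join(PUBLIC_COLUMNS)
    # Private columns are never selected, so they never leave the internal database.
    rows = internal.execute(f"SELECT {cols} FROM hazmat_trips").fetchall()

    public.execute(f"CREATE TABLE IF NOT EXISTS hazmat_trips ({cols})")
    placeholders = ", ".join("?" for _ in PUBLIC_COLUMNS)
    with public:  # remove old rows and add new ones in a single transaction
        public.execute("DELETE FROM hazmat_trips")
        public.executemany(
            f"INSERT INTO hazmat_trips VALUES ({placeholders})", rows
        )
    return len(rows)
```

A scheduler such as cron would run this hourly or daily. The API then reads only from the public database, so a bug in the API cannot expose columns that were never exported.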

Comments are closed.