Category Archives: Geek

Cloud corporations.

Many months ago, my friend Tim Hwang told me that he’d like to see an API created for corporate registrations, because that would enable all kinds of interesting things. Tim runs the semi-serious Robot, Robot & Hwang, a legal startup that aspires to be a law firm run entirely in software. I’ve been chewing over this idea for the past year or so, and I’m convinced that, writ large, it could constitute a major rethinking of the Virginia State Corporation Commission. Or, really, of any state’s business regulation agency, but my familiarity and interest lie with Virginia. But first I have to explain Amazon Web Services. (If you don’t need that explained, you can skip past that bit.)

Amazon Web Services

Not so long ago, if you wanted to have a web server, you needed to actually acquire a computer, or pay a website host to do so on your behalf. That cost a couple of thousand dollars, and it took days or weeks. Then you had to set it up, which probably meant somebody installing Linux or Windows from CD-ROMs, configuring it to have the software that you needed, mounting it in a rack, and connecting it to the internet. You’d have to sign a contract with the host, agreeing to pay a certain amount of money over a year or more in exchange for them housing your server and providing it with a connection to the internet. That server required maintenance throughout its life, some of which could be done remotely, but occasionally somebody had to go in to reboot it or swap out a broken part. But what if your website suddenly got popular, and your planned 100 orders per day turned into 10,000 orders per day? Well, you had to place orders for new servers, install operating systems on them, mount them in more racks, and connect them to the internet. That might take a few weeks, in which time you could have missed out on hundreds of thousands of orders. And when your orders dropped back to 100 per day, you were still stuck with the infrastructure—and the bills—for a much more popular website.

And then, in 2006, Amazon.com launched Amazon Web Services, a revolutionary computing-on-demand service. AWS upended this whole business of requisitioning servers. AWS consists of vast warehouses of servers that, clustered together, host virtual servers—simulated computers that exist only in software. To set up a web server via AWS, you need only complete a form, select how powerful a server you want, agree to pay a particular hourly rate (ranging from a few cents to a few dollars per hour), and it’s ready within a few minutes. Did your planned 100 orders turn into 10,000? No problem—just step up to a more powerful server, or add a few more small ones. Did your 10,000 orders go back to 100? Scale your servers back down again. Better still, AWS has a powerful API (application programming interface), so you don’t even have to intervene—you can set your servers to create and destroy themselves, control them all from an iPhone app, or let software on your desktop start up and shut down servers without any involvement on your part.
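
To make that programmability concrete, here’s a minimal sketch of the create-and-destroy cycle, using Amazon’s boto3 Python SDK. (The machine image ID is a placeholder, and this is an illustration of the idea, not a deployment recipe.)

    # A minimal sketch: launch a virtual server, then destroy it.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Create a small virtual server (an "instance"), billed by the hour.
    response = ec2.run_instances(
        ImageId="ami-12345678",   # placeholder machine image
        InstanceType="t2.micro",  # a small, cheap server
        MinCount=1,
        MaxCount=1,
    )
    instance_id = response["Instances"][0]["InstanceId"]
    print("Launched", instance_id)

    # When the traffic spike passes, destroy it just as easily.
    ec2.terminate_instances(InstanceIds=[instance_id])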

There are other companies providing similar cloud computing services—Rackspace, Google, and Microsoft, among others—but Amazon dominates the industry, in part because they were first, and in part because they have the most robust, mature platform. There remain many traditional website hosts, which you can pay to house your physical servers, but they’re surely just a few years away from being niche players. Amazon did it first, Amazon did it best, and Amazon is the hosting company to beat now.

Cloud Corporations

Imagine Virginia’s State Corporation Commission (SCC) using the Amazon Web Services model. Virginia Business Services, if you will. One could create a business trivially, use it for whatever its purpose is, and then shut it down again. That might span an hour, a day, or a week. Or one could start a dozen or a hundred businesses, for different amounts of time, with some businesses owned by other businesses.

Why would you do this? This is actually done already, albeit awkwardly. Famously, the Koch brothers maintain a sophisticated web of LLCs, which they create, destroy, and rename to make it difficult to track their political contributions. Doing so is perfectly legal, and it probably costs them millions of dollars in attorneys’ fees alone. Why should that only be available to billionaires? Or perhaps you want to give a political contribution to a candidate, but not in your own name; wealthy people create a quick LLC to do just that. Maybe you want to host a one-off event, or print and sell a few hundred T-shirts as a one-time thing—a corporate shield would be helpful, but it’s hardly worth the time and expense for anybody but the wealthy. There’s no reason why the rest of us shouldn’t be able to enjoy these same protections and abilities.

Cloud corporations would be particularly useful to law firms that specialize in managing legal entities. Right now, they spend a lot of time filing paperwork. Imagine if they had a desktop program that let them establish a corporation in a few minutes. Instead of charging clients $1,500, they could charge $500 and make an even larger profit. Delaware would surely remain attractive for registering many corporations, thanks to its friendly tax laws, but the ease of registering a corporation in Virginia would make the commonwealth attractive for certain types of business.

So what would the SCC need to do to make this happen? Well, right now, one can register for an account on their site, complete a form, pay $75 via credit card, and have a corporation formed instantly. From there on out, it costs $100/year, plus they require that an annual report be filed. Both of these things can be done via forms on their website. (Note that these dollar values are for stock corporations. There are different rates for non-stock corporations and limited liability companies.) All of which is to say that they’ve got the infrastructure in place for purely digital transactions.

But to support an AWS model, they’d need to make a few changes. First, they’d have to expose the API behind those forms, to allow programmatic access to the SCC’s services. Then they’d have to add a few new services, such as the ability to destroy a business. And they’d need to change their pricing, so that instead of being billed annually, businesses could be billed in units of weeks, days, or even hours. (That pricing could be elevated significantly over standard pricing, as a trade-off for convenience.) The SCC also has some antiquated regulations that would need to be fixed, such as the requirement that a business have a physical address where its official documents are stored (“Google Docs” is not an acceptable location). Finally, to do this right, I suspect that the Virginia Department of Taxation would need to get involved, to allow automated payment of business taxes via an API (something that Intuit has spent a great deal of money to prevent).
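
To make this tangible, here’s a purely hypothetical sketch of what a programmatic SCC might look like. None of these endpoints exist; the URL, fields, and workflow are inventions for illustration only.

    # Purely hypothetical: creating and dissolving a Virginia corporation
    # via an imagined SCC API. No such endpoints exist today.
    import requests

    BASE = "https://api.scc.virginia.gov/v1"  # hypothetical URL

    # Create a stock corporation, billed by the day instead of annually.
    response = requests.post(BASE + "/corporations", json={
        "name": "Ephemeral Ventures, Inc.",
        "type": "stock",
        "billing_unit": "day",
        "registered_agent": "Jane Q. Lawyer",
    })
    corporation = response.json()

    # ...use the corporation for its brief purpose, then destroy it.
    requests.delete(BASE + "/corporations/" + corporation["id"])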

Next Steps

I regret that this is unlikely to happen in Virginia. The State Corporation Commission is like its own mini-government within Virginia, with its own executive, legislative, and judicial functions, and it seems accountable to nobody but itself. FOIA doesn’t even apply to it. The SCC is not known as a forward-thinking or responsive organization, and I’m dubious that either the legislature or the governor could persuade it, much less compel it, to do this.

But I am confident that some state will do this (I hope it won’t be Delaware) and that, eventually, all states will do this. It’s inevitable. Whoever does it first, though, will enjoy a first-mover advantage, perhaps on the scale of Amazon Web Services. I’ll enjoy watching it. Maybe I’ll even register a few corporations myself.

A Virginia campaign finance API.

Last year, I wrote here that I was working on an open-source campaign finance parser for Virginia State Board of Elections data. Thanks to the good work of the folks at the SBE, who are making enormous advances in opening up their data, I’ve been able to make some great progress on this recently. That open-source project, named “Saberva,” is now a fully functioning program. When run, it gathers a host of data from the State Board of Elections’ great new campaign finance site and saves it all as a series of machine-readable JSON files. (And a simple CSV file of basic committee data, which is more useful for some folks.) The program is running on Open Virginia, which means that, at long last, Virginia has an API and bulk downloads for campaign finance data.
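
As a hedged example of what that enables, here’s how a programmer might pull down the committee CSV and loop over it. (The URL and file layout here are illustrative placeholders; see the project and Open Virginia for the actual file locations.)

    # Illustrative only: fetch Saberva's committee CSV and iterate over it.
    # The URL is a placeholder; the actual file lives on Open Virginia.
    import csv
    import io
    import urllib.request

    url = "http://example.openva.com/campaign-finance/committees.csv"
    with urllib.request.urlopen(url) as response:
        reader = csv.DictReader(io.TextIOWrapper(response, encoding="utf-8"))
        for committee in reader:
            print(committee)  # one dict per committee record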

This is now the source of Richmond Sunlight’s campaign finance data about each candidate (currently limited to their cash-on-hand and a link to their most recent filing), which provides me with a good incentive to continue to improve it.

If you’ve got ideas for how to improve this still-young project, you’re welcome to comment here, open a ticket on GitHub, or make a pull request. Hate it, and want to copy it and make your own, radically different version? Fork it! It’s released under the MIT License, so you can do anything you want with it. I look forward to seeing where this goes.

New site, new datasets.

Since creating Richmond Sunlight and Virginia Decoded, I’ve been building up a public trove of datasets about Virginia government: legislative video, the court system’s definitions of legal terms, court rulings, all registered dangerous dogs, etc. But they’re all scattered about on different websites. A couple of years ago, I slapped together a quick site to list all of them, but I outgrew it pretty quickly.

So now I’m launching a new site: the Open Virginia data repository. It’s an implementation of the excellent CKAN data repository software (which will soon drive Data.gov). The idea is to provide a single, searchable, extensible website where every known state dataset can be listed, making them easy to find and interact with. It’s built on the industry’s best software, in part because I’m hopeful that, eventually, I can persuade Virginia to simply take the site from me, to establish a long-overdue data.virginia.gov.

There are a few new datasets that accompany this launch:

  • The Dangerous Dog Registry as JSON, meaning that programmers can take these records and do something interesting with them. (Imagine an iPhone app that tells you when you’re close to a registered dangerous dog.) Previously I provided this only as HTML.
  • VDOT 511 Geodata. This is the GeoJSON that powers Virginia 511, exposed here for the first time. Road work, traffic cameras, accidents—all kinds of great data, updated constantly, with each GeoJSON feed listed here. (A sketch of how a programmer might read one of these feeds follows this list.)
  • Public comments on proposed regulations. Over 28,000 comments about proposed regulations have been posted by members of the public to the Virginia Regulatory Town Hall site over the past decade. Now they’re all available in a single file (formatted as JSON), for programmers to do interesting things with.
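
Here’s that sketch of reading one of the 511 GeoJSON feeds. (The URL and the property layout are placeholders; each real feed is listed on the Open Virginia site.)

    # Sketch: read a GeoJSON feed of VDOT 511 data. The URL and property
    # layout are placeholders, not the actual feed's.
    import json
    import urllib.request

    url = "http://example.com/vdot-511/incidents.geojson"  # placeholder
    with urllib.request.urlopen(url) as response:
        feed = json.load(response)

    # GeoJSON is a FeatureCollection: each feature pairs a geometry
    # with a dictionary of free-form properties.
    for feature in feed["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Point":
            lon, lat = geometry["coordinates"]  # GeoJSON order: lon, lat
            print(lat, lon, feature["properties"])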

There’s so much more to come—good datasets already available, and datasets that need to be scraped from government sites and normalized—but this is a good start. I’m optimistic that providing an open, accessible home for this data will encourage others to join in and help create a comprehensive collection of data about the Virginia government and its services.

$500 speech transcription bounty claimed.

It took just 27 hours for the $500 speech transcription bounty to be claimed. Aaron Williamson produced youtube-transcription, a Python-based pair of scripts that upload video to YouTube and download the resulting machine-generated transcripts of speech. It took me longer to find the time to test it out than it did for Aaron to write it. But I finally did test it, and it works quite well.

There are lots of changes and features that I’d like to see, and the beauty of open source software is that those changes don’t need to be Aaron’s problem—I (and anybody else) can make whatever changes I see fit.

This will be pressed into service on Richmond Sunlight ASAP. Thanks to Matt Cutts for the idea, and to the 95 people who backed this project on Kickstarter, since they’re the ones who funded this effort.

$500 bounty for a speech transcription program.

The world needs an API to automatically generate transcript captions for videos. I am offering a $500 bounty for a program that does this via YouTube’s built-in machine transcription functionality. It should work in approximately this manner:

  1. Accepts a manifest that lists one or more video URLs and other metadata fields. The manifest may be in any common, reasonable format (e.g., JSON, CSV, XML); a sample is sketched after this list.
  2. Retrieves the video from the URL and stores it on the filesystem.
  3. Uploads the video to YouTube, appending the other metadata fields to the request.
  4. Deletes the video from the filesystem.
  5. Downloads the resulting caption file, storing it with a unique name that can be connected back to a unique field contained within the manifest (e.g., a unique ID metadata field).
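
For concreteness, here’s one way such a manifest might look, as JSON. (The field names are illustrative, not prescribed; any common, reasonable format is acceptable.)

    [
        {
            "id": "senate-floor-2012-02-01",
            "url": "http://example.com/video/senate-floor-2012-02-01.mp4",
            "title": "Senate floor session, February 1, 2012",
            "description": "Floor speeches and votes."
        }
    ]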

Rules

  • Must be written in a common, non-compiled language (e.g., Python, PHP, Perl, Ruby), require no special setup or server configuration, and run on any standard, out-of-the-box Linux distribution.
  • Must run at the command line. (It’s fine to provide additional interfaces.)
  • May have additional features and options.
  • May use existing open source components (of course). This is not a clean-room implementation.
  • May be divided into multiple programs (e.g., one to parse the manifest and retrieve the specified videos, one to submit the video to YouTube, and one to poll YouTube for the completed transcripts), or combined as one.
  • Must be licensed under the GPL, MIT, or Apache licenses. Other licenses may be considered.
  • If multiple parties develop the program collaboratively, it’s up to them to determine how to divide the bounty. If they cannot come to agreement within seven days, the bounty will be donated to the 501(c)(3) of my choosing.
  • The first person to provide functioning code that meets the specifications will receive the bounty.
  • Anybody who delivers incomplete code, or who delivers complete code after somebody else has already done so, will receive a firm handshake and the thanks of a grateful nation.
  • If nobody delivers a completed product within 30 days, then I may, at my discretion, award some or all of the bounty to whoever has gotten closest to completion.

Participants are encouraged to develop in the open, on GitHub, and to comment here with a link to their repository, so that others may observe their work, and perhaps join in.

This bounty is funded entirely by the 95 folks who backed this Kickstarter project, though I suppose especially by those people who kept backing the project even after the goal was met. I deserve zero credit for it.

Request for Awesome.

I was lucky enough to spend last week at the Aspen Institute, attending the annual Forum on Communications and Society. Thirty-odd of us spent four days talking about how to make government more open and more innovative. The guest list will leave reasonable people wondering how I got invited—Madeleine Albright, Toomas Hendrik Ilves (the President of Estonia), Esther Dyson, Reed Hundt (FCC Chairman under President Clinton), and Eric Schmidt (chairman of Google) were just some of the most famous attendees.


We broke into groups and were assigned general topics on which to devise a proposal for how to make governance more open and innovative. I wound up in a group with Esther Dyson, Tim Hwang, Max Ogden, Christine Outram, David Robinson, and Christina Xu. We came up with some pretty great proposals, at least one of which I intend to pursue personally, but ultimately we settled on the need to overhaul the government RFP process: to create a policy vehicle to bid out lightweight, low-dollar technical projects, and to attract bids from startups and other small, nimble tech organizations. The idea isn’t to replace the existing RFP process, but to create a parallel one that will enable government to be more nimble.

We call our proposal Request for Awesome, and it has been received enthusiastically. Two days after we announced our proposal, a half dozen cities had committed to implementing it, and no doubt more have rolled in in the week since. Max and Tim are particularly involved in pushing this forward, and I don’t doubt that they’ll spread this farther.

I was very impressed by the Aspen Institute and by the Forum on Communications and Society. I’ve probably been to a dozen conferences so far this year, and this one was head and shoulders above the rest, perhaps the best I’ve ever been to. The Aspen Institute enjoys a strong reputation, and now I see why. Here’s hoping I get invited back some day.

New Virginia Decoded features.

Since March, my 9–5 job has been building The State Decoded, software based on my Virginia Decoded site. Although it would be fun to have spent all of this time adding new features to Virginia Decoded, most of it has been spent adapting the software to support a wide variety of legal structures. I released version 0.2 of the software earlier this week (3 weeks late!), and I’m on target to release version 0.3 next week. Which is to say that I’m finally getting to the point where I have a solid software base, and I’ve been able to start adding features to the core software that are making their way into Virginia Decoded.

Here are some of the new features that are worth sharing:

  • Newly backed by the Solr search engine (courtesy of the good folks at Open Source Connections, who did all of the work for free!), not only does the site have really great search now, but I’m able to start using that search index to do interesting things. The best example of that is the “Related Laws” box in the sidebar. For instance, § 2.2-3704.1—part of the state’s FOIA law—recommends § 30-179 as related. As well it should—that’s the law that spells out the powers of the Virginia Freedom of Information Advisory Council. But it’s found clear on the other side of the Code of Virginia—somebody would be unlikely to stumble across both of them normally, but it’s easy on Virginia Decoded. This is just the first step towards breaking down the traditional title/chapter/part divisions of the Code of Virginia. (A sketch of the sort of Solr query that can power this appears after this list.)
  • Several hard-core Code readers have told me that they wish it were faster to flip around between sections. I agree—it should be super easy to go to the next and prior sections. Solution: I’ve bound those links to the left and right arrow keys on the keyboard. Just open a section and try out your arrow keys.
  • The indecipherable history sections at the bottom of each law are being translated into plain English. For instance, compare the text at the end of § 2.2-3705.2 on Virginia’s website and on Virginia Decoded. It’s an enormous improvement. This certainly isn’t perfect, but it will be with a few more hours of work.
  • Amendment attempts have detailed information. Whenever a law has had bills introduced in the General Assembly to amend it, whether or not those bills passed, they’re listed in the sidebar. That’s not new; what’s new is a bit of Ajax that pulls over details about those bills from Richmond Sunlight when you pass your mouse over each bill number, showing you the bill’s sponsor, his party, the district he represents, and the full summary of the bill. (For example, see § 9.1-502.) This is one step closer to providing an unbroken chain of data throughout the process of a bill becoming a law (and, eventually, a court ruling).
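
Here’s that hedged sketch of the sort of Solr “MoreLikeThis” query that can drive a feature like the “Related Laws” box. (The host, core, and field names are placeholders, not Virginia Decoded’s actual schema.)

    # Hedged sketch: ask Solr's MoreLikeThis handler for documents similar
    # to a given law. Host, core, and field names are placeholders.
    import requests

    params = {
        "q": "section:2.2-3704.1",  # the law being viewed (placeholder field)
        "mlt.fl": "text",           # compare laws by their full text
        "rows": 5,                  # five related laws for the sidebar
        "wt": "json",
    }
    response = requests.get("http://localhost:8983/solr/laws/mlt", params=params)
    for doc in response.json()["response"]["docs"]:
        print(doc)  # each doc is a law similar to the one being viewed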

There’s a lot more coming, now that I’ve just about got a solid platform to add features to, but these few were just too good not to mention.

Sunlight Foundation “OpenGov Champion.”

The Sunlight Foundation has put together a very kind mini-documentary about my open government technology work. (I can’t see that any of its contents will come as news to anybody who reads this blog.) It was fun to participate in the making of it, and it was a joy to watch filmmakers Tiina Knuutila and the aptly named Solay Howell at work throughout the process. I’m a big fan of the Sunlight Foundation (they funded the addition of video to Richmond Sunlight in the first place), and it’s flattering that they’d even be institutionally aware of me.

Congress declines to let people download copies of bills.

From the U.S. House Committee on Appropriations comes their annual report on spending on the legislature, this one for the 2012–2013 fiscal year. It includes this gem of a section (on pages 17–18) on proposed spending to let people download copies of bills:

During the hearings this year, the Committee heard testimony on the dissemination of congressional information products in Extensible Markup Language (XML) format. XML permits data to be reused and repurposed not only for print output but for conversion into ebooks, mobile web applications, and other forms of content delivery including data mashups and other analytical tools. The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact and a trusted source once they are removed from the Government’s domain to private sites.

The GPO currently ensures the authenticity of the congressional information it disseminates to the public through its Federal Digital System and the Library of Congress’s THOMAS system by the use of digital signature technology applied to the Portable Document Format (PDF) version of the document, which matches the printed document. The use of this technology attests that the digital version of the document has not been altered since it was authenticated and disseminated by GPO. At this time, only PDF files can be digitally signed in native format for authentication purposes. There currently is no comparable technology for the application and verification of digital signatures on XML documents. While the GPO currently provides bulk data access to information products of the Office of the Federal Register, the limitations on the authenticity and integrity of those data files are clearly spelled out in the user guide that accompanies those files on GPO’s Federal Digital System.

The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized? How would “House” information be differentiated from “Senate” information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternatives that can enhance congressional openness and transparency without relying on bulk data downloads in XML?

Accordingly, and before any bulk data downloads of legislative information are authorized, the Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.

This is bullshit.* Either that, or Congress is relying on advisors who are simultaneously very smart and very stupid. What Congress fears here is none of the things enumerated above; instead, they are afraid of the fact that it is 2012. By not providing bulk downloads of legislation, they’re requiring that Josh Tauberer keep scraping its text off their website to post at GovTrack.us, from which all of the other open congress websites get their text. If Josh wants to verify that the version of a bill that he has is accurate, he’s out of luck. There’s no master copy. For all technical purposes, Congress is silent on what their bills say. (I have this same problem with Virginia legislation on Richmond Sunlight.) For Appropriations to argue that releasing legislation as XML presents potential problems with the accuracy of the circulated text is to ignore that a) there’s already a healthy ecosystem of unauthorized bulk congressional legislative data, b) their failure to participate in that ecosystem is the source of any accuracy problems, and c) were they to provide the data themselves, it would become technologically trivial to verify the accuracy of a copy of a bill.
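
To illustrate just how trivial: if Congress published a cryptographic digest alongside each bulk file, verifying a copy would take a few lines of code. (A sketch; the file name and digest value here are invented for illustration.)

    # Sketch: verifying a downloaded bill against a published digest.
    # The file name and digest value are invented for illustration.
    import hashlib

    def sha256_of(path):
        """Compute the SHA-256 digest of a file, reading it in chunks."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()

    official_digest = "..."  # as it might be published, hypothetically
    if sha256_of("BILLS-112hr1.xml") == official_digest:
        print("This copy is byte-for-byte identical to the original.")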

The Sunlight Foundation just posted a detailed rebuttal to the claims in this report, which goes into more detail than I’m prepared to.

This is a real embarrassment, both to Congress and to the United States. I’ve got a bit of experience in the federal data realm, and I can tell you that when it comes to open data, Congress is trapped in the stone age compared to the White House. Now we see that they intend to stay there.

* Note that I am using a very specific definition of “bullshit”; in short, a false statement that both the speaker and the listener know to be untrue.

Opening up Virginia campaign finance data with Saberva.

With the Virginia State Board of Elections starting to provide bulk campaign finance data, a whole new world of data has opened up, and I intend to make the most of it.

Although the esteemed Virginia Public Access Project has long provided this information (laboriously cleaned up and displayed in a user-friendly fashion), it’s useful only to end users. There’s no API, no bulk downloads, etc., so it’s not possible for that data to be incorporated into Richmond Sunlight, Virginia Decoded, iOS apps, etc. That’s not a knock on VPAP—their line of business is providing this information to end users, period.

My normal instinct is to create a website that gathers and displays this data and, by the way, provides bulk downloads and an API. (For example, see Richmond Sunlight’s API guide and downloads directory, or Virginia Decoded’s downloads directory (the API is in alpha testing now).) But the website is, in this instance, unnecessary. VPAP is doing a better job of that than I can.

Instead, I intend to provide the tools for others to use this data. To that end, I’m developing Saberva, currently hosted on GitHub, a parser that gathers the data from the SBE’s servers, cleans it up, and exports it all to a MySQL database. (“Saber” as in Spanish for “to know,” and “VA” as in Virginia.) At first it’ll just be a program that anybody can run to get a big, beautiful pile of data, but I intend to provide bulk downloads (as MySQL and CSV) and an API (probably just as JSON). Slowing things down somewhat is the fact that I’m writing this in Python, a programming language that I know well enough to muck around in other people’s code, but not nearly well enough to write something of my own from scratch. This seems like the chance to learn it, and I think that Python is the right language for this project.
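
The overall shape of the pipeline is simple, something like this hedged sketch. (The URL and column names are placeholders, not the SBE’s actual schema, and real code should use parameterized queries rather than string-built SQL.)

    # Hedged sketch of the pipeline's shape: fetch a CSV report, clean a
    # couple of fields, and emit rows for MySQL. All names are placeholders.
    import csv
    import io
    import urllib.request

    url = "https://example.sbe.virginia.gov/reports/2012-04.csv"  # placeholder
    with urllib.request.urlopen(url) as response:
        rows = csv.DictReader(io.TextIOWrapper(response, encoding="utf-8"))
        for row in rows:
            committee = row["CommitteeName"].strip()        # tidy ragged names
            amount = float(row["Amount"].replace(",", ""))  # "1,234.56" -> 1234.56
            print("INSERT INTO contributions (committee, amount) "
                  "VALUES ('%s', %.2f);" % (committee.replace("'", "''"), amount))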

Awkwardly (for me), I’m learning this new language out in the open, on GitHub. GitHub, for the non-programmers among you, is a source code sharing website for folks who, like me, develop software collaboratively. Every change that I make—every new line of code, every mistake—is chronicled on the project’s GitHub page. The tradeoff is that others can contribute to my code, making improvements or correcting my errors. Open government hacker Derek Willis has already forked Saberva, replacing and improving my laborious CSV parsing processes with Christopher Groskopf’s excellent csvkit.

Right now, Saberva will download the data for a single month (April), clean it up a bit, save a new CSV file, and create a file to allow it to be imported into a MySQL database. I’ve got the framework for something useful, and now it remains to be made genuinely useful.

If you’re handy with Python, and you know your way around Git, I hope you’ll consider lending a hand, even just cleaning up a few lines of code or adding a bit more functionality. Lord knows I could use the help.

Introducing Virginia Decoded.

Since it’s Sunshine Week, I figured I should stop obsessively polishing Virginia Decoded and just make it public. So here it is. What is it? Think Richmond Sunlight, but for the whole Code of Virginia, rather than just the bills proposed each year.

So why not use the official website for the code? Look at the state’s code website. Now back to me. Now back to the state’s code website. Now back to me. I’m on a horse.

You can find out more on the State Decoded website and the Virginia Decoded “About” page.

I’m speaking at tomorrow’s Jefferson Jackson event in Richmond.

I don’t normally mention my public speaking engagements, but tomorrow I’ve got one that’s free, open to the public, and liable to be of general interest. Tomorrow is the Democratic Party of Virginia’s Jefferson Jackson Dinner, the party’s big annual event. As a part of the day’s activities, I’m speaking on a panel with former (as of two days ago) U.S. CTO Aneesh Chopra, Peter Levin, and Macon Phillips. The topic is open technological innovation and its effects on politics and policy, which is my favorite topic to talk about. It’s at 1:30 PM at the Richmond Marriott.

(Incidentally, I think the women’s caucus should have their own event, focusing on reproductive rights, and call it the “VA JJ Dinner.” Hilarity ensues.)

I seem to have this website.

I publicly launched Richmond Sunlight five years ago this week. Upon its launch I gave it to the Virginia Interfaith Center for Public Policy because, as I wrote, “they’re non-partisan, they have an attention span longer than a housefly, and they have access to resources that I don’t.” I concluded: “I’ll run it for them for the next six months, while we train an editor and a webmaster to take it over. Then I can move on to my next project.” Richmond Sunlight is something that I want to exist, but not something that I actually want to be my problem. But nothing ever changed: the website was Virginia Interfaith’s in a legal sense (on a handshake deal), but in all other practical senses, it was mine. Every bit of the website was mine to run, from stem to stern…which was the opposite of my goal. It occupied enough of my time that I couldn’t move on to that next project. In March, I informed the Virginia Interfaith Center that I had just worked my last session—they’d need to finally hire that webmaster. And I walked away from Richmond Sunlight, which is what enabled me to get started on Virginia Decoded. (Which is on hold briefly while I’m working for the White House.)

A few weeks ago, the Virginia Interfaith Center decided that they couldn’t operate Richmond Sunlight. The cost of paying somebody with the appropriate skill set would be too high and, besides, they’re between executive directors, and have more important things going on. So I asked them to give it back, which they did cheerfully.

So I seem to have this website. Now I’m trying to figure out what to do with it. Giving it away hasn’t worked out, so now I need to chart a course that will allow it to grow and thrive, and also be financially sustainable.

Perhaps I could start a 501(c)(3) to house Richmond Sunlight, Virginia Decoded, Open Virginia, and my other nascent efforts towards open government in Virginia? But then what—where does the money come from? I worry that advertising could make Richmond Sunlight appear disreputable. I think I could get some grants, but that’s ultimately not a business model. Maybe a few site sponsors (advertising lite), though I don’t know that anybody would be willing to pay enough to hire somebody to run the site during session. I do have a mostly completed “pro” version of Richmond Sunlight, but I’ve hesitated to launch it because I can’t provide the support that customers would deserve. (“Sorry it’s broken for you, but I’m at work now. And I’m busy tonight. How’s Saturday for you?”) While there’s a bit of a horse-and-cart problem there, the revenue from that could well make it possible to hire somebody to provide that support and also run the website. Perhaps there’s a partnership waiting to happen—some organization with whom the site could have a mutually beneficial relationship?

I’m soliciting ideas. What should I do with Richmond Sunlight? How do I ensure that it continues to exist and fulfills its potential, but doesn’t keep me from moving on to other projects?

“Your ideas are intriguing to me, and I wish to subscribe to your newsletter.”

There was a moment in an episode of The Simpsons (“Mountain of Madness”) that aired back in 1997, one that I’ve mentally revisited every so often over the past decade:

In transcript form:

Homer: So, Mr. Burns is gonna make us all go on a stupid corporate retreat up in the mountains to learn about teamwork. Which means we’ll have to cancel our plans to hang around here.
Bart: Teamwork is overrated.
Homer: Huh?
Bart: Think about it. I mean, what team was Babe Ruth on? Who knows.
Lisa+Marge: Yankees.
Bart: Sharing is a bunch of bull, too. And helping others. And what’s all this crap I’ve been hearing about tolerance?
Homer: Hmm. Your ideas are intriguing to me, and I wish to subscribe to your newsletter.

In 1997, that last line was the joke: I wish to subscribe to your newsletter. Homer is such a dope that he thinks that he can subscribe to a newsletter from just one person, as if that would ever be practical for your average person, especially his own ten-year-old son.

Of course, that’s a thing now: Twitter, Facebook, and Google Plus. On Twitter, about 1,200 people “subscribe to my newsletter,” in its various guises (I have six Twitter feeds). At least a couple of times each week, I think “your ideas are intriguing to me, and I wish to subscribe to your newsletter,” and I hit “Follow” or “Subscribe” or “Add to Circles.” It was a punchline 14 years ago. Now it’s just part of our social fabric.

More on my White House adventure.

I mentioned in June that I’d gotten an award from the White House. Now they’re promoting it on their website. On the “Champions of Change” site—that’s the award that I got—they’re featuring the sixteen of us who received awards on that occasion, all in the realm of open data technology. I was in awfully good company, rather outclassed by most of the other folks, all of whom you can read about on that site. The link to information about me currently goes to Omar Epps’ entry (a mistake that, I assure you, is not often made), but my entry is linked to from elsewhere, luckily. There’s even a brief video interview with me, which I find excruciating to watch, but one can do so. (Something about being interviewed on video makes me think that I should probably blink a lot. Somewhere in my reptile brain, something is saying “you should blink a lot—it’ll make you look smart!” Instead, it makes me look simultaneously flirtatious and convulsive.)

Perhaps my favorite bit is being mentioned in a blog entry today by US CTO Aneesh Chopra, along with two other folks from my award group:

Waldo Jaquith used his free time to facilitate a more open government. Despite long hours at his day job, Waldo found the time to launch Richmond Sunlight, a volunteer-run site that keeps track of the Virginia legislature, including manually uploading hundreds of hours of CSPAN-inspired video of floor speeches, tagging relevant information on bills and committee votes, and inviting the public to comment on any particular legislation. He solicits feedback, introduces new products and services, and encourages others to participate. In short, he embodies the spirit that drives the Internet economy – “rough consensus, running code.”

[...]

Leigh, Waldo and David are part of a growing network of open innovators tapping into (or contributing to) government data that is both “human-friendly” (you can find what you need), and “computer-friendly” so that entrepreneurs can develop applications that both solve big problems and create jobs in an increasingly competitive economy. I’m confident this growing band of app-developing brothers and sisters will help us invent our way to a clean energy economy, achieve a “quantum leap” in learning outcomes, and strengthen America’s manufacturing sector. To support them, I’ve directed technology and innovation leaders across the federal government to learn from these best practices and scale what works.

I can feel all of this receding into memory, given the title of The Story about the Time I Went to the White House and Won an Award to be hauled out and recited at dinner parties, making my wife roll her eyes as the years go by. It was fun.

The merits of government apps contests.

On O’Reilly Radar, Andy Oram makes this important point about apps contests:

It’s now widely recognized that most of the apps produced by government challenges are quickly abandoned. None of the apps that won awards at the original government challenge–Vivek Kundra’s celebrated Apps for Democracy contest in Washington, DC–still exist.

He went on to explain there’s actually a single exception to that, but I think that the point stands. It is trendy in government circles to hold “apps contests” (a phrase that I don’t think I’ve heard elsewhere, at least outside of the iOS ecosystem), where private developers create software based on public data sets. Though the concept can seem good at first blush, I’ve been dubious.

I was lucky to be invited to the Health Data Initiative Forum last month, an event put on by the Institute of Medicine at the National Institutes of Health. The Health Data Initiative describes itself as “a public-private collaboration that encourages innovators to utilize health data to develop applications to raise awareness of health and health system performance and spark community action to improve health.” (Speaking of which, I got a Fitbit a couple of days ago—it’s a great little device.) The aforementioned Andy Oram was at the same event, and his summation of it is better than anything I’m liable to come up with. While watching the presentations of some of the “apps” in question, I realized that there were two categories of them: those that were liable to exist six months later, and those that weren’t. These apps have got to have a business model, whether making money themselves or being such obvious grant-bait that an organization will clearly take them in-house. Otherwise they’re just toys that will do nothing to benefit anybody. The exception is perhaps for government units that are not collectively persuaded that there’s value in opening up their data—perhaps such contests to put their data to work can serve as inspiration.

There isn’t an inherent problem with apps contests, I don’t think, but they’re probably not worth bothering with unless there’s a simultaneous effort to foster a community around the data. There have got to be at least a couple of ringers, folks with good ideas who are prepared to create something valuable. Otherwise I think apps contests are liable to disappear as quickly as they appeared, a strange blip in the upward climb of open data technologies.

My new adventure: The State Decoded.

A little project that I started a year ago now has been eating up a lot of my time, especially in the past eight months or so. I decided that, as Richmond Sunlight improves the display of legislation, I should create a new site to improve the display of the state code. It could hardly look any worse. (See § 18.2-32. First and second degree murder defined; punishment.) By late last fall, I’d spent a lot of time looking at other states’ codes, and realized that they’re all bad—states had put their codes online around 1995, and hadn’t really bothered with them since.

So I applied to the News Challenge, proposing to expand this Virginia-focused hobby into a full-blown project to improve all state codes. The News Challenge is an annual competition held by the John S. and James L. Knight Foundation, which accepts thousands of grant applications and selects a few winners on the basis of their promise to “advance the future of news [with] new ways to digitally inform communities,” in their words. There were north of 1,600 applications this year, and mine, I learned a few weeks ago, was one of 16 to win.

A bit more detail about my project—“The State Decoded”—is available on their website, and lengthy coverage is available from the Nieman Journalism Lab. In a nutshell, they’re giving me $165,000 to spend a year and a half modularizing this code and making it available to groups in every state in the union to put their codes online. A bunch of that time I’ll spend coding, and a bunch more I’ll spend identifying stakeholders in states across the country, convincing them to deploy this (free) software to make their state codes more accessible.

I spent all of last week in Cambridge, Massachusetts, at the MIT–Knight Civic Media Conference, an invitation-only, all-expenses-paid conference at which the News Challenge winners were announced before an audience of a few hundred people. We all made brief speeches about our projects and received trophies, and then got to spend a few days in the company of some awfully interesting people.

Obviously, this is an exciting change for me. My employer, The Miller Center, has kindly agreed to act as the fiscal agent for the grant, so I will continue to work there, although nearly all of my time will be spent on this new project.

I’m working on getting back to Virginia Decoded, the project that spawned all of this, which I’ve (happily) had to delay a bit to get started on The State Decoded. John Athayde has been doing some great design work on it recently, and a couple of dozen people have been alpha testing it for the last month, which has been really helpful. Once I check off a few more features from the feature list, I’ll open it up to all of the beta testers. I’ve personally found the site really useful as a legal reference tool, so I’m eager to see it accessible to a wider audience. I hope you’ll like it, too.

Creating an API for the commonwealth.

In perhaps a misguided effort, I started work a few months ago to create a site to round up all of the APIs for government data in Virginia. That inglorious website is Open Virginia. It’s easy to catalog all of them, because right now, I know of exactly two: Richmond Sunlight’s legislative API and my new API for state court decisions. If the state maintains a single API, I don’t know about it.

(Don’t understand this “API” business? Application programming interfaces (APIs) are how software talks to software, and they’re the underpinning of the modern web. For instance, when you post a comment here, WordPress submits it to Akismet, Akismet evaluates the comment to determine whether it looks like spam, and it returns a score that WordPress uses to decide whether your comment is published or held for me to review. And Richmond Sunlight’s API allows other websites to automatically retrieve information about legislators, the status of legislation, Photosynthesis portfolios, etc.)

So, about that API for state court decisions. Right now it does just one thing: when given a section of the state code, it returns a listing of all published Court of Appeals decisions concerning that section since May 2, 1995. (Note that it’s liable to be particularly useful when combined with the Richmond Sunlight API call that retrieves a list of bills affecting a given section of the state code.) Soon enough that’ll include Supreme Court of Virginia rulings. There are all sorts of obvious additional functions that I intend to add soon, such as returning information on a given decision, returning a list of cases that match a given keyword, returning a list of cases for a given period of time, etc.

What good does this do you right now? In all likelihood, none at all, unless you know what to do with, and have some use for, say, JSON data for cases concerning § 20-107.3. It’s not until this API is pressed into service on other websites that it’ll be of value to virtually anybody. Maybe there’s no audience for this—maybe there are no programmers just waiting for some shiny new APIs, chock full of state government data. But we won’t know unless we create some APIs, will we?
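
If you are such a programmer, consuming the API looks something like this hedged sketch. (The URL shape is illustrative, not the API’s documented form; consult Open Virginia for the real thing.)

    # Hedged sketch: fetch the JSON list of decisions concerning a given
    # section of the state code. The URL shape is illustrative only.
    import json
    import urllib.request

    section = "20-107.3"
    url = "http://example.openva.com/court-decisions/api/?section=" + section
    with urllib.request.urlopen(url) as response:
        decisions = json.load(response)

    for decision in decisions:
        print(decision)  # one record per published Court of Appeals decision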

Follow Open Virginia on Twitter to keep up with the opening up of Virginia through the magic of APIs.

McDonnell on open government.

Gov. McDonnell, on open government:

“I’ve long been an advocate of putting our full budget, all our legislation, a number of things about state government online in an easy to download, easy to access fashion,” the governor said.

Really? I mean, if that’s true, that’s great, but it’s news to me. If there has been any advance in the provision of information online from the executive branch during McDonnell’s tenure, I have not noticed it. I don’t recall him introducing any bill on the topic during his years in the legislature, and a quick scan through his legislation doesn’t turn up anything. Maybe by “advocate” he just means “supporter in spirit.”

Virginia puts a lot of this sort of information online, but it’s a mess. It’s found on dozens of websites, in a half dozen different formats, all different, none compatible, often nearly invisible to search engines, without an API in sight. The UI is always a train wreck. It’s bad enough that the sites look lousy, but they almost never provide any method for third parties to pull the data out and display it in a more meaningful fashion. I’ve been spending my nights and weekends on this for the past six months and, I’ll tell you, this kind of thing is frustrating as hell.