
Request for Awesome.

I was lucky enough to spend last week at the Aspen Institute, attending the annual Forum on Communications and Society. Thirty-odd of us spent four days talking about how to make government more open and more innovative. The guest list will leave reasonable people wondering how I got invited—Madeleine Albright, Toomas Hendrik Ilves (the President of Estonia), Esther Dyson, Reed Hundt (FCC Chairman under President Clinton), and Eric Schmidt (chairman of Google) were just some of the most famous attendees.

[Photo: Aspen view]

We broke into groups, each assigned a general topic on which to devise a proposal for making governance more open and innovative. I wound up in a group with Esther Dyson, Tim Hwang, Max Ogden, Christine Outram, David Robinson, and Christina Xu. We came up with some pretty great proposals, at least one of which I intend to pursue personally, but ultimately we settled on the need to overhaul the government RFP process: to create a policy vehicle that bids out lightweight, low-dollar technical projects and attracts bids from startups and other small, nimble tech organizations. The idea isn’t to replace the existing RFP process, but to create a parallel one that will enable government to be more nimble.

We call our proposal Request for Awesome, and it has been received enthusiastically. Two days after we announced it, a half dozen cities had committed to implementing it, and no doubt more have rolled in during the week since. Max and Tim are particularly involved in pushing this forward, and I don’t doubt that they’ll spread it further.

I was very impressed by the Aspen Institute and by the Forum on Communications and Society. I’ve probably been to a dozen conferences so far this year, and this one was head and shoulders above the rest, perhaps the best I’ve ever been to. The Aspen Institute enjoys a strong reputation, and now I see why. Here’s hoping I get invited back some day.

New Virginia Decoded features.

Since March, my 9–5 job has been building The State Decoded, software based on my Virginia Decoded site. Although it would have been fun to spend all of this time adding new features to Virginia Decoded, most of it has gone to adapting the software to support a wide variety of legal structures. I released version 0.2 of the software earlier this week (3 weeks late!), and I’m on target to release version 0.3 next week. Which is to say that I’m finally getting to the point where I have a solid software base, and I’ve been able to start adding features to the core software that are making their way into Virginia Decoded.

Here are some of the new features that are worth sharing:

  • The site is newly backed by the Solr search engine (courtesy of the good folks at Open Source Connections, who did all of the work for free!). Not only does the site have really great search now, but I’m able to start using that search index to do interesting things. The best example of that is the “Related Laws” box in the sidebar; there’s a sketch of how that works after this list. For instance, § 2.2-3704.1—part of the state’s FOIA law—recommends § 30-179 as related. As well it should—that’s the law that spells out the powers of the Virginia Freedom of Information Advisory Council. But it’s found clear on the other side of the Code of Virginia—somebody would be unlikely to stumble across both of them normally, but it’s easy on Virginia Decoded. This is just the first step towards breaking down the traditional title/chapter/part divisions of the Code of Virginia.
  • Several hard-core Code readers have told me that they wish it were faster to flip around between sections. I agree—it should be super easy to go to the next and prior sections. Solution: I’ve bound those links to the left and right arrow keys on the keyboard. Just open a section and try out your arrow keys.
  • The indecipherable history sections at the bottom of each law are being translated into plain English; the second sketch after this list shows the flavor of that work. For instance, compare the text at the end of § 2.2-3705.2 on Virginia’s website and on Virginia Decoded. It’s an enormous improvement. This certainly isn’t perfect yet, but it will be with a few more hours of work.
  • Amendment attempts now come with detailed information. Whenever bills have been introduced in the General Assembly to amend a law, whether or not those bills passed, they’re listed in the sidebar. That’s not new; what’s new is a bit of Ajax that pulls over details about those bills from Richmond Sunlight when you pass your mouse over each bill number, showing you the bill’s sponsor, his party, the district he represents, and the full summary of the bill. (For example, see § 9.1-502.) This is one step closer to providing an unbroken chain of data throughout the process of a bill becoming a law (and, eventually, a court ruling).
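For the curious, here’s roughly how a “Related Laws” feature can work atop Solr. This is a minimal sketch in Python against Solr’s standard MoreLikeThis handler; the core name, URL, and field names are my assumptions, not the actual Virginia Decoded schema.

```python
import requests

# Hypothetical Solr core and schema; the real Virginia Decoded index may differ.
SOLR_MLT_URL = "http://localhost:8983/solr/lawcode/mlt"

def related_laws(section, count=5):
    """Ask Solr's MoreLikeThis handler for laws textually similar to a section."""
    params = {
        "q": 'section:"%s"' % section,  # look the law up by its section number
        "mlt.fl": "text",               # compare laws on their full text
        "mlt.mintf": 1,                 # count terms that appear even once...
        "mlt.mindf": 2,                 # ...in at least two documents
        "rows": count,
        "wt": "json",
    }
    response = requests.get(SOLR_MLT_URL, params=params)
    response.raise_for_status()
    return [doc["section"] for doc in response.json()["response"]["docs"]]

# Ideally this surfaces section 30-179 among the laws related to the FOIA law.
print(related_laws("2.2-3704.1"))
```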
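And here’s the flavor of the history translation. A minimal sketch, assuming history strings in the Code’s common “1976, c. 709; 1988, c. 891” format; the real strings come in many more variations, which is where those remaining hours of work go.

```python
import re

def translate_history(history):
    """Turn a terse Code of Virginia history string into plain English.

    Assumes the common "YYYY, c. NNN" citation format; real history
    strings have many more variants (Acts of Assembly citations,
    special sessions, chapter ranges, etc.).
    """
    phrases = []
    for i, entry in enumerate(history.split(";")):
        match = re.match(r"\s*(\d{4}), cc?\. ([\d, ]+)", entry)
        if not match:
            continue  # an unrecognized format; a real parser must do better
        year, chapters = match.groups()
        verb = "Enacted" if i == 0 else "Amended"
        phrases.append("%s in %s by chapter %s of the Acts of Assembly."
                       % (verb, year, chapters.strip()))
    return " ".join(phrases)

print(translate_history("1976, c. 709; 1988, c. 891"))
# Enacted in 1976 by chapter 709 of the Acts of Assembly. Amended in
# 1988 by chapter 891 of the Acts of Assembly.
```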

There’s a lot more coming, now that I’ve just about got a solid platform to add features to, but these few were just too good not to mention.

Sunlight Foundation “OpenGov Champion.”

The Sunlight Foundation has put together a very kind mini-documentary about my open government technology work. (I can’t see that any of its contents will come as news to anybody who reads this blog.) It was fun to participate in the making of it, and it was a joy to watch filmmakers Tiina Knuutila and the aptly named Solay Howell at work throughout the process. I’m a big fan of the Sunlight Foundation (they funded the addition of video to Richmond Sunlight in the first place), and it’s flattering that they’d even be institutionally aware of me.

Congress declines to let people download copies of bills.

From the U.S. House Committee on Appropriations comes its annual report on legislative branch spending, this one for the 2012–2013 fiscal year. It includes this gem of a section (on pages 17–18) on proposed spending to let people download copies of bills:

During the hearings this year, the Committee heard testimony on the dissemination of congressional information products in Extensible Markup Language (XML) format. XML permits data to be reused and repurposed not only for print output but for conversion into ebooks, mobile web applications, and other forms of content delivery including data mashups and other analytical tools. The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact and a trusted source once they are removed from the Government’s domain to private sites.

The GPO currently ensures the authenticity of the congressional information it disseminates to the public through its Federal Digital System and the Library of Congress’s THOMAS system by the use of digital signature technology applied to the Portable Document Format (PDF) version of the document, which matches the printed document. The use of this technology attests that the digital version of the document has not been altered since it was authenticated and disseminated by GPO. At this time, only PDF files can be digitally signed in native format for authentication purposes. There currently is no comparable technology for the application and verification of digital signatures on XML documents. While the GPO currently provides bulk data access to information products of the Office of the Federal Register, the limitations on the authenticity and integrity of those data files are clearly spelled out in the user guide that accompanies those files on GPO’s Federal Digital System.

The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized? How would “House” information be differentiated from “Senate” information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternatives that can enhance congressional openness and transparency without relying on bulk data downloads in XML?

Accordingly, and before any bulk data downloads of legislative information are authorized, the Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.

This is bullshit.* Either that, or Congress is relying on advisors who are simultaneously very smart and very stupid. What Congress fears here is actually neither of those things; what it’s really afraid of is the fact that it is 2012. By not providing bulk downloads of legislation, Congress requires that Josh Tauberer keep scraping bill text off its website to post at GovTrack.us, from which all of the other open-Congress websites get their text. If Josh wants to verify that the version of a bill that he has is accurate, he’s out of luck. There’s no master copy. For all technical purposes, Congress is silent on what its bills say. (I have this same problem with Virginia legislation on Richmond Sunlight.) For Appropriations to argue that releasing legislation as XML presents potential problems with the accuracy of the circulated text is to ignore that a) there’s already a healthy ecosystem of unauthorized bulk congressional legislative data, b) Congress’s failure to participate in that ecosystem is the source of any accuracy problems, and c) by providing data itself, Congress would make it technologically trivial to verify the accuracy of a copy of a bill.
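To make “technologically trivial” concrete: if Congress published bulk bill files alongside cryptographic checksums, verifying any copy would take a few lines of code. This is a minimal sketch with made-up URLs; no such official endpoints exist, which is precisely the problem.

```python
import hashlib

import requests

# Hypothetical URLs: an official bulk XML file and its published SHA-256 hash.
BILL_URL = "https://bulk.example.gov/112/bills/hr3261.xml"
HASH_URL = "https://bulk.example.gov/112/bills/hr3261.xml.sha256"

bill_xml = requests.get(BILL_URL).content
official_hash = requests.get(HASH_URL).text.strip()

# A matching hash means this copy is byte-for-byte identical to the original.
if hashlib.sha256(bill_xml).hexdigest() == official_hash:
    print("This copy of the bill is authentic.")
else:
    print("This copy differs from what Congress published.")
```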

The Sunlight Foundation just posted a detailed rebuttal to the claims in this report, which goes into more detail than I’m prepared to.

This is a real embarrassment, both to Congress and to the United States. I’ve got a bit of experience with federal data, and I can tell you that when it comes to open data, Congress is trapped in the stone age compared to the White House. Now we see that they intend to stay there.

* Note that I am using a very specific definition of “bullshit”: in short, a statement that both the speaker and the listener know to be untrue.

Opening up Virginia campaign finance data with Saberva.

With the Virginia State Board of Elections starting to provide bulk campaign finance data, a whole new world of data has opened up, and I intend to make the most of it.

Although the esteemed Virginia Public Access Project has long provided this information (laboriously cleaned up and displayed in a user-friendly fashion), it’s useful only to end users. There’s no API, no bulk downloads, etc., so it’s not possible for that data to be incorporated into Richmond Sunlight, Virginia Decoded, iOS apps, etc. That’s not a knock on VPAP—their line of business is providing this information to end users, period.

My normal instinct is to create a website that gathers and displays this data and, by the way, provides bulk downloads and an API. (For example, see Richmond Sunlight’s API guide and downloads directory, or Virginia Decoded’s downloads directory (the API is in alpha testing now).) But the website is, in this instance, unnecessary. VPAP is doing a better job of that than I can.

Instead, I intend to provide the tools for others to use this data. To that end, I’m developing Saberva, currently hosted on GitHub, a parser that gathers the data from the SBE’s servers, cleans it up, and exports it all to a MySQL database. (“Saber” as in Spanish for “to know,” and “VA” as in Virginia.) At first it’ll just be a program that anybody can run to get a big, beautiful pile of data, but I intend to provide bulk downloads (as MySQL and CSV) and an API (probably just as JSON). Slowing things down somewhat is the fact that I’m writing this in Python, a programming language that I know well enough to muck around in other people’s code, but not nearly well enough to write something of my own from scratch. This seems like the chance to learn it, and I think that Python is the right language for this project.

Awkwardly (for me), I’m learning this new language out in the open, on GitHub. GitHub, for the non-programmers among you, is a source code sharing website for folks who, like me, develop software collaboratively. Every change that I make—every new line of code, every mistake—is chronicled on the project’s GitHub page. The tradeoff is that others can contribute to my code, making improvements or correcting my errors. Open government hacker Derek Willis has already forked Saberva, replacing and improving my laborious CSV parsing processes with Christopher Groskopf’s excellent csvkit.

Right now, Saberva will download the data for a single month (April), clean it up a bit, save a new CSV file, and create a file to allow it to be imported into a MySQL database. I’ve got the framework for something useful, and now it remains to be made genuinely useful.
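For a sense of the shape of the pipeline, here’s a minimal sketch of those three steps: download, clean, and prepare for MySQL. The SBE URL, filename, and table layout here are made up for illustration; the Saberva repository is the authoritative version.

```python
import csv

import requests

# Hypothetical URL and report layout; see the Saberva repository for the real ones.
SBE_URL = "https://www.sbe.virginia.gov/hypothetical/2012-04-schedule-a.csv"

response = requests.get(SBE_URL)
response.raise_for_status()
rows = list(csv.DictReader(response.text.splitlines()))

# Clean up the data: strip stray whitespace and collapse empty fields.
with open("contributions.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    for row in rows:
        writer.writerow({k: (v or "").strip() for k, v in row.items()})

# Emit a companion SQL file so the cleaned CSV can be loaded into MySQL.
with open("import.sql", "w") as sql:
    sql.write(
        "LOAD DATA LOCAL INFILE 'contributions.csv'\n"
        "INTO TABLE contributions\n"
        "FIELDS TERMINATED BY ',' ENCLOSED BY '\"'\n"
        "IGNORE 1 LINES;\n"
    )
```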

If you’re handy with Python, and you know your way around Git, I hope you’ll consider lending a hand, even just cleaning up a few lines of code or adding a bit more functionality. Lord knows I could use the help.

Introducing Virginia Decoded.

Since it’s Sunshine Week, I figured I should stop obsessively polishing Virginia Decoded and just make it public. So here it is. What is it? Think Richmond Sunlight, but for the whole Code of Virginia, rather than just the bills proposed each year.

So why not use the official website for the code? Look at the state’s code website. Now back to me. Now back to the state’s code website. Now back to me. I’m on a horse.

You can find out more on the State Decoded website and the Virginia Decoded “About” page.

I’m speaking at tomorrow’s Jefferson-Jackson event in Richmond.

I don’t normally mention my public speaking engagements, but tomorrow I’ve got one that’s free, open to the public, and liable to be of general interest. Tomorrow is the Democratic Party of Virginia’s Jefferson-Jackson Dinner, the party’s big annual event. As a part of the day’s activities, I’m speaking on a panel with former (as of two days ago) U.S. CTO Aneesh Chopra, Peter Levin, and Macon Phillips. The topic is open technological innovation and its effects on politics and policy, which is my favorite thing to talk about. It’s at 1:30 PM at the Richmond Marriott.

(Incidentally, I think the women’s caucus should have their own event, focusing on reproductive rights, and call it the “VA JJ Dinner.” Hilarity ensues.)

I seem to have this website.

I publicly launched Richmond Sunlight five years ago this week. Upon its launch, I gave it to the Virginia Interfaith Center for Public Policy because, as I wrote, “they’re non-partisan, they have an attention span longer than a housefly, and they have access to resources that I don’t.” I concluded: “I’ll run it for them for the next six months, while we train an editor and a webmaster to take it over. Then I can move on to my next project.” Richmond Sunlight is something that I want to exist, but not something that I actually want to be my problem. But nothing ever changed: the website was Virginia Interfaith’s in a legal sense (on a handshake deal), but in every practical sense it was mine. Every bit of the website was mine to run, from stem to stern…which was the opposite of my goal. It occupied enough of my time that I couldn’t move on to that next project. In March, I informed the Virginia Interfaith Center that I had just worked my last session—they’d need to finally hire that webmaster. And I walked away from Richmond Sunlight, which is what enabled me to get started on Virginia Decoded. (Which is on hold briefly while I’m working for the White House.)

A few weeks ago, the Virginia Interfaith Center concluded that they couldn’t operate Richmond Sunlight. The cost of paying somebody with the appropriate skill set would be too high and, besides, they’re between executive directors and have more important things going on. So I asked them to give it back, which they did, cheerfully.

So I seem to have this website. Now I’m trying to figure out what to do with it. Giving it away hasn’t worked out, so now I need to chart a course that will allow it to grow and thrive, and also be financially sustainable.

Perhaps I could start a 501(c)(3) to house Richmond Sunlight, Virginia Decoded, Open Virginia, and my other nascent efforts towards open government in Virginia? But then what—where does the money come from? I worry that advertising could make Richmond Sunlight appear disreputable. I think I could get some grants, but that’s ultimately not a business model. Maybe a few site sponsors (advertising lite), though I don’t know that anybody would be willing to pay enough to hire somebody to run the site during session. I do have a mostly completed “pro” version of Richmond Sunlight, but I’ve hesitated to launch it because I can’t provide the support that customers would deserve. (“Sorry it’s broken for you, but I’m at work now. And I’m busy tonight. How’s Saturday for you?”) While there’s a bit of a chicken-and-egg problem there, the revenue from that could well make it possible to hire somebody to provide that support and also run the website. Perhaps there’s a partnership waiting to happen—some organization with which the site could have a mutually beneficial relationship?

I’m soliciting ideas. What should I do with Richmond Sunlight? How do I ensure that it continues to exist and fulfills its potential, but doesn’t keep me from moving on to other projects?

“Your ideas are intriguing to me, and I wish to subscribe to your newsletter.”

There was a moment in an episode of The Simpsons (“Mountain of Madness”) that aired back in 1997 that I’ve mentally revisited every so often in the years since:

In transcript form:

Homer: So, Mr. Burns is gonna make us all go on a stupid corporate retreat up in the mountains to learn about teamwork. Which means we’ll have to cancel our plans to hang around here.
Bart: Teamwork is overrated.
Homer: Huh?
Bart: Think about it. I mean, what team was Babe Ruth on? Who knows.
Lisa+Marge: Yankees.
Bart: Sharing is a bunch of bull, too. And helping others. And what’s all this crap I’ve been hearing about tolerance?
Homer: Hmm. Your ideas are intriguing to me, and I wish to subscribe to your newsletter.

In 1997, that last line was the joke: I wish to subscribe to your newsletter. Homer is such a dope that he thinks he can subscribe to a newsletter from just one person, as if that would ever be practical for your average person, especially from his own ten-year-old son.

Of course, that’s a thing now: Twitter, Facebook, and Google Plus. On Twitter, about 1,200 people “subscribe to my newsletter,” in its various guises (I have six Twitter feeds). At least a couple of times each week, I think “your ideas are intriguing to me, and I wish to subscribe to your newsletter,” and I hit “Follow” or “Subscribe” or “Add to Circles.” It was a punchline 14 years ago. Now it’s just part of our social fabric.