From the U.S. House Committee on Appropriations comes its annual report on spending on the legislature, this one for the 2012–2013 fiscal year. It includes this gem of a section (on pages 17–18) on proposed spending to let people download copies of bills:
During the hearings this year, the Committee heard testimony on the dissemination of congressional information products in Extensible Markup Language (XML) format. XML permits data to be reused and repurposed not only for print output but for conversion into ebooks, mobile web applications, and other forms of content delivery including data mashups and other analytical tools. The Committee has heard requests for the increased dissemination of congressional information via bulk data download from non-governmental groups supporting openness and transparency in the legislative process. While sharing these goals, the Committee is also concerned that Congress maintains the ability to ensure that its legislative data files remain intact and a trusted source once they are removed from the Government’s domain to private sites.
The GPO currently ensures the authenticity of the congressional information it disseminates to the public through its Federal Digital System and the Library of Congress’s THOMAS system by the use of digital signature technology applied to the Portable Document Format (PDF) version of the document, which matches the printed document. The use of this technology attests that the digital version of the document has not been altered since it was authenticated and disseminated by GPO. At this time, only PDF files can be digitally signed in native format for authentication purposes. There currently is no comparable technology for the application and verification of digital signatures on XML documents. While the GPO currently provides bulk data access to information products of the Office of the Federal Register, the limitations on the authenticity and integrity of those data files are clearly spelled out in the user guide that accompanies those files on GPO’s Federal Digital System.
The GPO and Congress are moving toward the use of XML as the data standard for legislative information. The House and Senate are creating bills in XML format and are moving toward creating other congressional documents in XML for input to the GPO. At this point, however, the challenge of authenticating downloads of bulk data legislative data files in XML remains unresolved, and there continues to be a range of associated questions and issues: Which Legislative Branch agency would be the provider of bulk data downloads of legislative information in XML, and how would this service be authorized. How would “House” information be differentiated from “Senate” information for the purposes of bulk data downloads in XML? What would be the impact of bulk downloads of legislative data in XML on the timeliness and authoritativeness of congressional information? What would be the estimated timeline for the development of a system of authentication for bulk data downloads of legislative information in XML? What are the projected budgetary impacts of system development and implementation, including potential costs for support that may be required by third party users of legislative bulk data sets in XML, as well as any indirect costs, such as potential requirements for Congress to confirm or invalidate third party analyses of legislative data based on bulk downloads in XML? Are there other data models or alternative that can enhance congressional openness and transparency without relying on bulk data downloads in XML?
Accordingly, and before any bulk data downloads of legislative information are authorized, the Committee directs the establishment of a task force composed of staff representatives of the Library of Congress, the Congressional Research Service, the Clerk of the House, the Government Printing Office, and such other congressional offices as may be necessary, to examine these and any additional issues it considers relevant and to report back to the Committee on Appropriations of the House and Senate.
This is bullshit.* Either that or Congress is relying on advisors who are simultaneously very smart and very stupid. What Congress fears here is actually none of these things; they are afraid of the fact that it is 2012. By not providing bulk downloads of legislation, they’re requiring that Josh Tauberer keep scraping the text off their website to post at GovTrack.us, from which all of the other open congress websites get their text. If Josh wants to verify that the version of a bill that he has is accurate, he’s out of luck. There’s no master copy. For all technical purposes, Congress is silent on what their bills say. (I have this same problem with Virginia legislation on Richmond Sunlight.) For Appropriations to argue that releasing legislation as XML presents potential problems with the accuracy of the circulated text is to pretend that a) there isn’t already a healthy ecosystem of unauthorized bulk congressional legislative data and b) their failure to participate in that ecosystem isn’t the source of any accuracy problems. If they provided the data themselves, it would become technologically trivial to verify the accuracy of a copy of a bill.
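To make "technologically trivial" concrete, here is a minimal sketch of one way it could work, assuming Congress published a cryptographic digest alongside each official XML file. This is an illustration, not anything Congress or GPO actually offers: the bill text, the function names, and the idea of a published digest list are all hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(copy: bytes, published_digest: str) -> bool:
    """Check a downloaded or scraped copy against the publisher's digest."""
    return hmac.compare_digest(sha256_hex(copy), published_digest.lower())


# Hypothetical bill text, standing in for one file in a bulk download.
official_xml = b"<bill><section>Appropriated: $1,000</section></bill>"

# Assumption: the publisher posts this digest somewhere it controls.
published_digest = sha256_hex(official_xml)

# A faithful copy verifies; a copy with one altered figure does not.
faithful_copy = bytes(official_xml)
altered_copy = official_xml.replace(b"$1,000", b"$9,000")

print(matches_published_digest(faithful_copy, published_digest))  # True
print(matches_published_digest(altered_copy, published_digest))   # False
```

A bare digest only proves that a copy matches what was published, and only if the digest itself comes from a trusted source. It also compares bytes, so two XML files that differ only in whitespace would not match. Signing the XML directly is a separate matter, but contrary to the report's claim the technology exists: XML Signature has been a W3C Recommendation for years.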
The Sunlight Foundation just posted a detailed rebuttal to the claims in this report, which goes into more detail than I’m prepared to.
This is a real embarrassment, both to Congress and to the United States. I’ve got a bit of experience in the federal data realm, and I can tell you that when it comes to open data, compared to the White House, Congress is trapped in the Stone Age. Now we see that they intend to stay there.
* Note that I am using a very specific definition of “bullshit”; in short, a false statement that both the speaker and the listener know to be untrue.