In Virginia, you can’t just get a list of all of the registered corporations. That’s not a thing. If you dig for a while on the State Corporation Commission’s website, you’ll find their “Business Entity Search,” where you can search for a business by name. But if you want to get a list of all businesses in your county, all businesses that have been formed in the past month, all businesses located at a particular address, etc., then you’re just out of luck.
Except. The SCC will sell you their database of all 1,126,069 companies. It’s not cheap, at $150/month, with a minimum three-month commitment. You have to sign a five-page contract, and the data is a hot mess, of no value to anybody other than a programmer.
So, naturally, I wrote the SCC a check for $450 at the end of April, bought the data, and now give it away for free. (Every week, early on Wednesday morning, I automatically transfer the enormous, freshly updated file to https://s3.amazonaws.com/virginia-business/current.zip.) Because it’s not right that people should have to pay for public data. The SCC is already generating this data, and they’re already hosting the file on their website, so why sell it? We’ve already paid for it, out of our taxes and out of our business incorporation fees. I FOIAed the list of customers for this data. There are just six, so it’s not like this is a money-making endeavor for the SCC. (Only one of them, Attentive Law Group, is in Virginia.)
Now people can have this terrible file, useful only to programmers. So what are they to do with that file? Well, maybe nothing. So I’ve also written some software to turn that data into modern, useful formats. Named “Crump” (for Beverley T. Crump, the first-ever member of the State Corporation Commission), it is, naturally, free and open source. Crump turns the SCC’s fixed-width text file into JSON and CSV files. Optionally, it will clean up the data and produce Elasticsearch import files, basically allowing the data to be quickly loaded into a database and made searchable. Again, anybody can have the data for free, and anybody can have Crump for free, to turn that data into useful data.
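For the curious, here is a minimal sketch of the general idea behind converting a fixed-width text file into JSON and CSV. It is not Crump’s actual code: the field names, offsets, and sample records below are made up for illustration, and the real SCC file has its own layout, which isn’t reproduced here.

```python
import csv
import io
import json

# Hypothetical layout: (field name, start offset, length).
# These are illustrative assumptions, not the SCC's real field definitions.
FIELDS = [
    ("id", 0, 8),
    ("name", 8, 20),
    ("status", 28, 2),
]


def parse_line(line):
    """Slice one fixed-width line into a dict of whitespace-stripped values."""
    return {
        name: line[start:start + length].strip()
        for name, start, length in FIELDS
    }


def parse_records(text):
    """Parse every non-blank line of a block of fixed-width text."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


def to_json(records):
    """Serialize the parsed records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the parsed records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[name for name, _, _ in FIELDS],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Two invented records, padded to match the hypothetical layout above.
SAMPLE = (
    "F0000001ACME WIDGETS INC    00\n"
    "F0000002BLUE RIDGE BAKERY   20\n"
)

if __name__ == "__main__":
    records = parse_records(SAMPLE)
    print(to_json(records))
    print(to_csv(records))
```

The whole trick is that a fixed-width file carries no delimiters, so the program has to know in advance where each field starts and how long it is. Once every line has been sliced into named fields, writing JSON or CSV is the easy part.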
And, finally, I’ve created a website, creatively named “Virginia Businesses,” where non-programmers can access that data and do things with it. I’ve barely gotten started on the website. At this point, one can download individual data files as either CSV or JSON, download the original data file from the SCC, or search through the data. The search results are terrible-looking, and not all of the data is loaded in at the moment, but by the time you read this blog entry, perhaps that will all be much improved. I intend to add functionality to generate statistics, maps, charts, etc., to let people dig into this really interesting data. The website updates its data automatically every week. Naturally, the website itself is also an open source project: anybody can have the website, too, and can set up a duplicate to compete with me, or perhaps create a similar site for another state.
So, free data, free software, and a free website. There’s no catch.
OpenCorporates, whose excellent work inspired this project, has imported the data into their own system, meaning that Virginia’s corporate data is now available in a common system with 69 million other corporations from around the U.S. and the world.
Then, a couple of weeks ago, a happy surprise: the Shuttleworth Foundation e-mailed me, out of the blue, informing me that they’re giving me $5,000 to support my work in open data, as a part of their “flash grant” program. I can do whatever I want with that money, and I’m going to use a chunk of it to support this work. That means that I’m not out of pocket on that $450 check, and that I can continue to pay for this data for a while, so that others can continue to benefit from it.
I don’t know where this project is going—it’s just a hobby—but even if I stopped doing any more work on it tomorrow, I know I’d be leaving Virginians with much better business data than they had before.
In addition to the Shuttleworth Foundation, my thanks to the ACLU of Virginia and the EFF for providing me with legal advice, without which I couldn’t have even begun this project, and to Blue Ridge InternetWorks, which generously donates the website hosting and server power to crunch and distribute all of this data.