Duplicate bill checking.

Here’s a small new feature on Richmond Sunlight that I’m excited about: identical bill matching. Any bill whose summary is identical to the summary of one or more other bills now links to all of them. For instance, nine of the bills repealing abuser fees were identical, so viewing any one of them (e.g., SB1) provides links to all of the others. And any comment made on one of those bills will appear on all of the identical bills. Once I add a date filter, this will automatically point out legislation sniping. All sorts of other uses come to mind, like pointing out that the same bill has been introduced before, calculating how often legislators introduce duplicate legislation, etc.

For the geeks among us: I fought with adding this little feature for weeks, because the MySQL query comparing every bill summary against every other one was just too expensive. Then, last night, I had a forehead-slapping realization that this is what MD5 hashes are for: hash each summary, and identical summaries produce identical hashes. It took just two minutes to add, and the problem was solved.
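Here's a rough sketch of why the hash helps, not the actual Richmond Sunlight code (the post doesn't say whether the hash is computed in PHP or with MySQL's own MD5() function). Comparing long text fields pairwise is slow, but a short fixed-length digest can be grouped or looked up by equality. The function names `summary_hash` and `find_identical_bills`, and the in-memory dictionary standing in for the database table, are hypothetical.

```python
import hashlib
from collections import defaultdict


def summary_hash(summary: str) -> str:
    """Return the hex MD5 digest of a bill summary (hypothetical helper)."""
    return hashlib.md5(summary.encode("utf-8")).hexdigest()


def find_identical_bills(summaries: dict) -> dict:
    """Map each bill number to the other bills whose summaries hash identically.

    `summaries` is assumed to map bill numbers to summary text; in a real
    site this would be a hash column in the bills table instead.
    """
    # Bucket bills by the digest of their summary: one pass, no pairwise compares.
    groups = defaultdict(list)
    for bill, summary in summaries.items():
        groups[summary_hash(summary)].append(bill)

    # Every bill links to the other members of its bucket.
    matches = {}
    for bills in groups.values():
        for bill in bills:
            matches[bill] = [other for other in bills if other != bill]
    return matches
```

With made-up sample summaries, `find_identical_bills({"SB1": "Repeals abuser fees.", "HB2": "Repeals abuser fees.", "HB3": "Something else."})` links SB1 and HB2 to each other and leaves HB3 with no matches. MD5 isn't collision-resistant against a deliberate attacker, but for spotting identical summaries that doesn't matter much; a cautious version could confirm a match by comparing the full text only within a bucket.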

Published by Waldo Jaquith

Waldo Jaquith (JAKE-with) is an open government technologist who lives near Charlottesville, VA, USA.


  1. Did you set it up to begin the hash at “be it enacted…”? (And if so, how?) Otherwise, the committee assignment and delegate information at the top of the data that RS gets from LIS would prevent duplicate hashes.

  2. It’s based only on the bill summary, not the bill itself. I wasn’t sure that would be good enough, but manually reviewing ~100 bills with various sets of matching summaries showed a 100% correlation. I don’t match on the full bill text for precisely the reason you describe.
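For comparison, here's a minimal sketch of the alternative raised in the first comment: hashing the bill text only from the enacting clause onward, so the patron and committee header doesn't change the digest. This is not what the site does (it hashes summaries). The case-insensitive "be it enacted" marker and the whitespace normalization are assumptions, and `body_hash` is a hypothetical name.

```python
import hashlib
import re
from typing import Optional

# Assumed marker for the start of the bill body; real LIS text may differ.
ENACTING_CLAUSE = re.compile(r"be it enacted", re.IGNORECASE)


def body_hash(bill_text: str) -> Optional[str]:
    """MD5 of the bill text from the enacting clause on, or None if absent."""
    match = ENACTING_CLAUSE.search(bill_text)
    if match is None:
        return None
    body = bill_text[match.start():]
    # Collapse runs of whitespace so line-wrapping differences don't
    # change the hash (an assumption about what should count as "identical").
    normalized = " ".join(body.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Two bills with different headers but the same body text would then share a hash, while text with no recognizable enacting clause returns `None` rather than a misleading digest.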
