$500 bounty for a speech transcription program.

The world needs an API to automatically generate transcript captions for videos. I am offering a $500 bounty for a program that does this via YouTube’s built-in machine transcription functionality. It should work in approximately this manner:

  1. Accepts a manifest that lists one or more video URLs and other metadata fields. The manifest may be in any common, reasonable format (e.g., JSON, CSV, XML).
  2. Retrieves the video from the URL and stores it on the filesystem.
  3. Uploads the video to YouTube, appending the other metadata fields to the request.
  4. Deletes the video from the filesystem.
  5. Downloads the resulting caption file, storing it with a unique name that can be connected back to a unique field contained within the manifest (e.g., a unique ID metadata field).
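
As a rough illustration of steps 1 and 5, here is a minimal Python sketch that reads a JSON or CSV manifest and derives a caption filename from each entry's unique ID. The field names `id` and `url`, the `.srt` extension, and both function names are assumptions made for the example, not part of the specification.

```python
import csv
import io
import json


def load_manifest(text, fmt):
    """Parse manifest text into a list of dicts, one per video."""
    if fmt == "json":
        entries = json.loads(text)
        if not isinstance(entries, list):
            raise ValueError("JSON manifest must be a list of entries")
    elif fmt == "csv":
        entries = list(csv.DictReader(io.StringIO(text)))
    else:
        raise ValueError(f"unsupported manifest format: {fmt}")

    seen = set()
    for entry in entries:
        # "id" and "url" are assumed field names for this sketch.
        if entry.get("id") in (None, "") or entry.get("url") in (None, ""):
            raise ValueError(f"entry is missing 'id' or 'url': {entry}")
        if entry["id"] in seen:
            raise ValueError(f"duplicate id: {entry['id']}")
        seen.add(entry["id"])
    return entries


def caption_filename(entry, extension="srt"):
    """Name the caption file after the manifest's unique ID (step 5)."""
    video_id = str(entry["id"])
    # Reject IDs that are not safe as filenames rather than rewriting them,
    # so two different IDs can never map to the same file.
    if not all(c.isalnum() or c in "-_" for c in video_id):
        raise ValueError(f"id is not filename-safe: {video_id}")
    return f"{video_id}.{extension}"
```

Any other manifest fields are carried through untouched in each entry, so they remain available to attach to the upload request in step 3.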


  • Must be written in a common, non-compiled language (e.g., Python, PHP, Perl, Ruby) that requires no special setup or server configuration and that will run on any standard, out-of-the-box Linux distribution.
  • Must run at the command line. (It’s fine to provide additional interfaces.)
  • May have additional features and options.
  • May use existing open source components (of course). This is not a clean-room implementation.
  • May be divided into multiple programs (e.g., one to parse the manifest and retrieve the specified videos, one to submit the video to YouTube, and one to poll YouTube for the completed transcripts), or combined as one.
  • Must be licensed under the GPL, MIT, or Apache licenses. Other licenses may be considered.
  • If multiple parties develop the program collaboratively, it’s up to them to determine how to divide the bounty. If they cannot come to an agreement within seven days, the bounty will be donated to the 501(c)(3) of my choosing.
  • The first person to provide functioning code that meets the specifications will receive the bounty.
  • Anybody who delivers incomplete code, or who delivers complete code after somebody else has already done so, will receive a firm handshake and the thanks of a grateful nation.
  • If nobody delivers a completed product within 30 days, then I may, at my discretion, award some or all of the bounty to whoever has gotten closest to completion.
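
If the work is split into separate programs as described above, the polling step might look something like the following sketch. It is only an outline under stated assumptions: `fetch_caption` is a hypothetical callable standing in for whatever actually asks YouTube for a caption track (returning `None` until one exists), and the interval and attempt limits are arbitrary defaults.

```python
import time


def poll_for_captions(pending, fetch_caption, interval=600,
                      max_attempts=12, sleep=time.sleep):
    """Poll until every pending video has captions or attempts run out.

    pending: dict mapping manifest ID -> uploaded video ID
    fetch_caption: hypothetical callable taking a video ID and returning
        the caption text, or None if the transcript is not ready yet
    Returns (done, still_pending), both keyed by manifest ID.
    """
    pending = dict(pending)  # work on a copy; leave the caller's dict alone
    done = {}
    for attempt in range(max_attempts):
        for manifest_id, video_id in list(pending.items()):
            caption = fetch_caption(video_id)
            if caption is not None:
                done[manifest_id] = caption
                del pending[manifest_id]
        if not pending:
            break
        # Wait between rounds, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval)
    return done, pending
```

Keying the results by manifest ID rather than by the uploaded video's ID keeps each caption traceable to its manifest entry. Anything left in the second return value can simply be retried on a later run.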

Participants are encouraged to develop in the open, on GitHub, and to comment here with a link to their repository, so that others may observe their work, and perhaps join in.

This bounty is funded entirely by the 95 folks who backed this Kickstarter project, though I suppose especially by those people who kept backing the project even after the goal was met. I deserve zero credit for it.

Published by Waldo Jaquith

Waldo Jaquith (JAKE-with) is an open government technologist who lives near Charlottesville, VA, USA.

11 replies on “$500 bounty for a speech transcription program.”

  1. There might be a service doing this on a per-video basis for a fee: http://www.subply.com

    I looked a little at the YouTube API and it seems that they don’t have a regularized way of requesting an automatic transcription of a video. I suspect it might be necessary to robot something that acts like a web browser.

  2. I was excited to give this a shot until your requirements said Linux and non-compiled. I’ve done enough pain-inducing programming in my life. Oh well, maybe I’ll do it my way anyway. Seems like an interesting idea. I’ll still open source it and let you have at it if I do work on it.

  3. “I looked a little at the YouTube API and it seems that they don’t have a regularized way of requesting an automatic transcription of a video.”

    I believe that they do, unless I misunderstand. It looks like they have a pretty rich API.

    “I was excited to give this a shot until your requirements said Linux and non-compiled.”

    Those requirements exist because all of my servers are Linux-based; something built for Mac OS X or Windows wouldn’t be of any use to me. (Also, the open government technology community is overwhelmingly Linux-centric, and I want this to be useful to them. Of course, there are plenty of other potential uses for this!) I prefer a non-compiled language because it will make this so much easier for others to hack on than something that requires recompilation to modify. There’s obviously no technical reason why this couldn’t be done in [insert your language of choice], but I think it will have greater utility in Python et al. Sorry!

  4. I understand the reasoning. Just not my cup of tea.

    Like I said, if I do look into it (I’ve already started perusing the documentation, because I do find the problem space interesting), I’ll open source it. So if you don’t get anyone who does it directly, it’d be easy for someone to port it to whatever language/platform they wanted.

  5. So, we’re seeing a lot of each other today. I saw this yesterday and it seemed like a fun challenge, so I took a stab last night:


    Transcription seems to happen asynchronously, sometimes many hours after uploading, so rather than keep the script running until the transcription is available, I wrote one Python script to grab the videos and another, to be run later, to get the transcriptions. The one requirement I didn’t quite comply with is the unique-ID requirement: instead, I’m saving the ID of the uploaded YouTube videos in a modified manifest and using that as the filename for transcriptions.

    Let me know what you think!
