Waldo Jaquith

$500 bounty for a speech transcription program.

The world needs an API to automatically generate transcript captions for videos. I am offering a $500 bounty for a program that does this via YouTube’s built-in machine transcription functionality. It should work in approximately this manner:

  1. Accepts a manifest that lists one or more video URLs and other metadata fields. The manifest may be in any common, reasonable format (e.g., JSON, CSV, XML).
  2. Retrieves the video from the URL and stores it on the filesystem.
  3. Uploads the video to YouTube, appending the other metadata fields to the request.
  4. Deletes the video from the filesystem.
  5. Downloads the resulting caption file, storing it with a unique name that can be connected back to a unique field contained within the manifest (e.g., a unique ID metadata field).
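The five steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a finished solution. It assumes a JSON manifest that is a list of objects, each with an `id` and a `url` field (both field names are assumptions). The `upload` argument is a hypothetical callable standing in for the real YouTube upload, and the `.srt` caption extension is likewise assumed.

```python
import json
import os
import tempfile
import urllib.request


def load_manifest(path):
    """Step 1: read a JSON manifest (a list of objects with 'id', 'url', and other metadata)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def process_manifest(entries, upload, workdir=None):
    """Run steps 2-4 for each entry and return {manifest id: uploaded video id}.

    `upload(local_path, metadata)` is a hypothetical stand-in for the real
    YouTube upload; it must return the ID of the uploaded video.
    """
    workdir = workdir or tempfile.mkdtemp()
    uploaded = {}
    for entry in entries:
        # Everything except the URL is passed along as metadata.
        metadata = {k: v for k, v in entry.items() if k != "url"}
        local_path = os.path.join(workdir, f"{entry['id']}.video")
        urllib.request.urlretrieve(entry["url"], local_path)  # step 2: retrieve
        try:
            uploaded[entry["id"]] = upload(local_path, metadata)  # step 3: upload
        finally:
            os.remove(local_path)  # step 4: delete, even if the upload failed
    return uploaded


def caption_filename(entry_id, extension="srt"):
    """Step 5: name the caption file after the manifest's unique ID."""
    return f"{entry_id}.{extension}"
```

Passing the upload step in as a callable keeps the sketch independent of any particular YouTube client library, and naming the caption file after the manifest's `id` is what ties each transcript back to its manifest entry.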

Rules

Participants are encouraged to develop in the open, on GitHub, and to comment here with a link to their repository, so that others may observe their work, and perhaps join in.

This bounty is funded entirely by the 95 folks who backed this Kickstarter project, though I suppose especially by those people who kept backing the project even after the goal was met. I deserve zero credit for it.


11 Comments

There might be a service that does this for a per-video fee: http://www.subply.com

I looked a little at the YouTube API and it seems that they don’t have a regularized way of requesting an automatic transcription of a video. I suspect it might be necessary to robot something that acts like a web browser.

Posted by Duane on 21 March 2013 @ 2pm

I was excited to give this a shot until your requirements said Linux and non-compiled. I’ve done enough pain-inducing programming in my life. Oh well, maybe I’ll do it my way anyway. Seems like an interesting idea. I’ll still open source it and let you have at it if I do work on it.

Posted by Tim on 21 March 2013 @ 2pm

I looked a little at the YouTube API and it seems that they don’t have a regularized way of requesting an automatic transcription of a video.

I believe that they do, unless I misunderstand. It looks like they have a pretty rich API.

I was excited to give this a shot until your requirements said Linux and non-compiled.

Those requirements exist because all of my servers are Linux-based; something built for Mac OS X or Windows wouldn’t be of any use to me. (Also, the open government technology community is overwhelmingly Linux-centric, and I want this to be useful to them. Of course, there are plenty of other potential uses for this!) I prefer a non-compiled language because it will be much easier for others to hack on than something that requires recompilation to modify. There’s obviously no technical reason why this couldn’t be done in [insert your language of choice], but I think it will have greater utility in Python et al. Sorry!

Posted by Waldo Jaquith on 21 March 2013 @ 3pm

I understand the reasoning. Just not my cup of tea.

Like I said, if I do look into it (I’ve already started perusing the documentation, because I do find the problem space interesting), I’ll open source it. So if you don’t get anyone who does it directly, it’d be easy for someone to port it to whatever language/platform they wanted.

Posted by Tim on 21 March 2013 @ 3pm

Great!

Posted by Waldo Jaquith on 21 March 2013 @ 3pm

So, we’re seeing a lot of each other today. I saw this yesterday and it seemed like a fun challenge, so I took a stab last night:

https://github.com/copiesofcopies/youtube-transcription

Transcription seems to happen asynchronously, sometimes many hours after uploading, so rather than keep the script running until the transcription is available, I wrote one Python script to grab the videos and another, to be run later, to get the transcriptions. The one requirement I didn’t quite comply with is the unique-ID requirement: instead, I’m saving the ID of each uploaded YouTube video in a modified manifest and using that as the filename for its transcription.

Let me know what you think!

Posted by Aaron Williamson on 22 March 2013 @ 4pm
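The two-pass design described in the comment above (record the YouTube IDs now, collect the transcriptions later) might look something like the sketch below. It is not taken from the linked repository. The `youtube_id` field name, the `.txt` extension, and the `fetch_caption` callable, which is assumed to return `None` while a transcription is still pending, are all hypothetical.

```python
import json
import os


def save_uploaded_ids(entries, uploaded_ids, out_path):
    """First pass: write a modified manifest that records each YouTube video ID.

    `uploaded_ids` maps a manifest entry's URL to the ID returned by the upload.
    """
    updated = []
    for entry in entries:
        entry = dict(entry)  # copy so the caller's manifest is left untouched
        if entry["url"] in uploaded_ids:
            entry["youtube_id"] = uploaded_ids[entry["url"]]
        updated.append(entry)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(updated, f, indent=2)
    return updated


def fetch_ready_captions(manifest_path, fetch_caption, out_dir):
    """Second pass, run later: save whichever transcriptions are available now.

    `fetch_caption(youtube_id)` is a hypothetical callable returning the caption
    text, or None if YouTube has not finished transcribing the video yet.
    Returns the YouTube IDs that are still pending.
    """
    with open(manifest_path, encoding="utf-8") as f:
        entries = json.load(f)
    os.makedirs(out_dir, exist_ok=True)
    pending = []
    for entry in entries:
        video_id = entry.get("youtube_id")
        if video_id is None:
            continue  # this entry was never uploaded
        text = fetch_caption(video_id)
        if text is None:
            pending.append(video_id)  # try again on a later run
            continue
        # The YouTube ID doubles as the transcription's filename.
        with open(os.path.join(out_dir, f"{video_id}.txt"), "w", encoding="utf-8") as f:
            f.write(text)
    return pending
```

Because the second pass only reads the modified manifest, it can be rerun (from cron, say) until it returns an empty list.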

Wow! I’ll step through it in a bit, and give it a try. Based on the README, it sure looks like you nailed it.

Posted by Waldo Jaquith on 22 March 2013 @ 7pm

Cool — it definitely needs more comments and much more error handling. I’m happy to add more of that once I’m sure I’m on the right track.

Posted by Aaron Williamson on 22 March 2013 @ 7pm

Mission accomplished—this works great! Bravo! I’ll be in touch about the bounty. :)

Posted by Waldo Jaquith on 23 March 2013 @ 11pm

[...] took just 27 hours for the $500 speech transcription bounty to be claimed. Aaron Williamson produced youtube-transcription, a Python-based pair of scripts that [...]

Posted by Waldo Jaquith - $500 speech transcription bounty claimed. on 23 March 2013 @ 11pm

Waldo Jaquith – $500 bounty for a speech transcription program.

Posted by GS test on 31 March 2013 @ 10am