<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Richmond Sunlight needs some money.</title>
	<atom:link href="http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/feed/" rel="self" type="application/rss+xml" />
	<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/</link>
	<description></description>
	<lastBuildDate>Sat, 20 Mar 2010 17:56:51 -0700</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Waldo Jaquith</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21370</link>
		<dc:creator>Waldo Jaquith</dc:creator>
		<pubDate>Sat, 18 Oct 2008 22:21:30 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21370</guid>
		<description>FWIW, neither Cuneiform nor Ocrad is up to par. Ocrad is inferior to both gocr and Tesseract, and Cuneiform is crude and sketchy. &lt;a href=&quot;http://silvercoders.com/?page=ocr_server&quot; rel=&quot;nofollow&quot;&gt;Silvercoders OCR&lt;/a&gt;, the lone commercial offering I can find, might be worth trying, but the fact that they don&#039;t list any prices on their website strikes me as a bad sign. I may be doomed to continue doing this on my desktop.</description>
		<content:encoded><![CDATA[<p>FWIW, neither Cuneiform nor Ocrad is up to par. Ocrad is inferior to both gocr and Tesseract, and Cuneiform is crude and sketchy. <a href="http://silvercoders.com/?page=ocr_server">Silvercoders OCR</a>, the lone commercial offering I can find, might be worth trying, but the fact that they don&#8217;t list any prices on their website strikes me as a bad sign. I may be doomed to continue doing this on my desktop.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: perlogik</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21358</link>
		<dc:creator>perlogik</dc:creator>
		<pubDate>Fri, 17 Oct 2008 13:46:39 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21358</guid>
		<description>Halsey, I enjoy your comments here. How about sponsoring this noble goal, or at least the Mac Pro? You&#039;ve got to know some people at Apple (of course, Jobs might be upset at you for taking his bad-boy-of-Silicon-Valley title and all). I have never seen so much democracy available for such a small price.
 
A better investment than most of the politicians you have probably given to. What do you say?</description>
		<content:encoded><![CDATA[<p>Halsey, I enjoy your comments here. How about sponsoring this noble goal, or at least the Mac Pro? You&#8217;ve got to know some people at Apple (of course, Jobs might be upset at you for taking his bad-boy-of-Silicon-Valley title and all). I have never seen so much democracy available for such a small price.</p>
<p>A better investment than most of the politicians you have probably given to. What do you say?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Waldo Jaquith</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21355</link>
		<dc:creator>Waldo Jaquith</dc:creator>
		<pubDate>Fri, 17 Oct 2008 03:57:34 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21355</guid>
		<description>&lt;blockquote&gt;Waldo, I generally consider you something of a good-natured, sharp-tongued ass, but I must compliment you on Richmond Sunlight. It is an amazing accomplishment and a model for states everywhere. From reading all of your constant whining I would never have known you had something that big and profound in you.

Great Job. I love it.&lt;/blockquote&gt;*Laugh* Thanks, Halsey...I think. :) I guess you could say I&#039;ve earned my sharp-tongued whining. :)</description>
		<content:encoded><![CDATA[<blockquote><p>Waldo, I generally consider you something of a good-natured, sharp-tongued ass, but I must compliment you on Richmond Sunlight. It is an amazing accomplishment and a model for states everywhere. From reading all of your constant whining I would never have known you had something that big and profound in you.</p>
<p>Great Job. I love it.</p></blockquote>
<p>*Laugh* Thanks, Halsey&#8230;I think. :) I guess you could say I&#8217;ve earned my sharp-tongued whining. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Waldo Jaquith</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21354</link>
		<dc:creator>Waldo Jaquith</dc:creator>
		<pubDate>Fri, 17 Oct 2008 03:56:44 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21354</guid>
		<description>&lt;blockquote&gt;At work I have an 8-core Mac Pro with 10 gigs of RAM for video editing. I can attest that processing and rendering video is a snap. We ended up ordering the Mac from Apple with 2 gigs and saved tons of money by ordering the remaining 8 from a third-party manufacturer.&lt;/blockquote&gt;Oooh, just as I&#039;d hoped. :) Ten gigs, my lord, that&#039;s a lot of memory (hence, good instinct going third party on those). May I someday have the chance to find out for myself how well those eight cores crunch video. :)</description>
		<content:encoded><![CDATA[<blockquote><p>At work I have an 8-core Mac Pro with 10 gigs of RAM for video editing. I can attest that processing and rendering video is a snap. We ended up ordering the Mac from Apple with 2 gigs and saved tons of money by ordering the remaining 8 from a third-party manufacturer.</p></blockquote>
<p>Oooh, just as I&#8217;d hoped. :) Ten gigs, my lord, that&#8217;s a lot of memory (hence, good instinct going third party on those). May I someday have the chance to find out for myself how well those eight cores crunch video. :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Halsey Minor</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21353</link>
		<dc:creator>Halsey Minor</dc:creator>
		<pubDate>Fri, 17 Oct 2008 03:54:53 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21353</guid>
		<description>Waldo, I generally consider you something of a good-natured, sharp-tongued ass, but I must compliment you on Richmond Sunlight.  It is an amazing accomplishment and a model for states everywhere.  From reading all of your constant whining I would never have known you had something that big and profound in you.

Great Job.  I love it.</description>
		<content:encoded><![CDATA[<p>Waldo, I generally consider you something of a good-natured, sharp-tongued ass, but I must compliment you on Richmond Sunlight.  It is an amazing accomplishment and a model for states everywhere.  From reading all of your constant whining I would never have known you had something that big and profound in you.</p>
<p>Great Job.  I love it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21352</link>
		<dc:creator>Nathan</dc:creator>
		<pubDate>Fri, 17 Oct 2008 03:29:48 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21352</guid>
		<description>At work I have an 8-core Mac Pro with 10 gigs of RAM for video editing.  I can attest that processing and rendering video is a snap.  We ended up ordering the Mac from Apple with 2 gigs and saved tons of money by ordering the remaining 8 from a third-party manufacturer.

Good luck.</description>
		<content:encoded><![CDATA[<p>At work I have an 8-core Mac Pro with 10 gigs of RAM for video editing.  I can attest that processing and rendering video is a snap.  We ended up ordering the Mac from Apple with 2 gigs and saved tons of money by ordering the remaining 8 from a third-party manufacturer.</p>
<p>Good luck.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Waldo Jaquith</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21351</link>
		<dc:creator>Waldo Jaquith</dc:creator>
		<pubDate>Fri, 17 Oct 2008 02:31:29 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21351</guid>
		<description>&lt;blockquote&gt;You mention that EC2 won’t work because (in part) the files involved are too large — have you considered contacting Amazon AWS about this issue? I’m sure they could come up with some very interesting solutions. :-)

Of course, the OCR is the real problem&lt;/blockquote&gt;

I&#039;ve &lt;em&gt;never&lt;/em&gt; gotten a response from an e-mail to Amazon, and I don&#039;t think they&#039;re likely to start now. :) If I could solve the OCR problem (I do that under Parallels in Windows right now, using &lt;a href=&quot;http://www.abbyy.com/&quot; rel=&quot;nofollow&quot;&gt;Abbyy FineReader&lt;/a&gt;), then it might actually be worth the upload time, because then I could do everything, end-to-end, in EC2. Where I live, I can get 10 Mbps ADSL, but the price is another $30/month over my existing 1.5 Mbps connection. Now that I check, though, I see that it only gets me 896 kbps upload, which is almost twice my current theoretical upload speed but still not real hot.

&lt;blockquote&gt;However, I recall that the New York Times used Amazon’s EC2 to OCR all their old pre-electronic archives. I wonder what they used for that?&lt;/blockquote&gt;

I wondered the same thing when I read about that. I never did find out. It was Derek Gottfrid who headed that up for the &lt;em&gt;Times&lt;/em&gt;—I&#039;ll e-mail him and ask him, if I can manage to track down his address. Thanks for the suggestion!

FWIW, the stuff I&#039;m trying to OCR is pretty straightforward. A screen capture might look like this:

&lt;img src=&quot;/blog/wp-content/uploads/2008/10/79.jpg&quot; width=&quot;500&quot; height=&quot;375&quot; /&gt;

And in this case, there&#039;s no bill number, but there is a legislator name, which I detect (thanks to the shade of blue), crop out, and boost in contrast, like so:

&lt;img src=&quot;/blog/wp-content/uploads/2008/10/79-name.gif&quot; width=&quot;304&quot; height=&quot;39&quot; /&gt;

As OCRing goes, that&#039;s really not bad. But you can see why it&#039;s important that I use large, high-quality video: the bigger the text, and the less noise around it, the easier it is to OCR it. Googling around this evening, I discovered &lt;a href=&quot;http://www.cuneiform.ru/eng/index.html&quot; rel=&quot;nofollow&quot;&gt;Cuneiform&lt;/a&gt; and &lt;a href=&quot;http://www.gnu.org/software/ocrad/ocrad.html&quot; rel=&quot;nofollow&quot;&gt;Ocrad&lt;/a&gt;, two Linux OCR packages that I don&#039;t recall seeing before. I&#039;ll have to give those a whirl.

It&#039;s this process that brings about my two biggest technical obstacles. The first is the size of the video, and the second is the processing time. DreamHost just chokes on the amount of processing that&#039;s required to do all of these transformations, and they just kill the process, no matter how fiercely I renice it. Which is why I need to do all of this on the desktop or, better still, offload it to EC2. EC2 or no, though, I&#039;ll never get around the need to locally rip video, and the quality of video generated by QuickTime (rather than mencoder) is so much higher that I really need to use it to generate the MP4 and the MOV.</description>
		<content:encoded><![CDATA[<blockquote><p>You mention that EC2 won’t work because (in part) the files involved are too large — have you considered contacting Amazon AWS about this issue? I’m sure they could come up with some very interesting solutions. :-)</p>
<p>Of course, the OCR is the real problem</p></blockquote>
<p>I&#8217;ve <em>never</em> gotten a response from an e-mail to Amazon, and I don&#8217;t think they&#8217;re likely to start now. :) If I could solve the OCR problem (I do that under Parallels in Windows right now, using <a href="http://www.abbyy.com/">Abbyy FineReader</a>), then it might actually be worth the upload time, because then I could do everything, end-to-end, in EC2. Where I live, I can get 10 Mbps ADSL, but the price is another $30/month over my existing 1.5 Mbps connection. Now that I check, though, I see that it only gets me 896 kbps upload, which is almost twice my current theoretical upload speed but still not real hot.</p>
<blockquote><p>However, I recall that the New York Times used Amazon’s EC2 to OCR all their old pre-electronic archives. I wonder what they used for that?</p></blockquote>
<p>I wondered the same thing when I read about that. I never did find out. It was Derek Gottfrid who headed that up for the <em>Times</em>—I&#8217;ll e-mail him and ask him, if I can manage to track down his address. Thanks for the suggestion!</p>
<p>FWIW, the stuff I&#8217;m trying to OCR is pretty straightforward. A screen capture might look like this:</p>
<p><img src="/blog/wp-content/uploads/2008/10/79.jpg" width="500" height="375" /></p>
<p>And in this case, there&#8217;s no bill number, but there is a legislator name, which I detect (thanks to the shade of blue), crop out, and boost in contrast, like so:</p>
<p><img src="/blog/wp-content/uploads/2008/10/79-name.gif" width="304" height="39" /></p>
<p>As OCRing goes, that&#8217;s really not bad. But you can see why it&#8217;s important that I use large, high-quality video: the bigger the text, and the less noise around it, the easier it is to OCR it. Googling around this evening, I discovered <a href="http://www.cuneiform.ru/eng/index.html">Cuneiform</a> and <a href="http://www.gnu.org/software/ocrad/ocrad.html">Ocrad</a>, two Linux OCR packages that I don&#8217;t recall seeing before. I&#8217;ll have to give those a whirl.</p>
<p>It&#8217;s this process that brings about my two biggest technical obstacles. The first is the size of the video, and the second is the processing time. DreamHost just chokes on the amount of processing that&#8217;s required to do all of these transformations, and they just kill the process, no matter how fiercely I renice it. Which is why I need to do all of this on the desktop or, better still, offload it to EC2. EC2 or no, though, I&#8217;ll never get around the need to locally rip video, and the quality of video generated by QuickTime (rather than mencoder) is so much higher that I really need to use it to generate the MP4 and the MOV.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tim McCormack</title>
		<link>http://waldo.jaquith.org/blog/2008/10/richmond-sunlight-plea/#comment-21350</link>
		<dc:creator>Tim McCormack</dc:creator>
		<pubDate>Fri, 17 Oct 2008 01:46:02 +0000</pubDate>
		<guid isPermaLink="false">http://waldo.jaquith.org/?p=6016#comment-21350</guid>
		<description>You mention that EC2 won&#039;t work because (in part) the files involved are too large -- have you considered contacting Amazon AWS about this issue?  I&#039;m sure they could come up with some very interesting solutions. :-)

Of course, the OCR is the real problem. However, I recall that the New York Times used Amazon&#039;s EC2 to OCR all their old pre-electronic archives. I wonder what they used for that?</description>
		<content:encoded><![CDATA[<p>You mention that EC2 won&#8217;t work because (in part) the files involved are too large &#8212; have you considered contacting Amazon AWS about this issue?  I&#8217;m sure they could come up with some very interesting solutions. :-)</p>
<p>Of course, the OCR is the real problem. However, I recall that the New York Times used Amazon&#8217;s EC2 to OCR all their old pre-electronic archives. I wonder what they used for that?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
