Waldo Jaquith

MSNbot can go screw.

I’ve had 9,000 hits on cvillenews.com today. 3,941 of those are from MSN’s crappy bot, which is downloading each and every comments RSS feed from WordPress over and over and over again. To the tune of 6.6MB of traffic today. It keeps overloading my server, which falls back into chill-out mode for a few minutes before starting up the web server again…only to be overwhelmed by MSNbot again.
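For anyone who wants to run the same kind of tally on their own logs, here is a minimal sketch. It assumes the server writes Apache's "combined" log format, where the user agent is the last quoted field on each line. The function name `count_bot_hits` and the `needle` parameter are made up for illustration; nothing here is the exact method used for the numbers above.

```python
import re

# Assumption: Apache "combined" log format, user agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines, needle="msnbot"):
    """Return (total_hits, hits_whose_user_agent_contains_needle).

    `needle` is matched case-insensitively against the user agent string.
    """
    total = 0
    matched = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        m = UA_PATTERN.search(line)
        if m and needle in m.group(1).lower():
            matched += 1
    return total, matched
```

Feeding it the lines of an access log (for example `count_bot_hits(open(path))`, with `path` pointing at wherever your server keeps its log) gives the two numbers needed for a "bot hits out of total hits" comparison.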

I see on Technorati that a lot of people are having this problem. I’m just banning MSNbot from my network by filtering out their IP subnet. Screw ’em. It’s a crappy search engine, anyhow.
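The idea behind a subnet filter is simple enough to sketch in a few lines. This is only an illustration of the membership check, not the actual firewall rule: the subnet below is a placeholder from the reserved documentation range, since the real MSNbot range isn't listed here, and `is_blocked` is a hypothetical helper name.

```python
import ipaddress

# Placeholder subnet from the reserved documentation range (TEST-NET-3).
# Assumption: swap in the crawler's real range, which this post does not give.
BLOCKED_SUBNETS = [ipaddress.ip_network("203.0.113.0/24")]


def is_blocked(ip, subnets=BLOCKED_SUBNETS):
    """Return True if the address string `ip` falls inside any blocked subnet."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in subnets)
```

Blocking a whole subnet rather than individual addresses matters here because a crawler typically comes from many machines in the same range, so a per-address ban would probably turn into whack-a-mole.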


4 Comments

robots.txt is your friend…

Posted by KMD on 25 August 2005 @ 9pm

They’re totally ignoring it. That’s the most infuriating part.

Posted by Waldo Jaquith on 25 August 2005 @ 10pm

Brutal… don’t W3C specs require them to follow robots.txt directives to be fully compliant?

Posted by KMD on 28 August 2005 @ 10pm

The Robot Exclusion Protocol was actually never formalized or accepted by any body. It just came out of the consensus on the robots mailing list in the mid-90s. There’s no enforcement mechanism, and really no updates to that initial “standard” since then.

MSN last attempted to spider the site 20 minutes ago. My .htaccess keeps them away, as if the robots.txt wasn’t a big enough hint.

Posted by Waldo Jaquith on 29 August 2005 @ 12am