Sorry for the interruption in service. For about 30 minutes, FrontBurner was down. The reason? Bing. The Microsoft search engine has a bot that isn’t very bright. First, it’s working during daylight hours, when we humans are more likely to be using our bandwidth. Friendlier bots do their work during the night. Second, the Bing bot was deluging us with requests for FrontBurner pages that aren’t cached, sending up to 400 requests in 30 seconds. If those requests had been for the homepage, no problem. It is cached and ready to serve up. But the requests were for category pages (the little blue link just to the left of the comments link at the end of each post), meaning each request set our servers to work generating pages that didn’t already exist, quickly overtaxing our humble machines.
That’s the story I’m sticking to, anyway.
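For the curious, here is a minimal sketch of one common way to blunt a flood like this: a per-client sliding-window rate limiter. It is not what FrontBurner runs. The class name, the client key, and the threshold of 60 requests per 30 seconds are all assumptions made up for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key.

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, key, now=None):
        # `now` can be passed in for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: caller might answer with HTTP 429
        hits.append(now)
        return True


# Assumed threshold: 60 uncached page requests per 30 seconds per client.
limiter = SlidingWindowLimiter(limit=60, window=30)
```

A server would call `limiter.allow(client_id)` before generating an uncached page and refuse or delay the request when it returns `False`. Cached pages such as the homepage could skip the check entirely.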
4 comments
This is SO unnecessary and easily avoidable.
http://www.rackspace.com/cloud/public/servers/
So let’s say I want to buy some ad space on dmagazine.com and I want to know how much traffic your site receives. Can I trust your data if you allow automated traffic on your site?
@Capt, bots do not use JavaScript, and therefore are not counted by Google Analytics in our traffic numbers. And attempting to block all spammer IPs is generally considered a rabbit hole, since they will just have new ones next week.
@Randy is right. IP spoofing is very easy. To prevent bot access you need to check their signatures (for example: header content and order) and perform challenges (like a JS challenge which, as @Randy said, bots can’t execute). Best bot protection is currently offered by Incapsula (http://www.incapsula.com) and they provide it for free.
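A rough sketch of the header-signature idea mentioned in the comment above, assuming a made-up browser profile. The header list, its order, and the function name are hypothetical. This is not how Incapsula or any real product does it, and real profiles differ by browser and version.

```python
# Hypothetical profile: headers a claimed browser is assumed to send, in this order.
EXPECTED_ORDER = ["host", "user-agent", "accept", "accept-language", "accept-encoding"]


def signature_mismatches(header_names, expected=EXPECTED_ORDER):
    """Return a list of reasons the request does not match the assumed profile."""
    seen = [name.lower() for name in header_names]
    # Content check: which expected headers are absent?
    problems = [f"missing {name}" for name in expected if name not in seen]
    # Order check: compare the profiled headers that did arrive against
    # the order the profile says they should arrive in.
    present = [name for name in seen if name in expected]
    wanted = [name for name in expected if name in seen]
    if present != wanted:
        problems.append("unexpected header order")
    return problems
```

An empty list means the request matches the assumed profile. Anything else could feed a score that decides whether to serve the page or issue a challenge first.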