Using HAProxy to block 80legs (DDOS-tool|crawler)

Written by Alexandre De Dommelin Sun Jul 22 15:32:40 UTC 2012
Disclaimer: This post reflects only my personal opinion and cannot be associated with the opinion of the company I work for.

Last night I saw something that could be described as a "DDoS attack" on one of the infrastructures I manage: for approximately 7 hours we received a continuous, huge number of connections / HTTP requests coming from more than 1800 different IP addresses, mainly located in:

  • Russian Federation
  • Ukraine
This increase was abrupt (not progressive; it ramped up within 1-2 minutes) and disappeared the same way. A first analysis shows that these connections were initiated by 80legs.com, a "distributed web crawler" which lets anybody "Setup [your own] web crawl in minutes and run it on over 50,000+ computers", as we can see in the HTTP headers:

User-Agent: Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/webcrawler.html) Gecko/2008032620

After some digging on Google, it seems that many people have had the same experience with this crawler, and that asking 80legs to rate-limit its requests was not successful. Moreover, some people report that denying 80legs in robots.txt was not sufficient to prevent it from crawling their site. So in this case I suggest putting preventive rules in your Web Application Firewall or in your load balancer / web server to keep it from reaching and overloading your web infrastructure.

Below is an example HAProxy configuration that tarpits all HTTP requests from this crawler:

frontend HTTP
        [...]
        #
        # Tarpit all requests whose User-Agent header mentions 80legs
        #
        reqitarpit ^User-Agent:.*80legs
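
A note for more recent HAProxy releases: reqitarpit belongs to the old req* family of directives, which, as far as I know, was removed in HAProxy 2.1. A roughly equivalent rule can be sketched with http-request tarpit and an anonymous ACL on the User-Agent header. The 10s tarpit timeout below is an arbitrary value picked for illustration, not something taken from the setup described here, so treat this as a starting point rather than a tested configuration.

```
frontend HTTP
        [...]
        # Assumed value: how long a tarpitted connection is held open
        # before HAProxy answers with an error
        timeout tarpit 10s

        # Tarpit any request whose User-Agent contains "80legs"
        # (-i makes the substring match case-insensitive)
        http-request tarpit if { hdr_sub(User-Agent) -i 80legs }
```

Tarpitting holds the connection open instead of rejecting it immediately, which slows the crawler down without costing the backend servers anything; if you would rather refuse the requests outright, http-request deny can be used with the same condition.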
