To protect our customers' privacy, copyright and the performance of the websites we host, we run a number of automatic and manual systems that actively monitor robots, spiders and crawlers accessing those websites, and block any that don't comply with a few common-sense rules.
In general, all web crawlers, robots and spiders will be blocked unless they:
- Follow the robots exclusion standard (see the robots.txt sketch after this list)
- Properly identify themselves in their user-agent string. Robots that misidentify themselves with a web-browser user-agent string (e.g. "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1"), or that send no user-agent string at all, are generally assumed to be malicious spiders and blocked (see the user-agent sketch after this list)
- Don't attempt to submit forms, crack passwords, engage in web application attacks or display an access pattern consistent with known malicious tools
- Don't cause undue load to the website, e.g. by accessing pages more than twice a second (see the rate-limiting sketch after this list)