Difference between revisions of "Robots.txt"

From Hostek.com Wiki
(Created page with "===Robots.txt File=== We suggest creating a '''robots.txt''' file in the web root of the domain to address two issues. First to control the rate at which the website is being ...")
 
m (Robots.txt File)
 
Line 1: Line 1:
 
===Robots.txt File===
 
===Robots.txt File===
We suggest creating a '''robots.txt''' file in the web root of the domain to address two issues. First to control the rate at which the website is being crawled which can help prevent a bot/spider from creating a massive number of database connections at the same time. Second to prevent specific bots from crawling the website. We suggest the following defaults, however you might want to add or remove the user agents denied, and adjust the crawl rate but we suggest nothing lower than 3 seconds.
+
We suggest creating a '''robots.txt''' file in the web root of the domain to address two issues. First to control the rate at which the website is being crawled which can help prevent a bot/spider from creating a massive number of database connections at the same time. Second to prevent specific bots from crawling the website. We suggest the following defaults, however you might want to add or remove the user agents denied, and adjust the crawl rate.
 
<pre style="white-space: pre-wrap">
 
<pre style="white-space: pre-wrap">
 
User-agent: *
 
User-agent: *
Crawl-delay: 10
+
Crawl-delay: 2
  
 
User-agent: Baiduspider
 
User-agent: Baiduspider

Latest revision as of 13:41, 25 March 2015

Robots.txt File

We suggest creating a robots.txt file in the web root of the domain to address two issues. First, it controls the rate at which the website is crawled, which can help prevent a bot or spider from opening a massive number of database connections at the same time. Second, it tells specific bots not to crawl the website at all. We suggest the following defaults; however, you might want to add or remove denied user agents and adjust the crawl rate (the Crawl-delay value is in seconds). Note that robots.txt is advisory: well-behaved crawlers follow it, but not every crawler honors Crawl-delay (Googlebot, for example, ignores it).

User-agent: *
Crawl-delay: 2

User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /
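
To check how a compliant crawler would read these defaults, the rules can be fed to Python's standard-library `urllib.robotparser`. This is only a minimal sketch: the `example.com` URLs and the `ExampleBot` user agent are placeholders chosen for illustration, not names from this wiki, and real crawlers may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# The suggested defaults from this page, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2

User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /
"""

# Parse the rules from memory instead of fetching them over HTTP.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical crawler that falls under the "*" group.
for agent in ("ExampleBot", "Baiduspider", "Sosospider"):
    allowed = parser.can_fetch(agent, "https://example.com/page")
    delay = parser.crawl_delay(agent)  # None if the matching group sets no delay
    print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

Against a live site, `parser.set_url(...)` followed by `parser.read()` would load the deployed file instead of the inline string.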