Jakeh: Created page with "==About Robots.txt Files== The '''robots.txt''' file represents an "exclusion standard" that is used to give directives to web robots (spiders, bots) that crawl Web sites to h..."

2013-07-16T16:18:21Z

==About Robots.txt Files==
The '''robots.txt''' file implements an "exclusion standard" used to give directives to web robots (spiders, bots) that crawl websites to help improve search engine results. When a bot first begins to crawl a site, it checks for a robots.txt file in the web root. If one exists, most bots will follow the directives in the file, such as "Crawl-delay" and excluded directories. If the bot does not find a robots.txt file, it will assume the webmaster wants the entire site crawled.
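
The lookup described above can be sketched in Python with the standard library. This is only an illustration, not how any particular search engine is implemented: the <code>example.com</code> URL, the <code>ExampleBot</code> name, and both helper functions are hypothetical. It shows where a bot would look for the file and that an empty rule set leaves every path open to crawling.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt URL in the web root of the page's site."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def allowed_without_rules(user_agent, path):
    """Check a path against an empty rule set (no directives found)."""
    parser = RobotFileParser()
    parser.parse([])  # no directives at all
    return parser.can_fetch(user_agent, path)


# Hypothetical page URL and bot name, used only for illustration.
print(robots_url("http://www.example.com/blog/post.php?id=7"))
print(allowed_without_rules("ExampleBot", "/admin"))
```

The first call prints the robots.txt location for the site, and the second reports that nothing is excluded when no directives are present.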

A downside of robots.txt files is that some bots do not respect the directives in the file. The major search engines, such as Google, Bing, and Yahoo, generally do obey them.

==Examples==
The main reason we recommend using a robots.txt file is to control the rate at which the website is crawled, which can help prevent a bot or spider from opening a massive number of database connections at the same time.
===Crawl-delay===
To implement such a crawl delay, insert the following code in your site's robots.txt (in your wwwroot or public_html folder):
<pre style="white-space: pre-wrap">
User-agent: *
Crawl-delay: 5
</pre>

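You can adjust the crawl delay as desired, but we suggest nothing lower than 2 seconds. Note that Crawl-delay is not part of the original exclusion standard and not every crawler honors it; Googlebot, for example, ignores it.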
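
To check how a parser reads the delay, here is a minimal sketch using Python's standard <code>urllib.robotparser</code> module. The <code>crawl_delay_for</code> helper and the <code>ExampleBot</code> name are hypothetical, and real crawlers may interpret the directive differently.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the Crawl-delay example above.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) that applies to user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# "ExampleBot" is a made-up bot name; it falls under the "*" group.
print(crawl_delay_for(ROBOTS_TXT, "ExampleBot"))
```

A well-behaved crawler would wait at least that many seconds between requests to the site.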
===Disallow Directories/Folders===
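If there are certain areas of your site you do not wish to have crawled, such as your site's administrative section or images folder, you can tell bots to stay out of those folders. Each Disallow value is matched as a path prefix, so <code>/admin</code> also covers everything beneath that folder. To do this, add the following code to your robots.txt: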
<pre style="white-space: pre-wrap">
User-agent: *
Disallow: /admin
Disallow: /images
</pre>
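
As a quick sanity check of these rules, the following sketch uses Python's standard <code>urllib.robotparser</code> module to test a few paths. The <code>is_allowed</code> helper, the <code>ExampleBot</code> name, and the sample paths are hypothetical; individual crawlers may match paths slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the Disallow example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin
Disallow: /images
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if user_agent may crawl path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical paths, used only for illustration.
for path in ("/index.html", "/admin/login.php", "/images/logo.png"):
    print(path, is_allowed(ROBOTS_TXT, "ExampleBot", path))
```

Only <code>/index.html</code> is reported as crawlable here; the other two paths fall under the disallowed prefixes.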
===Excluding Specific Bots===
If you wish to tell a specific bot to not crawl your site, you can do so with the following code:
<pre style="white-space: pre-wrap">
User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /
</pre>

The above example will prevent the '''Baiduspider''' and '''Sosospider''' bots from crawling your site. To block other bots, just replace the User-agent name with the actual name of the bot you wish to block.
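
The per-bot rules can be checked the same way. This sketch again relies on Python's standard <code>urllib.robotparser</code> module; the <code>is_blocked</code> helper is hypothetical, and <code>Googlebot</code> is used only as an example of a bot the rules do not name.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the bot exclusion example above.
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /
"""


def is_blocked(robots_txt, user_agent, path="/"):
    """Return True if the rules forbid user_agent from crawling path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


for bot in ("Baiduspider", "Sosospider", "Googlebot"):
    print(bot, is_blocked(ROBOTS_TXT, bot))
```

The two named bots are reported as blocked, while a bot with no matching User-agent group is not affected.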
==More Info About Using robots.txt==
More info about using a '''robots.txt''' file can be found at the [http://en.wikipedia.org/wiki/Robots_exclusion_standard Wikipedia Robots Exclusion Standard] page.
