Robots

About Robots.txt Files

The robots.txt file implements the "robots exclusion standard," which is used to give directives to web robots (spiders, bots) that crawl websites, and it can help improve search engine results. When a bot first begins to crawl a site, it checks for a robots.txt file in the web root. If one exists, most bots will follow the directives in the file, such as the Crawl-delay value and any excluded directories. If the bot does not find a robots.txt file, it assumes the webmaster wants the entire site crawled.
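
For illustration, here is a minimal sketch of how a well-behaved crawler checks for this file before fetching pages, using Python's built-in urllib.robotparser module (the domain example.com and the user agent name MyBot are placeholders):

import urllib.robotparser

# Point the parser at the robots.txt file in the site's web root
rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the file; a missing (404) file is treated as "allow everything"

# Ask whether this user agent may crawl a given URL
if rp.can_fetch("MyBot", "https://example.com/some-page"):
    print("Allowed to crawl this URL")
else:
    print("Disallowed by robots.txt")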

A downside to using a robots.txt file is that some bots will not respect its directives; however, the major search engines such as Google, Bing, and Yahoo will obey them.

Examples

The main reason we recommend using a robots.txt file is to control the rate at which your website is crawled, which helps prevent a bot/spider from opening a massive number of database connections at the same time.

Crawl-delay

To implement such a crawl delay, insert the following lines in your site's robots.txt file (located in your wwwroot or public_html folder):

User-agent: *
Crawl-delay: 5

You can adjust the crawl delay as desired, but we suggest a value no lower than 2 seconds.
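
As a rough illustration of how a compliant bot interprets this setting, the short Python sketch below reads the Crawl-delay value back with the standard urllib.robotparser module (no server-side code is needed; this is only a demonstration):

import urllib.robotparser

# Parse the two directives shown above directly, without making a request
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 5",
])

delay = rp.crawl_delay("*")  # 5 (seconds); None if no Crawl-delay is set
print("Waiting", delay, "seconds between requests")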

Disallow Directories/Folders

If there are certain areas of your site you do not wish to have indexed, such as your site's administrative section or images folder, you can tell bots not to crawl such folders. To do this, add the following code to your robots.txt:

User-agent: *
Disallow: /admin
Disallow: /images
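
To illustrate how these rules are applied, the sketch below checks two sample URLs against them with Python's urllib.robotparser (example.com and the MyBot user agent are placeholders):

import urllib.robotparser

# Parse the Disallow rules shown above directly
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin",
    "Disallow: /images",
])

print(rp.can_fetch("MyBot", "https://example.com/admin/login"))  # False - under /admin
print(rp.can_fetch("MyBot", "https://example.com/about"))        # True - no rule matches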

Excluding Specific Bots

If you wish to tell a specific bot to not crawl your site, you can do so with the following code:

User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /

The above example will prevent the Baiduspider and Sosospider bots from crawling your site. To block other bots, just replace the User-agent name with the actual name of the bot you wish to block.
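
As a quick check of how these per-bot rules behave, the Python sketch below asks the standard urllib.robotparser which user agents may crawl the site (example.com is a placeholder):

import urllib.robotparser

# Parse the per-bot rules shown above directly
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: Baiduspider",
    "Disallow: /",
    "",
    "User-agent: Sosospider",
    "Disallow: /",
])

print(rp.can_fetch("Baiduspider", "https://example.com/"))  # False - blocked entirely
print(rp.can_fetch("Googlebot", "https://example.com/"))    # True - no rule applies to it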

More Info About Using robots.txt

More information about using a robots.txt file can be found on the Wikipedia Robots Exclusion Standard page (https://en.wikipedia.org/wiki/Robots_exclusion_standard).