What is a robots.txt file and how to use it

General information

- Basics of robots.txt syntax

- Examples of usage

Robots.txt and SEO

- Removing exclusions of images

- Adding reference to your sitemap.xml file

- Miscellaneous remarks

Robots.txt - General information

Robots.txt is a text file located in the site's root directory that tells search engines' crawlers and spiders which website pages and files you do or don't want them to visit. Site owners usually strive to be noticed by search engines, but there are cases when this isn't wanted: for instance, if you store sensitive data, or if you want to save bandwidth by excluding heavy pages with images from indexing.

When a crawler accesses a site, it requests a file named "/robots.txt" first. If such a file is found, the crawler checks it for the website's indexation instructions.
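
For instance (using example.com as a placeholder domain):

Page the crawler wants to visit: https://example.com/blog/post.html
File it requests first: https://example.com/robots.txt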

NOTE: there can be only one robots.txt file per website. A robots.txt file for an addon domain needs to be placed in that domain's document root.
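
For instance, on a typical cPanel account the placement might look like this (the username and domain names are placeholders; your document roots may differ):

/home/username/public_html/robots.txt - main domain
/home/username/addon-example.com/robots.txt - addon domain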

See also: Google's official stance on the robots.txt file

A robots.txt file consists of records, each containing two fields: a line with a user-agent name (the search engine crawler being addressed) and one or several lines starting with the directive

Disallow:
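
For example, a record addressed to a single crawler might look like this (Googlebot is Google's crawler; the directory names are illustrative):

User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/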

Robots.txt has to be created in UNIX text format (plain text with UNIX line breaks).

Basics of robots.txt syntax

Usually a robots.txt file contains something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~different/

In this example, three directories, "/cgi-bin/", "/tmp/" and "/~different/", are excluded from indexation.

NOTE: every directory is written on a separate line. You can't write "Disallow: /cgi-bin/ /tmp/" on one line, nor can you break one Disallow or User-agent directive across several lines - use a new line to separate directives from each other.
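
To illustrate, the first form below is invalid and the second is correct:

Wrong:

Disallow: /cgi-bin/ /tmp/

Right:

Disallow: /cgi-bin/
Disallow: /tmp/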

An asterisk (*) in the User-agent field means "any web crawler". Consequently, directives such as "Disallow: *.gif" or "User-agent: Mozilla*" are not supported by the standard - pay attention to such logical mistakes, as they are among the most common ones.
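
If your goal really is to block everything, the standard does provide a way - for example:

User-agent: *
Disallow: /

blocks the entire site for all crawlers, while an empty Disallow field:

User-agent: *
Disallow:

allows them to index everything.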

Other common mistakes are typos: misspelled directories, misspelled user-agents, missing colons after User-agent and Disallow, and so on. As your robots.txt file grows more complicated, it becomes easy for an error to slip in, and validation tools come in handy: http://tool.motoricerca.info/robots-checker.phtml
