What Is the "robots.txt" File?

Updated: April 4, 2019
By: RSH Web Editorial Staff

What is a "robots.txt" file and how to use it

General information

A "robots.txt" file is a plain text file that you can create with any text editor, such as Notepad.
It sits in the website's root directory and tells search engine crawlers (also called spiders or bots) which pages and files you do or do not want them to crawl. Website owners usually want to be noticed by search engines, but there are cases where crawling is not needed or wanted, for instance to keep private areas out of search results or to save bandwidth by keeping crawlers away from heavy pages with many images. Keep in mind that robots.txt is only a request: well-behaved crawlers honor it, but it does not block access, so it is not a way to protect sensitive data.

When a crawler accesses a website, it first requests a file named "/robots.txt". If the file is found, the crawler reads it for instructions on what it may crawl.
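
As a rough illustration of that check, the sketch below uses Python's standard urllib.robotparser module to decide whether a URL may be fetched. The rules, the "ExampleBot" name and the URLs are made up for this example; a real crawler would download the file from the site before requesting any other page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and both URLs are placeholders for illustration only.
for url in ("https://yourwebsite.com/index.html",
            "https://yourwebsite.com/private/report.html"):
    print(url, may_crawl(ROBOTS_TXT, "ExampleBot", url))
```

The first URL is reported as crawlable and the second is not, because its path starts with the disallowed "/private/" prefix.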

NOTE: each host name can have only one robots.txt file. Robots.txt files for addon domains or sub-domains need to be placed in the corresponding document root.
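
To make the "one file per host" rule concrete, here is a small sketch that works out which robots.txt governs a given page. The function name and the sample URLs are our own illustration, not part of any crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url."""
    parts = urlsplit(page_url)
    # Only the scheme and host (plus port) matter; path and query are dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical main domain and sub-domain: each one maps to its own file.
print(robots_txt_url("https://yourwebsite.com/blog/post.html"))
print(robots_txt_url("https://shop.yourwebsite.com/cart?id=1"))
```

A page on the sub-domain resolves to the sub-domain's own robots.txt, which is why that file has to live in the sub-domain's document root.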

How to create a "robots.txt" file

The robots.txt file is created in your website's root folder, so that it is reachable at "yourwebsite.com/robots.txt"
You can use any text editor to create or edit a robots.txt file
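
If you prefer to generate the file from a script rather than a text editor, a minimal sketch could look like the one below. The helper name is hypothetical, and the demo writes into a temporary folder that merely stands in for your document root (on cPanel accounts this is commonly "public_html", but check your own setup).

```python
import tempfile
from pathlib import Path


def write_robots_txt(document_root: str, rules: str) -> Path:
    """Write rules to robots.txt inside document_root and return its path."""
    target = Path(document_root) / "robots.txt"
    # Plain UTF-8 text that ends with exactly one newline.
    target.write_text(rules.rstrip("\n") + "\n", encoding="utf-8")
    return target


# Demo: a temporary folder plays the role of the website's root directory.
with tempfile.TemporaryDirectory() as demo_root:
    created = write_robots_txt(demo_root, "User-agent: *\nDisallow:")
    demo_text = created.read_text(encoding="utf-8")
    print(created.name)
    print(demo_text, end="")
```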

The basic syntax of the robots.txt file
>> User-agent: [the name of the robot these rules apply to, or * for all robots]
>> Disallow: [page, folder or path you want to hide from the robot]
>> Allow: [page, folder or path you want to make visible again]

Example 1

To allow all search engines to crawl everything, use this code
>> User-agent: *
>> Disallow:

Example 2

To disallow all search engines from crawling anything
>> User-agent: *
>> Disallow: /

Example 3

To disallow a specific folder (all search engines)
>> User-agent: *
>> Disallow: /foldername/

Example 4

To disallow a specific file (all search engines)
>> User-agent: *
>> Disallow: /filename.html

Example 5

To disallow a folder but allow crawling of one file in that folder (all search engines)
>> User-agent: *
>> Disallow: /folderxyz/
>> Allow: /folderxyz/anyfile.html
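
Crawlers do not all resolve overlapping Allow and Disallow lines the same way. Under the current Robots Exclusion Protocol standard (RFC 9309), which Google follows, the most specific (longest) matching rule wins, with Allow winning a tie; some simpler parsers apply the first matching rule in file order instead. The sketch below illustrates the longest-match approach for plain path prefixes only. The function name and rule format are our own, and wildcards such as * and $ are deliberately not handled.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether path may be crawled under longest-match precedence.

    rules is a list of (directive, path_prefix) pairs for one User-agent
    group, for example ("Disallow", "/folderxyz/").
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value ("Disallow:") matches nothing
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The rules from Example 5.
RULES = [("Disallow", "/folderxyz/"), ("Allow", "/folderxyz/anyfile.html")]

for path in ("/folderxyz/anyfile.html", "/folderxyz/other.html", "/index.html"):
    print(path, is_allowed(RULES, path))
```

Only "/folderxyz/other.html" is refused: the single allowed file matches the longer Allow rule, and "/index.html" matches no rule at all.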

Example 6

To allow only one specific robot to access the website
>> User-agent: *
>> Disallow: /
>> User-agent: Googlebot
>> Disallow:
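
One way to sanity-check rules like these before uploading them is to run them through Python's standard urllib.robotparser module, as sketched below. "OtherBot" and the page path are placeholders, and other crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 6, without the ">>" display prefix.
EXAMPLE_6 = """\
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_6.splitlines())

# Googlebot has its own group with an empty Disallow, so it may crawl.
googlebot_ok = parser.can_fetch("Googlebot", "/page.html")
# Any other robot falls back to the "*" group, which blocks everything.
otherbot_ok = parser.can_fetch("OtherBot", "/page.html")

print("Googlebot:", googlebot_ok)
print("OtherBot:", otherbot_ok)
```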

Example 7

To exclude a single robot
>> User-agent: BadBotName
>> Disallow: /

Example 8

To tell crawlers where your sitemap is (the Sitemap line applies to all robots, regardless of the User-agent group it appears near)
>> User-agent: *
>> Disallow:
>> Sitemap: http://www.yourwebsite.com/sitemap.xml
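
For illustration, the sketch below reads a Sitemap line back out of a robots.txt file with Python's standard urllib.robotparser module (the site_maps() method needs Python 3.8 or later). The file content is a made-up sample.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with one Sitemap line.
SAMPLE = """\
User-agent: *
Disallow:
Sitemap: http://www.yourwebsite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```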

Example 9

A sample robots.txt file for WordPress
>> User-agent: *
>> Disallow: /feed/
>> Disallow: /trackback/
>> Disallow: /wp-admin/
>> Disallow: /wp-content/
>> Disallow: /wp-includes/
>> Disallow: /readme.html
>> Disallow: /xmlrpc.php
>> Allow: /wp-content/uploads/
>> Sitemap: https://yourwebsite.com/sitemap.xml

Tip – Do not disallow files that you want bots to crawl, and above all do not list individual files that you want to keep hidden. The robots.txt file is publicly readable, so listing those files tells everyone where they are. We recommend putting them inside a folder and disallowing that folder instead.

Other common mistakes are typos: misspelled directories or user-agents, missing colons after User-agent and Disallow, etc. As your robots.txt file gets more complicated, it becomes easy for an error to slip in. Validation tools come in handy here: http://tool.motoricerca.info/robots-checker.phtml
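
A few of these mistakes can also be caught with a short script. The sketch below is a hypothetical, deliberately simple checker, not a replacement for a full validator: it flags missing colons, unrecognized directive names and paths that do not start with "/". The list of known directives is our assumption, and a misspelled directory name cannot be detected this way.

```python
# Directive names this sketch recognizes (an assumption, not a full list).
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for suspicious robots.txt lines."""
    warnings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing colon: {line!r}")
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        if name.lower() not in KNOWN_DIRECTIVES:
            warnings.append(f"line {number}: unknown directive {name!r}")
        elif (name.lower() in {"allow", "disallow"}
              and value and not value.startswith(("/", "*"))):
            warnings.append(f"line {number}: path should start with '/': {value!r}")
    return warnings


# A deliberately broken sample file.
BROKEN = "User-agent *\nDisalow: /tmp/\nDisallow: private/\nAllow: /ok/\n"
for warning in lint_robots_txt(BROKEN):
    print(warning)
```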


1997 - 2019  |  RSH Web Services  |  All Rights Reserved.