The robots.txt file is a plain-text file you can add to your document root to help minimize the impact of "bots" crawling your site. While it is not a guaranteed or foolproof way to control how all bots crawl your site, it can reduce the load caused by frequent crawls from well-behaved search bots.
Limitations of robots.txt
An important point to keep in mind when using robots.txt is that it should be considered a set of suggestions, not a set of rules that all bots must follow. While most reputable bots will follow the instructions in a robots.txt file, scraper bots and malware bots (along with other bad bots) will most likely ignore it. So, if you have sensitive information in your account that you wish to protect, you will want to use password protection, not robots.txt, to protect that data.
Creating a robots.txt File
You can create your robots.txt file either through the File Manager in cPanel or with any text editor on your local computer, such as Notepad on Windows, TextEdit on Mac, or vi or emacs on Linux. When saving the file, make sure it is saved with exactly the filename and extension robots.txt
What Goes Into a robots.txt File
You can specify Disallow rules for a specific User-agent (such as googlebot) or target all bots by using the wildcard "*" (without the quotes).
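For instance, a rule blocking a single crawler from one directory might look like the following (the /private/ path here is purely illustrative; substitute the directory you actually want excluded):

```
User-agent: googlebot
Disallow: /private/
```

Each record starts with a User-agent line naming the crawler, followed by one or more Disallow lines listing path prefixes that crawler should skip.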
Here are a few examples:
This disallows all bots for the entire site:
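```
User-agent: *
Disallow: /
```

The wildcard User-agent matches every crawler, and the Disallow rule with a bare "/" covers every path on the site.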
While the following directive is considered nonstandard, several major crawlers support this as a directive to limit how often (in seconds) a bot should crawl your site:
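```
User-agent: *
Crawl-delay: 10
```

Support varies by crawler: some major bots (such as Bing and Yandex) honor Crawl-delay, while others ignore it entirely, so treat it as a hint rather than a guarantee.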
For more information regarding User-agents and Disallow statements used in robots.txt, please see:
robots.txt documentation from robotstxt.org
More information about robots.txt can also be found in Google Webmaster Tools:
Google Webmaster Tools and robots.txt