What is a robots.txt and how is it used?
The Robots Exclusion Standard, better known as robots.txt, is a way to instruct search engine spiders that certain URLs on a site are off-limits to them.
Sometimes it is necessary to make URLs off-limits to spiders. A webmaster may not want certain URLs indexed by a search engine; these could include URLs that lead to sensitive areas, such as a back-end admin section.
Robots.txt is a standard .txt file that must be named 'robots.txt' and should always be uploaded to the root directory of a site's domain. XML sitemaps have no tag for excluding URLs, so robots.txt still plays an important part in SEO.
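Because the file always sits at the root of the domain, its address can be worked out from any page URL on the same site. The following is a minimal sketch in Python using only the standard library; the function name robots_txt_url and the www.example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; drop the page's path, query and fragment,
    # because robots.txt is expected at the top level of the domain.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page URL, used only to demonstrate the function.
print(robots_txt_url("https://www.example.com/shop/item?id=3"))
# prints https://www.example.com/robots.txt
```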
An example of a robots.txt file could look like this:
user-agent: *
disallow: /images/
disallow: /objects/
disallow: /templates/
User-agent tells search engines which of them the rules that follow apply to; in the example above the wildcard (*) is used, meaning that all search engines must take heed of what is in the file. Disallow states which folders or URLs should not be crawled. In the example, search engines must not crawl the images, objects or templates folders.
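To see how a well-behaved spider might interpret these rules, here is a small sketch using Python's standard urllib.robotparser module. It parses the example rules from a string rather than downloading them; the bot name "ExampleBot", the helper build_parser and the www.example.com URLs are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this entry, held in a string instead of a live file.
EXAMPLE_RULES = """\
user-agent: *
disallow: /images/
disallow: /objects/
disallow: /templates/
"""


def build_parser(rules_text: str) -> RobotFileParser:
    """Parse robots.txt rules supplied as text and return the parser."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


parser = build_parser(EXAMPLE_RULES)

# "ExampleBot" is a made-up spider name; the wildcard rule applies to it.
for url in (
    "https://www.example.com/images/logo.png",
    "https://www.example.com/about.html",
):
    print(url, parser.can_fetch("ExampleBot", url))
```

The first URL falls under the disallowed /images/ folder, so can_fetch returns False; the second matches no Disallow rule, so it returns True.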

