What is a robots.txt and how is it used?

The Robots Exclusion Standard, better known as robots.txt, is a way to instruct search engine spiders that certain URLs on a site are off-limits to them.

Sometimes it is necessary to make URLs off-limits to spiders. A webmaster may not want certain URLs indexed by a search engine; these could include URLs where sensitive information is stored, such as a back-end admin area.

Robots.txt is a standard plain-text file that must be named 'robots.txt' and should always be uploaded to the root directory of a site's domain. XML sitemaps have no tag for excluding URLs, so robots.txt still plays an important part in SEO.
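Because the file always lives at the root of the domain, its address can be worked out from any page URL on the site. The snippet below is a minimal sketch of that idea in Python; the function name robots_url and the www.example.com address are made up for illustration.

```python
from urllib.parse import urljoin


def robots_url(page_url):
    """Return the address where a spider would look for robots.txt.

    The leading slash makes urljoin discard the page's own path and
    query string, leaving only the scheme and domain.
    """
    return urljoin(page_url, "/robots.txt")


if __name__ == "__main__":
    # Hypothetical page URL, used only as an example.
    print(robots_url("http://www.example.com/blog/post.cfm?id=1"))
```

Whichever page you start from, the result points at the same file in the site root.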

An example robots.txt file might look like this:


user-agent: *
disallow: /images/
disallow: /objects/
disallow: /templates/

User-agent tells search engines which of them the rules apply to; in the example above the wildcard is used, meaning that all search engines must take heed of what is in the file. Disallow states which folders or URLs should not be crawled. In the example, search engines must not crawl the images, objects or templates folders.
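If you want to check how a well-behaved spider would read the example rules above, Python's standard library includes a robots.txt parser. This is only a rough sketch: the helper name is_allowed and the www.example.com URLs are assumptions for illustration, and real search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, held in a string so that no
# network request is needed.
ROBOTS_TXT = """\
user-agent: *
disallow: /images/
disallow: /objects/
disallow: /templates/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, used only to exercise the rules.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/images/logo.png",
        "http://www.example.com/templates/header.html",
    ):
        print(url, "allowed" if is_allowed(url) else "disallowed")
```

Because the rules use the wildcard user-agent, a named spider such as "Googlebot" gets the same answers as any other.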



About Me

Glyn Jackson, 26 years old, MD and youngest member of a web development firm based in Staffordshire called Newebia Ltd. Academic background in BSc Information System & Internet Commerce. Online marketing expert (EE Ranked) and .NET developer. Has been using ColdFusion for just 3 years but loves it. "I am not a veteran in ColdFusion but I do work on challenging projects which help me learn more about ColdFusion and if I can contribute to the community in anyway then, it's all good!"
