How do I control Search Engines using the robots.txt file?

The spiders used to retrieve website data for all legitimate search engines follow rules defined in a file called 'robots.txt', which should be placed in the root directory of a web site. Note that compliance is voluntary: the file directs well-behaved crawlers, but it is not an access-control mechanism.
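For example, assuming a site served at www.example.com (a placeholder domain used here for illustration), spiders request the file from a fixed location in the site root:

http://www.example.com/robots.txt

You can fetch the same URL yourself (for instance with: curl http://www.example.com/robots.txt) to check what a site currently publishes.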

This file contains instructions about which parts of the site's structure a spider may and may not follow and index, and therefore which directories, pages, images etc. can be retrieved and included in search results.

Example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /jscript/
Disallow: /beta/
Disallow: /images/
Disallow: /bogus.htm
This robots.txt file tells the spiders that:
  • User-agent: *
    the rules that follow apply to all search engines; anything not disallowed may be collected from the site
  • Disallow: /cgi-bin/ (and the other directory lines)
    those directories, and all the files/pages within them, must not be retrieved or indexed
  • Disallow: /bogus.htm
    the page bogus.htm in the site root should similarly not be retrieved and included.
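As a further sketch (not part of the example above), a robots.txt file can give different instructions to different spiders by naming them in separate User-agent records, divided by blank lines; the spider name 'BadBot' below is hypothetical:

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/

Here the record naming 'BadBot' excludes that spider from the entire site ('Disallow: /' matches every path), while every other spider only has to avoid /cgi-bin/. Each spider follows the record whose User-agent line matches it most specifically.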

