Search Engines Unify the Robots File Standard

The three big search engines compete fiercely, but they do occasionally cooperate. Last year, Google, Yahoo, and Microsoft agreed to support a unified Sitemaps standard. Three days ago, the big three also announced common support for robots.txt file standards. Google, Yahoo, and Microsoft each published a post on their official blogs, listing the robots.txt directives and meta tags supported by all three, as well as some standards unique to each. Here is a summary.

The robots.txt directives supported by all three include:

Disallow – tells spiders not to crawl certain files or directories. For example, the following code prevents spiders from crawling any file on the site:

User-agent: *

Disallow: /

Allow – tells spiders to crawl certain files. Used together with Disallow, it lets you tell spiders to skip most of a directory and crawl only part of it. For example, the following code stops spiders from crawling other files under the /ab/ directory, and only allows them to crawl files under /ab/cd/:

User-agent: *

Disallow: /ab/

Allow: /ab/cd

$ wildcard – matches characters at the end of a URL. For example, the following code allows spiders to access URLs ending with the ".htm" suffix:

User-agent: *

Allow: /*.htm$

* wildcard – matches any sequence of characters. For example, the following code prevents spiders from crawling all .htm files:

User-agent: *

Disallow: /*.htm

Sitemaps location – tells spiders where your XML sitemap file is located, in this format:

Sitemap: <full URL of the sitemap file>
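
For example, assuming a hypothetical site at www.example.com, the line might read:

Sitemap: http://www.example.com/sitemap.xml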

The meta tags supported by all three include:

NOINDEX – tells spiders not to index the page.

NOFOLLOW – tells spiders not to follow the links on the page.

NOSNIPPET – tells spiders not to show a snippet (description) in the search results.

NOARCHIVE – tells spiders not to show a cached copy (snapshot) of the page.

NOODP – tells spiders not to use the title and description from the Open Directory Project (DMOZ) for the page.
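
For illustration only, these directives go into a robots meta tag in the page's <head> section; the combination shown here is just an assumed example, and a crawler-specific name such as "googlebot" can be used in place of "robots" to target one spider:

<meta name="robots" content="noindex, nofollow">

<meta name="googlebot" content="nosnippet, noarchive">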

