The robots.txt file, located at the root of a domain, is usually the ideal tool for telling search engine crawlers (the robots of Google, Yahoo and Microsoft) which web pages they may index. In some cases, however, the /robots.txt file is too limited, and using it becomes cumbersome.
In those cases the "robots" META tag is useful, even though it works quite differently from the robots.txt file.
General concept
Before learning how the Robots META tag works, let us consider two related concepts: indexing a page and following links.
Index (index page)
Unfortunately, the "indexing" (or "index") of Web pages is not defined in the Robots Exclusion Standard.
Some people believe that when indexing is forbidden for a page, the page will never appear in search results, and its content, especially the links (URLs) it contains, will not be used by search engine crawlers.
Others take a less strict view. They hold that a ban on indexing only means that search engines do not use the content of the page to determine rankings; the page's URL may still appear in search results if it is matched by other factors that were compiled independently of the ban on this page.
Following links (follow URL link)
The term "follow" is easier to understand. Forbidding it is interpreted as telling search engines to ignore the links they find in the page. The crawler must behave as if the links were not in the page, and those links do not contribute to the "popularity" of the pages they point to.
However, the same links may of course be found on other sites, where search engines will follow them.
About the Robots META tag
The META tag is always located at the top of the page's HTML source code, that is, between the <head> and </head> tags.
The standard conventions of the "robots" META tag for indexing pages and following links are reviewed next.
With the robots META tag you can specify how a crawler handles your web page. The tag's content attribute takes one or more of the following values:
all
Tells the crawler to index everything (default).
none
Tells the crawler not to index anything.
index
The page is indexed.
noindex
The page is not indexed, but the links (URLs) in it are followed.
follow
The crawler reads the hyperlinks in the page and follows them afterwards.
nofollow
Tells the crawler not to follow the links in the page.
noarchive
The search engine does not keep a cached copy of the page.
nocache
Works like noarchive, but only for MSN / Live.
nosnippet
The search engine does not show a snippet (description) in the search results page and does not show a cached copy of the page.
noodp
Prevents search engines from using the description from the DMOZ web directory as part of the snippet in the search results page.
noydir
Prevents Yahoo from using the description from the Yahoo Directory as the description in the search results. The noydir value applies only to Yahoo; no other search engine uses the Yahoo Directory, so this value is not supported by other search engines.
For example:
<meta name="robots" content="index, follow">
In this example, search engines will index the page and every other page they reach by following the links in it.
Figure 1: Example of using the robots META tag to allow robots to index everything.
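To make the mechanics concrete, here is a minimal Python sketch of how a crawler might read the values of the robots META tag from a page. It uses only the standard library; the names RobotsMetaParser and robots_directives are hypothetical and chosen for this illustration, and real crawlers are certainly more elaborate.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # A valueless attribute is reported as None, so fall back to "".
        attributes = {key.lower(): (value or "") for key, value in attrs}
        if attributes.get("name", "").strip().lower() != "robots":
            return
        content = attributes.get("content", "")
        # Values are comma-separated; normalise case and whitespace.
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the list of robots META values found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="index, follow"></head></html>'
print(robots_directives(page))
```

A page without a robots META tag yields an empty list, which a caller can treat as the default behaviour.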
How do search engines use the Robots META tag?
Now that we have studied the values of the Robots META tag, the following is a summary of the values supported by the most popular search engines:
Value      Google  Yahoo  MSN / Live  Ask
index      Yes     No     Yes         Yes
noindex    Yes     Yes    Yes         Yes
none       Yes     Maybe  Maybe       Yes
nofollow   Yes     Yes    Yes         Yes
noarchive  Yes     Yes    Yes         Yes
nosnippet  Yes     No     No          No
noodp      Yes     Yes    Yes         No
noydir     No      Yes    No          No
With the information in the summary table above, you can adjust the permissions and restrictions for an individual search engine by addressing its crawler by name. The crawler names of the popular search engines are as follows:
Google
GoogleBot
Yahoo
Slurp
MSN / Live
MSNBOT
Ask
Teoma
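As an illustration, the short Python sketch below builds a META tag addressed either to all robots or to one named crawler. It assumes that the crawler's name simply takes the place of "robots" in the name attribute; the helper robots_meta_tag is hypothetical, and whether a given engine honours a tag addressed to it should be checked against that engine's own documentation.

```python
from html import escape


def robots_meta_tag(values, crawler="robots"):
    """Build a META tag for all robots or for one named crawler.

    Assumption: the crawler name replaces "robots" in the name attribute.
    """
    content = ", ".join(values)
    return '<meta name="{}" content="{}">'.format(
        escape(crawler, quote=True), escape(content, quote=True)
    )


# One tag for every robot, and a stricter one addressed to Google's crawler.
print(robots_meta_tag(["noarchive"]))
print(robots_meta_tag(["noindex", "nofollow"], crawler="googlebot"))
```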
Here is also some information from the search engines that you should refer to:
Meta tags on Google's Webmaster Central Blog.
Webmaster help for site owners on MSN / Live.
Search support on Yahoo's Webmaster Resources.
Ask.com webmaster support for indexing Web pages.
Standard conventions for the Robots META tag
General Convention
* Syntax: <meta name="robots" content="values">
* Maximum Characters: Not specified
* Compatibility: All search engines
* Version: HTML 2.0
* Location: Between the <head> and </head> tags
* Function: Specifies how search engines index the page, or forbids indexing for the search engines named.
* Errors to Avoid: None. The META tag is not required.
Interpretation
The values in the content attribute are separated by commas when the Robots META tag carries more than one. The values that matter are: none, noindex, nofollow, all, index and follow.
1. none: Tells a crawler (robot) to skip this page. Equivalent to noindex, nofollow.
* noindex: This page is not indexed.
* nofollow: Robots will not follow the links found in the page.
2. all: Places no restriction on indexing the page or on following the links found in it to discover other pages to index.
* index: Robots can add this page to the search results.
* follow: Robots may follow the URLs in the page to find other pages.
Notes
The values index, follow and all do not need to be specified, because they are the defaults.
If there is no robots META tag, or if its content attribute is empty or missing, robots assume index, follow (the equivalent of all). If the keyword all appears in the declaration, all other values are ignored, so the value "nofollow, all, noindex, nofollow" becomes "all".
In the case of contradictory values (e.g. "follow, nofollow, follow"), the crawler decides arbitrarily how to treat your page.
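The rules in these notes can be sketched in a few lines of Python. This is only an illustration of the conventions described here, not the behaviour of any particular search engine; the function name resolve_robots is hypothetical, and a contradiction is reported as None because the outcome is left to the crawler.

```python
def resolve_robots(content):
    """Return (index, follow) for the content of a robots META tag.

    Each element is True, False, or None when the values contradict
    each other and the decision is left to the crawler.
    """
    values = [v.strip().lower() for v in (content or "").split(",") if v.strip()]

    # A missing or empty content defaults to "index, follow",
    # and "all" overrides every other value.
    if not values or "all" in values:
        return (True, True)

    def decide(allow, deny):
        allowed = allow in values
        denied = deny in values or "none" in values
        if allowed and denied:
            return None  # contradictory values: the crawler decides
        return not denied

    return (decide("index", "noindex"), decide("follow", "nofollow"))


print(resolve_robots("nofollow, all, noindex, nofollow"))
print(resolve_robots("follow, nofollow, follow"))
```

The first call returns permission for both indexing and following, because all wins over the other values; the second reports the follow decision as undetermined.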
Some common uses of the Robots META tag
The Robots META tag is used to exclude content. Let us consider three examples that use the Robots META tag to exclude specific content from indexing and from search services.
Use the noindex value to allow the links to be followed while the page itself is not indexed.
<meta name="robots" content="noindex">
Use nofollow to allow the page to be indexed without the links in it being followed.
<meta name="robots" content="nofollow">
Use none, which is equivalent to noindex, nofollow, to forbid both indexing and following links.
<meta name="robots" content="none">
If you want more detailed information on the Robots META tag, please refer to the official site robotstxt.org.
Finally, as noted above, you can combine the Robots META tag with your robots.txt file and with the rel="nofollow" link attribute (introduced by Google and accepted by the other search engines). Also note that Google's handling of robots.txt has many options and differs in several points from that of the other search engines.