Watch your robots.txt file or risk banishment from google



coast
11-25-2007, 06:04 PM
I just got this from one of my SEO contacts. Talk to your developer about it to make sure you don't get de-indexed:

"Google has been considering new syntax to recognize within
robots.txt. The Sebastians-Pamphlets blog said Google confirmed
recognizing experimental syntax like Noindex in the robots.txt
file.

This poses a danger to webmasters who have not validated their
robots.txt. A line reading Noindex: / could lead to one's site
being completely de-indexed.

The surname-less Sebastian recommended Google's robots.txt
analyzer, part of Google's Webmaster Tools, and only using
the Disallow, Allow, and Sitemaps crawler directives in the
Googlebot section of robots.txt."
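
For anyone who wants a quick local check before running the file through Google's analyzer, here is a minimal sketch in Python. It only flags lines whose directive is not one of the names mentioned in the quote above (plus User-agent). The function name and the allowed set are my own assumptions, not anything Google publishes, and it is no substitute for the real analyzer.

```python
# Hypothetical helper: flags robots.txt lines that use a directive
# other than the ones the quoted advice recommends sticking to.
ALLOWED = {"user-agent", "disallow", "allow", "sitemap"}


def find_unexpected_directives(robots_txt, allowed=ALLOWED):
    """Return (line_number, directive, value) for each suspicious line."""
    findings = []
    for line_no, raw in enumerate(robots_txt.splitlines(), start=1):
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            # No colon at all, so this is not a "Directive: value" line.
            findings.append((line_no, line, ""))
        elif name.strip().lower() not in allowed:
            findings.append((line_no, name.strip(), value.strip()))
    return findings


sample = "User-agent: Googlebot\nDisallow: /cgi-bin\nNoindex: /\n"
for line_no, directive, value in find_unexpected_directives(sample):
    print(f"line {line_no}: unexpected directive {directive!r} ({value!r})")
```

Anything it reports is worth a second look with your developer; an empty result only means the directive names look familiar, not that the rules do what you want.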

mejcdj
11-26-2007, 12:25 AM
Good info.
Thanks Coast!

Scandiman
11-26-2007, 11:22 PM
Good info Coast, thanks :)

Repped ya!

coast
11-26-2007, 11:36 PM
Thanks Scandi! I'm a two-blocker now lol.

Ehsan
12-06-2007, 10:32 PM
Why is Google always trying to be Big Brother? It seems like the internet is not free anymore. Even online, we have to rely on Google to see what's good for our site and what's not.

coast
12-06-2007, 11:21 PM
Ehsan, in the case of the robots.txt issue, it's simply a matter of Google's robots following a command differently than the site owner intends. The problem is that some people are including the term "deindex" or "noindex" in their robots.txt file, which to (I think) Microsoft means "don't index this page," but Google is interpreting the command as "don't index this site." If a site isn't indexed by Google, it is left out of all of Google's results pages. If you don't want your site indexed in Google (and some people do exclude Google on purpose), none of what I wrote matters.

Javier Marti
12-06-2007, 11:21 PM
Ehsan. With all due respect, it doesn't matter if the game is rigged, and it is out of our reach to fix it. We either choose to play it or not. Life is unfair. It will always be so. Don't worry too much about it. Just adapt. As Bruce Lee said,

http://www.joelledlow.com/pb/wp_c7619964/images/img194104520bcea04459.jpg

"Be like water my friend...and set up your mobis accordingly"

brysonmeunier
02-06-2008, 05:02 PM
Good info, Coast. You can also ensure your site doesn't get indexed by adding these lines to your robots.txt:

User-agent: *
Disallow: /

It's not specific to mobile, but it will hurt you either way. It's also a common mistake, so be careful.

If you're interested in the robots.txt file and how it can be used to exclude content from the index, robotstxt.org (http://www.robotstxt.org/) is the best resource I know.
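
If you want to catch that mistake automatically, a rough sketch using Python's standard-library robots.txt parser could look like this. The function name is made up for illustration, and the parser reflects how Python reads the file, which may not match every crawler.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt, agent="*"):
    """Return True if the given robots.txt text forbids fetching "/" for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")


# The two-line file from the post above blocks everything...
print(blocks_whole_site("User-agent: *\nDisallow: /\n"))
# ...while blocking only one directory does not.
print(blocks_whole_site("User-agent: *\nDisallow: /cgi-bin\n"))
```

Running something like this against your live robots.txt after each deploy is a cheap way to notice an accidental sitewide Disallow.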

coast
02-06-2008, 08:36 PM
Thanks Bryson. Repped.

texasgamer
03-11-2008, 08:25 AM
I just checked my robots.txt - Is this ok?

User-Agent: *
Allow: /
Disallow: /cgi-bin

brysonmeunier
03-12-2008, 08:28 PM
Allow actually isn't part of the original robots.txt standard (http://www.robotstxt.org/robotstxt.html). Crawlers that don't support it may just ignore it, but from what I've seen you're better off removing it. You don't want it misinterpreted as Disallow: / and having all of your content removed from the index.

Try this instead:
User-Agent: *
Disallow: /cgi-bin
Sitemap: http://yourdomain.com/sitemap.xml

(Replace yourdomain.com with your own domain; see http://sitemaps.org for the Sitemap format.)

Best,
Bryson
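
To see how one parser treats the two versions of the file discussed above, here is a small sketch with Python's standard-library parser. As far as I know it applies the first rule that matches, so a leading Allow: / wins over the later Disallow. Other crawlers may resolve the same file differently, which is a good reason to keep the file unambiguous. The helper name and test paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt, path, agent="*"):
    """Parse robots.txt text and ask whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


WITH_ALLOW = """\
User-Agent: *
Allow: /
Disallow: /cgi-bin
"""

WITHOUT_ALLOW = """\
User-Agent: *
Disallow: /cgi-bin
"""

for label, text in (("with Allow", WITH_ALLOW), ("without Allow", WITHOUT_ALLOW)):
    print(label,
          "| /cgi-bin/test.cgi:", can_fetch(text, "/cgi-bin/test.cgi"),
          "| /index.html:", can_fetch(text, "/index.html"))
```

With this parser, the version without Allow keeps /cgi-bin blocked, while the version with Allow: / on top leaves it fetchable, so the simpler file is also the more predictable one.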

texasgamer
03-13-2008, 01:15 AM
Thanks for the advice Bryson
Rep given :)