Yeah - you probably saw something that looked like:

 

*Google-Bot

 

They can't be clicked, and they have an asterisk next to them. Google-Bot and Archive.org are two examples, and there are a few others. They basically come to the forums and do the equivalent of clicking on every link. That is how they index pages.


What everyone has said is right. In fact, I did a search for my screen name the other day and a few hits came up from this site. It's just indexing for Google search purposes, or for some other search engine.


Sorry to have brought this back, but it appears the spider bot from Google likes to make its visits around 2am. The past 2 days it's been here at 2am... I wonder why.


It's a bot, it doesn't sleep :P


Here is the info about the Bot.

 

 

 

 

1. How often will Googlebot access my web pages?

 

For most sites, Googlebot shouldn't access your site more than once every few seconds on average. However, due to network delays, it's possible that the rate will appear to be slightly higher over short periods.

 

2. How do I request that Google not crawl parts or all of my site?

 

robots.txt is a standard document that can tell Googlebot not to download some or all information from your web server. The format of the robots.txt file is specified in the Robot Exclusion Standard. For detailed instructions about how to prevent Googlebot from crawling all or part of your site, please refer to our Removals page. Remember, changes to your server's robots.txt file won't be immediately reflected in Google; they'll be discovered and used when Googlebot next crawls your site.
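For anyone who wants to see what that looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module to check a robots.txt against a URL. The robots.txt content, the `/private/` path, and the `can_crawl` helper are made up for illustration; they are not from Google's page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot is kept out of /private/,
# and every other robot is kept out of the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))      # True
print(can_crawl(ROBOTS_TXT, "Googlebot", "http://www.example.com/private/a.html"))  # False
print(can_crawl(ROBOTS_TXT, "OtherBot", "http://www.example.com/index.html"))       # False
```

This only shows how a well-behaved crawler would read the file. Nothing in robots.txt forces a crawler to obey it.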

 

3. Googlebot is crawling my site too fast. What can I do?

 

Please contact us with the URL of your site and a detailed description of the problem. Please also include a portion of the weblog that shows Google accesses so we can track down the problem quickly.

 

4. Why is Googlebot asking for a file called robots.txt that isn't on my server?

 

robots.txt is a standard document that can tell Googlebot not to download some or all information from your web server. For information on how to create a robots.txt file, see The Robot Exclusion Standard. If you just want to prevent the "file not found" error messages in your web server log, you can create an empty file named robots.txt.

 

5. Why is Googlebot trying to download incorrect links from my server? Or from a server that doesn't exist?

 

It's a given that many links on the web will be broken or outdated at any particular time. Whenever someone publishes an incorrect link to your site (perhaps due to a typo or spelling error) or fails to update links to reflect changes in your server, Googlebot will try to download an incorrect link from your site. This also explains why you may get hits on a machine that's not even a web server.

 

6. Why is Googlebot downloading information from our "secret" web server?

 

It's almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, your "secret" URL may appear in the referrer tag and can be stored and published by the other web server in its referrer log. So, if there's a link to your "secret" web server or page on the web anywhere, it's likely that Googlebot and other web crawlers will find it.

 

7. Why isn't Googlebot obeying my robots.txt file?

 

To save bandwidth, Googlebot only downloads the robots.txt file once a day or whenever we've fetched many pages from the server. So, it may take a while for Googlebot to learn of changes to your robots.txt file. Also, Googlebot is distributed on several machines. Each of these keeps its own record of your robots.txt file.
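The once-a-day behaviour described above is easier to picture as a small cache. The sketch below is only an illustration of the idea, not Google's implementation: the `RobotsCache` class, the `fetch` callback, and the 24-hour default are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class RobotsCache:
    """Cache robots.txt bodies per host and refetch only when they are stale."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        max_age_seconds: float = 24 * 60 * 60,  # assumed "once a day"
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        """Return the cached robots.txt for host, refetching if it is too old."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]  # still fresh: edits made since the fetch are not seen
        body = self._fetch(host)
        self._entries[host] = (now, body)
        return body
```

If each crawler machine keeps its own cache like this, an edited robots.txt is picked up at different times by different machines, which matches the delay the answer describes.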

 

We always suggest verifying that your syntax is correct against the standard at http://www.robotstxt.org/wc/exclusion.html#robotstxt. A common source of problems is that the robots.txt file isn't placed in the top directory of the server (e.g., www.myhost.com/robots.txt); placing the file in a subdirectory won't have any effect.
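As a quick illustration of the "top directory" point, this sketch works out where a crawler would look for robots.txt given any page URL. The `robots_txt_url` helper and the forum URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the only location a crawler checks for robots.txt on this host."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the page's path, query string, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.myhost.com/forums/topic/123?page=2"))
# http://www.myhost.com/robots.txt
```

A file at www.myhost.com/forums/robots.txt would never be requested, which is why it has no effect.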

 

Also, there's a small difference between the way Googlebot handles the robots.txt file and the way the robots.txt standard says we should (keeping in mind the distinction between "should" and "must"). The standard says we should obey the first applicable rule, whereas Googlebot obeys the longest (that is, the most specific) applicable rule. This more intuitive practice matches what people actually do, and what they expect us to do. For example, consider the following robots.txt file:

 

User-Agent: *
Allow: /
Disallow: /cgi-bin

It's obvious that the webmaster's intent here is to allow robots to crawl everything except the /cgi-bin directory. Consequently, that's what we do.
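The difference between the two rules can be sketched in a few lines of Python. This is a simplified model using plain prefix matching. The function names and the `(directive, prefix)` rule format are assumptions for illustration, and ties between equally long prefixes go to the earlier rule here, which the FAQ does not specify.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/cgi-bin")


def allowed_first_match(rules: List[Rule], path: str) -> bool:
    """What the FAQ says the standard recommends: first applicable rule wins."""
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            return directive.lower() == "allow"
    return True  # no applicable rule means the path may be crawled


def allowed_longest_match(rules: List[Rule], path: str) -> bool:
    """What the FAQ says Googlebot does: longest (most specific) rule wins."""
    best_len = -1
    allowed = True  # no applicable rule means the path may be crawled
    for directive, prefix in rules:
        if prefix and path.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            allowed = directive.lower() == "allow"
    return allowed


RULES = [("Allow", "/"), ("Disallow", "/cgi-bin")]

print(allowed_first_match(RULES, "/cgi-bin/search"))    # True  (Allow: / matched first)
print(allowed_longest_match(RULES, "/cgi-bin/search"))  # False (/cgi-bin is more specific)
print(allowed_longest_match(RULES, "/index.html"))      # True
```

With first-match, the `Disallow: /cgi-bin` line in the example file would never take effect, because `Allow: /` already matches every path.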

 

For more information, please see the Robots FAQ. If there still seems to be a problem, please let us know.

 

8. Why are there hits from multiple machines at Google.com, all with user-agent Googlebot?

 

Googlebot was designed to be distributed on several machines to improve performance and scale as the web grows. Also, to cut down on bandwidth usage, we run many crawlers on machines located near the sites they're indexing in the network.

 

9. Can you tell me the IP addresses from which Googlebot crawls so that I can filter my logs?

 

The IP addresses used by Googlebot change from time to time. The best way to identify accesses by Googlebot is to use the user-agent (Googlebot).
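Filtering logs by user-agent might look like the sketch below. It assumes the common "combined" access log format, where the user-agent is the last quoted field on each line. The two sample lines are made up for illustration, and the helper names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the user-agent is the final double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def is_googlebot_hit(log_line: str) -> bool:
    """Return True if the line's user-agent field mentions Googlebot."""
    match = UA_PATTERN.search(log_line)
    return bool(match) and "googlebot" in match.group(1).lower()


def split_hits(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate log lines into (googlebot_hits, everything_else)."""
    bot: List[str] = []
    other: List[str] = []
    for line in lines:
        (bot if is_googlebot_hit(line) else other).append(line)
    return bot, other


# Made-up sample lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2005:02:00:01 +0000] "GET /robots.txt HTTP/1.1" 404 209 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2005:02:00:05 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

bot_hits, other_hits = split_hits(SAMPLE_LOG)
print(len(bot_hits), len(other_hits))  # 1 1
```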

 

10. Why is Googlebot downloading the same page on my site multiple times?

 

In general, Googlebot should only download one copy of each file from your site during a given crawl. Very occasionally the crawler is stopped and restarted, which may cause it to recrawl pages that it's recently retrieved.

 

11. Why don't the pages of my site that Googlebot crawled show up in your index?

 

Don't be alarmed if you can't immediately find documents that Googlebot has crawled in the Google search engine. Documents are entered into our index soon after being crawled. Occasionally, documents fetched by Googlebot won't be included for various reasons (e.g. they appear to be duplicates of other pages on the web).

 

12. What kinds of links does Googlebot follow?

 

Googlebot follows HREF links and SRC links.

 

13. How do I prevent Googlebot from following links on my pages?

 

To keep Googlebot from following links on your pages to other pages or documents, you'd place the following meta tag in the head of your HTML document:

 

<META NAME="Googlebot" CONTENT="nofollow">

 

To learn more about meta tags, please refer to http://www.robotstxt.org/wc/exclusion.html#meta; you can also read what the HTML standard has to say about these tags. Remember, changes to your site won't be immediately reflected in Google; they'll be discovered and propagate when Googlebot next crawls your site.

 

14. How do I tell Googlebot not to crawl a single outgoing link on a page?

 

Meta tags can exclude all outgoing links on a page, but you can also instruct Googlebot not to crawl individual links by adding rel="nofollow" to a hyperlink. When Google sees the attribute rel="nofollow" on hyperlinks, those links won't get any credit when we rank websites in our search results. For example, a link such as

 

<a href="http://www.example.com/">This is a great link!</a>

 

could be replaced with

 

<a href="http://www.example.com/" rel="nofollow">I can't vouch for this link</a>.
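To see how a crawler could honor both the page-level meta tag and the per-link attribute, here is a rough sketch built on Python's standard `html.parser`. The `FollowableLinks` class is hypothetical and simply treats "nofollow" as "do not follow"; it says nothing about how Google actually ranks or crawls.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class FollowableLinks(HTMLParser):
    """Collect <a href> links and note page-level and per-link nofollow."""

    def __init__(self) -> None:
        super().__init__()
        self.page_nofollow = False
        self.links: List[Tuple[str, bool]] = []  # (href, has rel="nofollow")

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr = dict(attrs)
        if tag == "meta":
            name = (attr.get("name") or "").lower()
            content = (attr.get("content") or "").lower()
            if name in ("robots", "googlebot") and "nofollow" in content:
                self.page_nofollow = True
        elif tag == "a" and attr.get("href"):
            rel_values = (attr.get("rel") or "").lower().split()
            self.links.append((attr["href"], "nofollow" in rel_values))

    def followable(self) -> List[str]:
        """Return the links a crawler honoring nofollow would still follow."""
        if self.page_nofollow:
            return []
        return [href for href, nofollow in self.links if not nofollow]


parser = FollowableLinks()
parser.feed(
    '<a href="http://www.example.com/">This is a great link!</a> '
    '<a href="http://www.example.com/x" rel="nofollow">I can\'t vouch for this link</a>'
)
print(parser.followable())  # ['http://www.example.com/']
```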

 

15. What is Feedfetcher, and why is it ignoring my robots.txt file?

 

Feedfetcher requests come from explicit action by human users. When users add your feed to their Google homepage or to Google Reader, Google's Feedfetcher attempts to obtain the content of the feed in order to display it. Since all requests come from humans, Feedfetcher has been designed to ignore robots.txt.

 

16. How do I add my feed to the search results for Google's personalized homepage and Google Reader?

 

The feeds that Googlebot crawls appear in the search results for Google's personalized homepage and Google Reader. To ensure that your feed is part of this index, add a <link> tag to the header of your webpage to enable feed autodiscovery. There are a lot of variations on <link> tags for this purpose, but below are a couple of simple examples.

 

For an Atom feed:

<link rel="alternate" type="application/atom+xml" title="Your Feed Title" href="http://www.example.com/atom.xml" />

 

 

For an RSS feed:

<link rel="alternate" type="application/rss+xml" title="Your Feed Title" href="http://www.example.com/rss.xml" />

To learn more about feed autodiscovery, there are a number of additional resources.
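Feed autodiscovery from the reader's side could look roughly like this: scan the page for `<link rel="alternate">` tags with a feed MIME type. This is a sketch with Python's standard `html.parser`; the `FeedLinkFinder` class and `find_feeds` helper are hypothetical names, and only the two MIME types shown above are recognized.

```python
from html.parser import HTMLParser
from typing import List, Tuple

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (mime_type, href) pairs from feed autodiscovery <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel_values and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append((mime, attr["href"]))


def find_feeds(html: str) -> List[Tuple[str, str]]:
    """Return every advertised feed found in the given HTML."""
    finder = FeedLinkFinder()
    finder.feed(html)  # HTMLParser.feed() parses the text; unrelated to RSS feeds
    return finder.feeds


PAGE = (
    '<head>'
    '<link rel="alternate" type="application/atom+xml" title="Your Feed Title" href="http://www.example.com/atom.xml" />'
    '<link rel="alternate" type="application/rss+xml" title="Your Feed Title" href="http://www.example.com/rss.xml" />'
    '</head>'
)
print(find_feeds(PAGE))
```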

 

17. My Googlebot question isn't answered here. Where should I send it?

 

Please contact us with questions.

 

©2005 Google

 

SORRY ABOUT THE LONG POST, BUT IT WILL ANSWER MANY QUESTIONS. I GOT THIS OFF OF GOOGLE'S SITE. THIS THING MAY BE A BIG PROBLEM.
