I'm a partner in a small company that makes a lot of cash writing spiders. I don't write them personally; I'm more the business guy. But I can tell you pretty much everything you'd want to know about them.
Basically, a spider is just a computer program that runs on a server. It surfs the web like a robot looking for information, using the same HTTP protocol your web browser uses. Because they're just programs, spiders can surf through millions of pages 24/7, and you can teach them to retrieve specific information, like pricing data. Google uses them to collect bits of text that get indexed and cataloged, so when you type in a search term you quickly get your results. It takes Google about four months to collect all the data that they do. Many of the sites where you can get price comparisons use spiders to get their data too.
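To make that concrete, here's a bare-bones sketch of how a spider works, using only Python's standard library. The starting URL and page limit are placeholders I made up; a real crawler would also throttle itself, handle errors, and check robots.txt before fetching anything.

```python
# Minimal spider sketch: fetch a page, pull out its links, queue them,
# repeat. Uses only the Python standard library.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every link found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, max_pages=10):
    """Breadth-first crawl: visit a page, queue its links, repeat."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        queue.extend(extract_links(html, url))
    return seen
```

The same loop, pointed at product pages and with `extract_links` swapped for a function that scrapes prices, is essentially what a price-comparison spider does.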
They're pretty damn cool, and they keep getting more and more complex. If you look through your log files you'll see that you're often visited by spiders. You can usually tell because they go from page to page in a mechanical manner. Some identify themselves, such as "Googlebot"; some don't.
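Here's one rough way to spot the self-identifying ones in an access log. This assumes the common "combined" log format, where the User-Agent is the last quoted field on each line; the bot tokens and the sample line are illustrative, not a complete list.

```python
# Sketch: flag likely spider visits in a web server access log by
# checking the User-Agent string (the last quoted field in the
# "combined" log format). Token list is illustrative only.
import re

BOT_TOKENS = ("googlebot", "bingbot", "slurp", "spider", "crawler", "bot")


def looks_like_bot(log_line):
    """Return True if the User-Agent field mentions a known bot token."""
    quoted = re.findall(r'"([^"]*)"', log_line)
    if not quoted:
        return False
    user_agent = quoted[-1].lower()
    return any(token in user_agent for token in BOT_TOKENS)
```

Spiders that don't announce themselves are harder to spot; there you fall back on the mechanical access pattern, such as one request per page at a steady pace, and no images or stylesheets fetched.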
You can put a file called robots.txt in your web root directory, and spiders are supposed to follow the rules you set in it. For example, if anasci didn't want their pages or certain forums cataloged and stored all over Google, they could write rules that tell the bots that this shit is private. Most reputable companies, such as Google, will follow the rules, but there is no law that says they must.
A list of common spiders and their IP numbers can be found here:
http://www.searchengineworld.com/spiders/spider_ips.htm