I'm a partner in a small company that makes a lot of cash writing spiders. I don't write them personally; I'm more the business guy. But I can tell you pretty much everything you'd want to know about them.
Basically, a spider is just a computer program that runs on a server. It surfs the web like a robot looking for information, using the same HTTP protocol your web browser uses. Because they're just programs, spiders can surf through millions of pages 24/7, and you can teach them to retrieve specific information, like pricing data. Google uses them to collect bits of text that get indexed and cataloged, so when you type in a search term you quickly get your results. It takes Google about four months to collect all the data that they do. Many of the sites where you can get price comparisons use spiders to get their data too.
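To make that concrete, here's a bare-bones sketch of how a spider works, using only Python's standard library. The starting URL and page limit are placeholders I made up; a real crawler would also throttle itself, handle errors, and check robots.txt before fetching anything.

```python
# Minimal spider sketch: fetch a page, pull out its links, queue them,
# repeat. Uses only the Python standard library.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every link found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, max_pages=10):
    """Breadth-first crawl: visit a page, queue its links, repeat."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = urlopen(url).read().decode("utf-8", errors="replace")
        queue.extend(extract_links(html, url))
    return seen
```

The same loop, pointed at product pages and with `extract_links` swapped for a function that scrapes prices, is essentially what a price-comparison spider does.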
They're pretty damn cool, and they keep getting more and more complex. If you look through your log files you'll see that you're often visited by spiders. You can usually tell because they go from page to page in a mechanical manner. Some identify themselves, such as "Googlebot"; some don't.
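Here's one rough way to spot the self-identifying ones in an access log. This assumes the common "combined" log format, where the User-Agent is the last quoted field on each line; the bot tokens and the sample line are illustrative, not a complete list.

```python
# Sketch: flag likely spider visits in a web server access log by
# checking the User-Agent string (the last quoted field in the
# "combined" log format). Token list is illustrative only.
import re

BOT_TOKENS = ("googlebot", "bingbot", "slurp", "spider", "crawler", "bot")


def looks_like_bot(log_line):
    """Return True if the User-Agent field mentions a known bot token."""
    quoted = re.findall(r'"([^"]*)"', log_line)
    if not quoted:
        return False
    user_agent = quoted[-1].lower()
    return any(token in user_agent for token in BOT_TOKENS)
```

Spiders that don't announce themselves are harder to spot; there you fall back on the mechanical access pattern, such as one request per page at a steady pace, and no images or stylesheets fetched.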
You can put a file called robots.txt in your web root directory, and spiders are supposed to follow the rules you set in it. For example, if anasci didn't want their pages or certain forums cataloged and stored all over Google, they could write rules that tell the bots that this shit is private. Most reputable companies, such as Google, will follow the rules, but there is no law that says they must.
A list of common spiders and their IP numbers can be found here:
http://www.searchengineworld.com/spiders/spider_ips.htm