A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (as Wikipedia says).
Googlebot, YahooBot, MSNbot, and similar spiders, bots, and crawlers are the programs that harvest information for search engines.
For anyone tracking statistics on their website, Googlebot, MSNbot, and YahooBot can be welcome guests. These three search engine bots gather (harvest) information about your pages for their respective search engines. Seeing these spiders more often is also desirable, because it means that your site is being indexed more often and is more likely to show up quickly in the search engine results pages.
While having Googlebot index your site more quickly is almost always a good thing, there are times when you don't want certain pages or images indexed. Most "reputable" spiders will obey the directives given in the robots.txt file. This file is a document that tells spiders what they may and may not index. You can also explicitly instruct a robot not to follow any of the links on a page with the following meta tag: <META NAME="Googlebot" CONTENT="nofollow">.
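To make the robots.txt idea concrete, here is a minimal sketch of how a well-behaved crawler might check the file before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt rules, the example.com URLs, and the bot name "SomeOtherBot" are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
# Googlebot is kept out of /private/; every other bot is kept out of /images/.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /images/
"""

# Parse the rules from a string instead of downloading them,
# so the example runs without network access.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before every fetch.
checks = [
    ("Googlebot", "https://example.com/private/report.html"),
    ("Googlebot", "https://example.com/index.html"),
    ("SomeOtherBot", "https://example.com/images/logo.png"),
]
for agent, url in checks:
    allowed = parser.can_fetch(agent, url)
    print(f"{agent} -> {url}: {'allowed' if allowed else 'disallowed'}")
```

Note that nothing enforces these rules on the server side. The file only works because reputable bots choose to run a check like this one.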
These are GOOD BOTS.
BAD BOTS, generally, don’t play by the rules. They show a distinctly malicious pattern and are mostly unregulated. Imagine thousands of page visits originating from a single IP address within a very short span of time. This activity stresses your Web servers and chokes the available bandwidth. This directly impacts genuine users who are trying to access a product or service on your website.
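The "thousands of visits from a single IP address in a short span of time" pattern can be spotted with a simple sliding-window counter. The sketch below is one possible approach, not a production defense. The class name BurstDetector, the thresholds, and the sample IP address are all hypothetical, and the code assumes that timestamps (in seconds) arrive in non-decreasing order for each IP.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags an IP that sends more than max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Recent request timestamps, kept separately for each IP address.
        self._hits = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one request; return True if this IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Forget requests that have fallen out of the time window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Illustrative thresholds: more than 3 requests in 1 second looks suspicious.
detector = BurstDetector(max_requests=3, window_seconds=1.0)
flags = [detector.record("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)  # only the fourth request pushes the IP over the limit
```

A real deployment would pick thresholds from its own traffic data, and would have to account for many genuine users sharing one IP address behind a proxy or NAT.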
Bad bots are programmed to perform a variety of malicious jobs.
They can be sent to steal content from your website. The content can be unique to your website or business. Examples include product reviews, breaking news, dynamic pricing information for listed products, product catalogs, user-generated content on community forums, and so on.
Bots can scrape the content (not only email addresses) and publish it elsewhere. This can affect your website’s search engine rankings. There have been instances of stolen content outranking the originals on Google search pages. This directly impacts the bottom line of websites that have invested millions of dollars to create original content.
Stealing content is not the only malicious activity these bots are capable of. They can spam community forums with intrusive ads or messages. They can create millions of fake leads on real-estate and classifieds portals. They can create phantom carts and cart abandonment on eCommerce portals. They can distort marketing analytics by skewing the website's traffic figures, which undermines the ROI of marketing programs.