Bots and botnets are commonly associated with cybercriminals stealing data, identities, credit card numbers and worse. But bots can also serve good purposes. Separating good bots from bad ones can make a big difference in how you protect your company’s website and ensure that your site gets the Internet traffic it deserves.
Most good bots are essentially crawlers sent out from the world’s biggest websites to index content for their search engines and social media platforms. You WANT those bots to visit you. They bring you more business! Shutting them out as part of a strategy to block bad bots is a losing move.
Here, ordered from most to least likely to visit any website, are the 10 most important good bots that you should know about now. Make sure your security strategy welcomes these bots (or at least know why you chose to block them).
| Bot Name | % of Sites Crawled | Bot Type |
|---|---|---|
| Baidu Spider | 89% | Search Bot |
| MSN Bot/BingBot | 82% | Search Bot |
| Yandex Bot | 73% | Search Bot |
| Soso Spider | 61% | Search Bot |
| Sogou Spider | 31% | Search Bot |
| Google Plus Share | 24% | Crawler |
| Facebook External Hit | 24% | Crawler |
| Google Feedfetcher | 22% | Feed Fetcher |
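
One simple way to keep these bots on the welcome list is to recognize them by the User-Agent header they send. The sketch below is a minimal illustration, not a complete security control: the substrings in `GOOD_BOT_TOKENS` and the function name `classify_user_agent` are assumptions made for this example, and each token should be checked against the bot operator's current documentation. Google Plus Share is left out because its token is not given here. Keep in mind that a User-Agent string can be forged, so a match is only a first hint that a visitor is a good bot.

```python
from typing import Optional

# Assumed User-Agent substrings (lowercase) mapped to the bot names used in
# the table above. These tokens are illustrative; verify them before use.
GOOD_BOT_TOKENS = {
    "googlebot": "Googlebot",
    "baiduspider": "Baidu Spider",
    "bingbot": "MSN Bot/BingBot",
    "msnbot": "MSN Bot/BingBot",
    "yandexbot": "Yandex Bot",
    "sosospider": "Soso Spider",
    "exabot": "Exabot",
    "sogou": "Sogou Spider",
    "facebookexternalhit": "Facebook External Hit",
    "feedfetcher-google": "Google Feedfetcher",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return the good-bot name whose token appears in the header, else None."""
    ua = user_agent.lower()
    for token, name in GOOD_BOT_TOKENS.items():
        if token in ua:
            return name
    return None
```

A firewall or application rule could call `classify_user_agent` on each request and exempt matches from aggressive blocking, ideally after a second check such as a reverse DNS lookup of the visitor's IP address.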
Learn more about the Top 10 Good Bots
1. Googlebot – Googlebot is Google’s web crawling bot (sometimes also called a “spider”). Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. Googlebot’s crawl process begins with a list of webpage URLs, generated from previous crawl processes and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these websites it detects links (SRC and HREF) on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
2. Baiduspider – Baiduspider is the crawler of the Chinese search engine Baidu. Baidu (Chinese: 百度; pinyin: Bǎidù) is the leading Chinese search engine for websites, audio files, and images.
3. MSN Bot/Bingbot – MSN Bot was retired in October 2010 and replaced by Bingbot, the web-crawling robot that Microsoft deploys to supply its Bing search engine. It collects documents from the web to build Bing’s searchable index.
4. Yandex Bot – Yandex Bot is the crawler for Yandex’s search engine. Yandex is a Russian Internet company that operates the largest search engine in Russia, with about 60% market share in that country. Yandex ranked as the fifth largest search engine worldwide, with more than 150 million searches per day as of April 2012 and more than 25.5 million visitors.
5. Soso Spider – Soso.com is a Chinese search engine owned by Tencent Holdings Limited, which is well known for its other creation, QQ. As of 13 May 2012, Soso.com was ranked as the 36th most visited website in the world and the 13th most visited website in China, according to Alexa Internet. On average, Soso.com gets 21,064,490 page views every day.
6. Exabot – Exabot is the crawler for ExaLead, a search company based in France. Founded in 2000 by search engine pioneers and now part of Dassault Systèmes, ExaLead provides search and unified information access software.
7. Sogou Spider – Sogou.com is a Chinese search engine launched on August 4, 2004. As of April 2010, it ranked 121st in Alexa’s Internet rankings. Sogou provides an index of up to 10 billion web pages.
8. Google Plus Share – Google Plus lets you share recommendations with friends, contacts and the rest of the web – on Google search. The +1 button helps initialize Google’s instant share capabilities, and it also provides a way to give something your public stamp of approval.
9. Facebook External Hit – Facebook allows its users to send links to interesting web content to other Facebook users. Part of how this works on the Facebook system involves the temporary display of certain images or details related to the web content, such as the title of the webpage or the embed tag of a video. The Facebook system retrieves this information only after a user provides a link.
10. Google Feedfetcher – Used by Google to grab RSS or Atom feeds when users choose to add them to their Google homepage or Google Reader. Feedfetcher collects and periodically refreshes these user-initiated feeds, but does not index them in Blog Search or Google’s other search services (feeds appear in the search results only if they’ve been crawled by Googlebot).
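
The crawl process described for Googlebot in item 1 (start from a list of URLs, detect SRC and HREF links on each page, add new ones to the list, and note dead links) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Google's actual implementation: the names `LinkExtractor` and `crawl` are hypothetical, and `fetch` is a caller-supplied function assumed to return a page's HTML as a string, or `None` for a dead link.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from href and src attributes on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl; returns (crawled_urls, dead_urls)."""
    frontier = deque(seed_urls)   # list of pages still to crawl
    seen = set(seed_urls)         # avoids queuing the same URL twice
    crawled, dead = [], []
    while frontier and len(crawled) + len(dead) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            dead.append(url)      # note the dead link and move on
            continue
        crawled.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return crawled, dead
```

A real crawler would also decide how often to revisit each site and would honor each site's robots.txt rules; both are omitted here to keep the sketch short.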