Webmasters decide which web crawlers may crawl their website's pages. This policy is written in robots.txt, which allows or disallows each bot by user agent, either separately or in groups. Most of these bots also matter from a Search Engine Optimization (SEO) perspective.
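As a rough illustration of "separately or combined", the sketch below parses a small robots.txt with Python's standard `urllib.robotparser`. Note that robots.txt expresses a denial with the `Disallow` directive. The file content, the `example.com` URL, and the crawler name `ExampleBadBot` are assumptions made up for this example, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: two bots share one group, a third gets its own group.
# ExampleBadBot is an invented name used only for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
Allow: /

User-agent: ExampleBadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask the parser whether each bot may fetch a sample page.
for bot in ("Googlebot", "Bingbot", "ExampleBadBot"):
    allowed = parser.can_fetch(bot, "https://example.com/some-page")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

Listing several `User-agent` lines before one rule set applies that rule set to all of them, while a separate group lets one bot be handled on its own.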
Google bots
If you need web traffic to your website, Google is the largest search engine source, so no webmaster can afford to block Google's bots. Here is the list:
Googlebot
The primary crawler that the Google search engine uses to crawl web pages is Googlebot. Make sure Googlebot is not disallowed; ideally, allow it explicitly.
Googlebot User Agents
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mediapartners-Google
Google AdSense uses Mediapartners-Google as its user agent to crawl your website when deciding whether to approve it for monetization. If you have applied for AdSense and your website is being reviewed, never disallow Mediapartners-Google; keep it allowed in the robots.txt file.
Mediapartners-Google User Agents
Mediapartners-Google
googleweblight
googleweblight is another Google crawling user agent seen in website logs. It generates page snapshots for Google, so allow it, or at least do not disallow it, in the robots.txt file.
googleweblight User Agents
Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19
Microsoft Bing bots
Nowadays Bing is the second largest source of search engine traffic. According to the Bing Webmaster documentation, Bing mainly uses three bots to crawl websites, and all of them have mobile and desktop variants.
Bingbot
Bingbot is the Bing crawler seen most often in logs. Yahoo Search is also powered by the Bing search engine, so this is a very important user agent.
Bingbot User Agents
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 Edg/W.X.Y.Z
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 Edg/W.X.Y.Z (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
BingPreview
BingPreview generates page snapshots for Bing. Some crawlers visit a website the way a regular visitor would, so the page's Ajax requests and JavaScript are executed and crawled as well. BingPreview is one of these crawlers. You may think of BingPreview as the Bing counterpart of googleweblight.
BingPreview User Agents
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko BingPreview/1.0b
AdIdxBot
AdIdxBot may not be relevant to all webmasters, but it is valuable to those it concerns. This crawler is used by Bing Ads.
AdIdxBot User Agents
Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
facebookexternalhit
Social media and social networks are major sources of incoming traffic, and Facebook shares are one of the largest social media sources for webmasters. Facebook uses facebookexternalhit to crawl the webpage link used in a status update or share. So never disallow it, and always allow it in robots.txt.
facebookexternalhit User Agents
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
YandexBot
Yandex is also a search engine and may send you some traffic, though little compared to the giant search engines.
YandexBot User Agents
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Pinterestbot
Pinterest is a great social sharing platform and source of incoming web traffic, particularly for image-rich websites. Always allow Pinterestbot if your webpage link is attached to a pin on Pinterest.
Pinterestbot User Agents
Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)
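To spot these crawlers in server logs, the user-agent strings listed in this article can be matched against a small table of tokens. The following is a minimal sketch, not an official detection method: the function name `identify_crawler` and the token table are assumptions made for illustration. One detail worth noting is that the AdIdxBot strings contain the URL `bingbot.htm`, so the sketch matches on `bingbot/` (the token followed by its version) rather than on the bare word `bingbot`.

```python
from typing import Optional

# Lower-case tokens derived from the user-agent strings in this article.
# The trailing "/" ties the match to the "name/version" part of the string,
# which keeps "bingbot.htm" inside AdIdxBot strings from matching Bingbot.
CRAWLER_TOKENS = {
    "Googlebot": "googlebot/",
    "Mediapartners-Google": "mediapartners-google",
    "googleweblight": "googleweblight",
    "Bingbot": "bingbot/",
    "BingPreview": "bingpreview/",
    "AdIdxBot": "adidxbot/",
    "facebookexternalhit": "facebookexternalhit/",
    "YandexBot": "yandexbot/",
    "Pinterestbot": "pinterestbot/",
}


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler name for a user-agent string, or None if unknown."""
    lowered = user_agent.lower()
    for name, token in CRAWLER_TOKENS.items():
        if token in lowered:
            return name
    return None


samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)",
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
]
for ua in samples:
    print(identify_crawler(ua), "<-", ua)
```

A user-agent header can be set to anything by the client, so a match here only shows what a visitor claims to be.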
robots.txt
Based on the discussion above, the robots.txt should look like this:
User-agent: Googlebot
Allow: /
User-agent: Mediapartners-Google
Allow: /
User-agent: googleweblight
Allow: /
User-agent: Bingbot
Allow: /
User-agent: BingPreview
Allow: /
User-agent: AdIdxBot
Allow: /
User-agent: facebookexternalhit
Allow: /
User-agent: YandexBot
Allow: /
User-agent: Pinterestbot
Allow: /
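One way to sanity-check a file like the one above is to parse it with Python's standard `urllib.robotparser` and ask whether each bot may fetch a page. This is only a sketch: the file is rebuilt from a list of bot names, and the `example.com` URL and the crawler name `SomeOtherBot` are hypothetical values used for illustration.

```python
from urllib.robotparser import RobotFileParser

BOTS = [
    "Googlebot", "Mediapartners-Google", "googleweblight",
    "Bingbot", "BingPreview", "AdIdxBot",
    "facebookexternalhit", "YandexBot", "Pinterestbot",
]

# Rebuild the robots.txt shown above: one "Allow: /" group per bot.
robots_txt = "\n".join(f"User-agent: {bot}\nAllow: /" for bot in BOTS)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check every listed bot against a sample URL (hypothetical site).
results = {bot: parser.can_fetch(bot, "https://example.com/") for bot in BOTS}
for bot, allowed in results.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")

# SomeOtherBot is an invented name; it is not listed in the file at all.
unlisted_allowed = parser.can_fetch("SomeOtherBot", "https://example.com/")
print("SomeOtherBot:", "allowed" if unlisted_allowed else "blocked")
```

Because the file contains no Disallow rules, this parser also treats crawlers that are not listed as allowed. The explicit Allow groups mainly document intent and guard against a broader Disallow rule being added later.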