9 User Agent bots a webmaster should never Disallow in robots.txt

Webmasters decide which web crawlers may crawl their website's pages, and this policy is written in robots.txt. Each bot is allowed or blocked by its user agent, either in a separate group or combined with other bots in one group; a bot is blocked with the Disallow directive and permitted with Allow. Most of the bots listed here also matter from a Search Engine Optimization (SEO) perspective.
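The separate-versus-combined choice can be sketched in Python with the standard library's urllib.robotparser. This is only an illustration: the sample rules, the ExampleBadBot name, the example.com URL, and the is_allowed helper are assumptions, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: two bots share one group, a third has its own group.
SAMPLE_RULES = """\
User-agent: Googlebot
User-agent: Bingbot
Allow: /

User-agent: ExampleBadBot
Disallow: /
"""


def is_allowed(rules: str, agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "Bingbot", "ExampleBadBot"):
        print(bot, is_allowed(SAMPLE_RULES, bot))
```

Listing several User-agent lines before one set of rules applies the same rules to every bot named in that group.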

Google bots

If you need web traffic to your website, Google is the largest search engine source, so no webmaster can afford to block Google's bots. Here is the list:

Googlebot

Googlebot is the primary crawler that the Google search engine uses to crawl a website's pages. Make sure Googlebot is not disallowed; explicitly allowing it is recommended.

Googlebot User Agents

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
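A common way for Googlebot to end up blocked is a catch-all "User-agent: *" group with "Disallow: /". Under the Robots Exclusion Protocol, a crawler that finds a group naming it follows that group instead of the catch-all, so an explicit Googlebot group keeps it allowed. The following is a minimal sketch of that behavior with Python's urllib.robotparser; the rules, the SomeOtherBot name, the example.com host, and the can_crawl helper are hypothetical. The sketch passes the short product token rather than the full user agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything by default, then allow Googlebot.
RULES_WITH_CATCH_ALL = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""


def can_crawl(rules: str, token: str, path: str = "/") -> bool:
    """Return True if the bot identified by `token` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(token, "https://example.com" + path)


if __name__ == "__main__":
    print("Googlebot:", can_crawl(RULES_WITH_CATCH_ALL, "Googlebot"))
    print("SomeOtherBot:", can_crawl(RULES_WITH_CATCH_ALL, "SomeOtherBot"))
```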

Mediapartners-Google

Google AdSense uses Mediapartners-Google as its user agent to crawl your website when deciding whether to approve it for monetization. If you have applied for AdSense and your website is being reviewed, never disallow Mediapartners-Google; keep it under Allow in the robots.txt file.

Mediapartners-Google User Agents

Mediapartners-Google

googleweblight

googleweblight is another Google user agent seen in website logs. It generates page snapshots for Google, so allow it, or at least do not disallow it, in the robots.txt file.

googleweblight User Agents

Mozilla/5.0 (Linux; Android 4.2.1; en-us; Nexus 5 Build/JOP40D) AppleWebKit/535.19 (KHTML, like Gecko; googleweblight) Chrome/38.0.1025.166 Mobile Safari/535.19

Microsoft Bing bots

Nowadays Bing is the second-largest source of search engine traffic. According to the Bing Webmaster documentation, Bing mainly uses three bots to crawl websites, and all of them have mobile and desktop variants.

Bingbot

Bingbot is the Bing crawler seen most often in server logs. Yahoo Search is also powered by the Bing search engine, which makes this user agent even more important.

Bingbot User Agents

Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/W.X.Y.Z Safari/537.36 Edg/W.X.Y.Z
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 Edg/W.X.Y.Z (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

BingPreview

BingPreview generates page snapshots for Bing. Some crawlers visit a website the way a regular visitor would, i.e. the page's Ajax requests and JavaScript are also run and crawled. BingPreview is one of these crawlers. You can think of BingPreview as Bing's counterpart to googleweblight.

BingPreview User Agents

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko BingPreview/1.0b

AdIdxBot

AdIdxBot may not be relevant to all webmasters, but it is valuable to those who advertise. This crawler is used by Bing Ads.

AdIdxBot User Agents

Mozilla/5.0 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11.0; IEMobile/11.0; NOKIA; Lumia 530) like Gecko (compatible; adidxbot/2.0; +http://www.bing.com/bingbot.htm)

facebookexternalhit

Social media and social networks are major sources of incoming traffic, and Facebook shares are one of the largest social media sources for webmasters' websites. Facebook uses facebookexternalhit to crawl a webpage link used in a status update or a share. So never disallow it, and always allow it, in robots.txt.

facebookexternalhit User Agents

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)

YandexBot

Yandex is also a search engine and may bring you traffic, though little compared to the giant search engines.

YandexBot User Agents

Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Pinterestbot

Pinterest is a great social sharing platform and a source of incoming web traffic, particularly for image-rich websites. Always allow Pinterestbot if a link to your webpage is attached to a pin on Pinterest.

Pinterestbot User Agents

Mozilla/5.0 (compatible; Pinterestbot/1.0; +http://www.pinterest.com/bot.html)
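To see which of these crawlers actually appear in your logs, the user agent strings listed above can be matched against their product tokens. The sketch below is one possible approach, not an official method: the identify_crawler helper and the token ordering are assumptions, and a match on the header alone does not prove that a request came from the real crawler, since the header can be set by anyone.

```python
from typing import Optional

# Product tokens taken from the user agent strings listed in this article.
# Order matters: the adidxbot strings also contain "bingbot" (in their URL),
# so the more specific token is checked first.
CRAWLER_TOKENS = (
    "Mediapartners-Google",
    "googleweblight",
    "Googlebot",
    "BingPreview",
    "adidxbot",
    "bingbot",
    "facebookexternalhit",
    "YandexBot",
    "Pinterestbot",
)


def identify_crawler(user_agent: str) -> Optional[str]:
    """Return the first known crawler token found in the header, else None."""
    lowered = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


if __name__ == "__main__":
    sample = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(identify_crawler(sample))
```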

robots.txt

Based on the discussion above, the robots.txt should look like this:

User-agent: Googlebot
Allow: /

User-agent: Mediapartners-Google
Allow: /

User-agent: googleweblight
Allow: /

User-agent: Bingbot
Allow: /

User-agent: BingPreview
Allow: /

User-agent: AdIdxBot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: YandexBot
Allow: /

User-agent: Pinterestbot
Allow: /
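A file like the one above can also be generated and checked programmatically. The following Python sketch builds the same groups from a list of tokens and then uses the standard library's urllib.robotparser to report any bot that would be blocked. The build_robots_txt and blocked_bots helpers and the example.com URL are hypothetical names chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The nine user agent tokens from the robots.txt listing above.
ALLOWED_BOTS = [
    "Googlebot", "Mediapartners-Google", "googleweblight",
    "Bingbot", "BingPreview", "AdIdxBot",
    "facebookexternalhit", "YandexBot", "Pinterestbot",
]


def build_robots_txt(bots):
    """Build one 'User-agent / Allow: /' group per bot, separated by blank lines."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in bots]
    return "\n\n".join(groups) + "\n"


def blocked_bots(robots_txt, bots, url="https://example.com/"):
    """Return the bots that the given robots.txt text does not let fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


if __name__ == "__main__":
    text = build_robots_txt(ALLOWED_BOTS)
    print(text)
    print("Blocked:", blocked_bots(text, ALLOWED_BOTS))
```

Running blocked_bots against your live robots.txt text before deploying a change is a cheap way to catch an accidental Disallow for one of these crawlers.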
