Bad Crawler Bots: A List of Useless Crawlers That Burden Your Website Server

A web crawler is usually thought of as an internet bot that indexes web pages for search engines and sends visitors to your website, but this is not always the case. Some web crawlers crawl your site to steal its secrets for their customers, such as top-performing keywords, topics and trends. Your website gets no visitors from them, only extra load and resource consumption on the site and the server. Here are some of the crawlers that gave me the most trouble:

AhrefsBot

AhrefsBot maintains data on trillions of links that Ahrefs customers use to analyze their competitors, but this crawler will never send you visitors the way Bing, DuckDuckGo, Google or Yahoo do. Consider it a burden on your website and your hosting server.

BLEXBot

WebMeUp backlink tool - the most accurate link checker

BLEXBot is the crawling bot of WebMeUp, a backlink tool that maintains an index of backlinks for link analysis. In WebMeUp's own words, the tool lets its users:

"scrutinize links to your site, clean-up your backlink profile, perform a link audit or steal a competitor's backlink secrets"

The phrase "steal a competitor's backlink secrets" is alarming. This crawler clearly serves customers who may be interested in competing with you and capturing your business.

SEMrushBot

SEMrushBot crawls data for SEMrush's webmaster customers to help them analyze competitor websites, for example their keywords. From a site owner's point of view it is a useless crawler: it sends no visitors but consumes website resources to dig up data for its customers.
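
To check how much of your traffic these crawlers actually account for, one option is to count their requests in the web server's access log. The sketch below assumes Combined Log Format lines (the default on many Apache and Nginx setups), where the user agent is the last double-quoted field. The BAD_BOTS list and the sample log lines are hypothetical, and real user-agent strings vary by bot version.

```python
import re
from collections import Counter

# User-agent substrings for the crawlers discussed above (matched case-insensitively).
BAD_BOTS = ["AhrefsBot", "BLEXBot", "SEMrushBot"]

# In Combined Log Format the user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_bad_bot_hits(log_lines):
    """Return a Counter of request counts per bad bot found in the log lines."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        user_agent = match.group(1).lower()
        for bot in BAD_BOTS:
            if bot.lower() in user_agent:
                hits[bot] += 1
                break  # count each request at most once
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real server log.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
        '198.51.100.7 - - [10/Oct/2023:13:55:40 +0000] "GET /a HTTP/1.1" 200 256 "-" '
        '"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
        '192.0.2.9 - - [10/Oct/2023:13:56:01 +0000] "GET /b HTTP/1.1" 200 128 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    ]
    print(count_bad_bot_hits(sample))
```

In practice you would pass an open log file instead of the sample list, since iterating over a file yields one line at a time.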

How to Disallow Bad Crawler Bots in robots.txt

Following is a sample robots.txt file that disallows these bad crawlers from crawling any page of your site.

User-agent: Adsbot
Disallow:/

User-agent: AhrefsBot
Disallow:/

User-agent: AspiegelBot
Disallow:/

User-agent: DotBot
Disallow:/

User-agent: MauiBot
Disallow:/

User-agent: MJ12Bot
Disallow:/

User-agent: PetalBot
Disallow:/

User-agent: SEMrushBot
Disallow:/

User-agent: BLEXBot
Disallow:/
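
One way to sanity-check a robots.txt file before uploading it is Python's standard-library urllib.robotparser module. The sketch below parses a shortened copy of the sample above and asks whether a few user agents may fetch a page; example.com and the user-agent strings are placeholders. Note that this module applies its own matching rule (the part of the user agent before "/", lower-cased, must contain the User-agent value), so it only approximates how each real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the sample robots.txt: every path is disallowed for these bots.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow:/

User-agent: SEMrushBot
Disallow:/

User-agent: BLEXBot
Disallow:/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

for agent in ["AhrefsBot", "SemrushBot/7~bl", "Googlebot"]:
    allowed = parser.can_fetch(agent, "https://example.com/some-page")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")

# Expected output:
# AhrefsBot: blocked
# SemrushBot/7~bl: blocked
# Googlebot: allowed
```

Googlebot is allowed because the file has no group for it and no "User-agent: *" group, so no rule applies.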

The crawlers above are mostly of no benefit to website owners and consume website resources. Because robots.txt is only a request that well-behaved crawlers choose to honor, some of these bots may ignore it. In that case you can block their IP addresses with .htaccess on Apache or web.config on .NET websites. Bots that ignore robots.txt may be involved in suspicious activity, including stealing secrets for phishing.
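
If your site is served by a Python application, a similar block can be applied in code. The sketch below is a WSGI middleware, an application-level alternative rather than an .htaccess or web.config rule, that answers 403 Forbidden when the client IP falls inside a blocked network or the user agent contains a blocked bot name. The network ranges are RFC 5737 documentation placeholders, not real crawler IP ranges, and block_bad_bots, hello_app and call are hypothetical names used only for illustration.

```python
import ipaddress

# Placeholder ranges from RFC 5737 documentation space, NOT real crawler IPs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]
# Lower-cased user-agent substrings for crawlers that ignore robots.txt (assumed list).
BLOCKED_AGENTS = ["ahrefsbot", "blexbot", "semrushbot"]


def block_bad_bots(app):
    """Wrap a WSGI app so requests from blocked IPs or user agents get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        try:
            address = ipaddress.ip_address(environ.get("REMOTE_ADDR", ""))
            ip_blocked = any(address in net for net in BLOCKED_NETWORKS)
        except ValueError:
            ip_blocked = False  # missing or malformed client address
        if ip_blocked or any(bot in user_agent for bot in BLOCKED_AGENTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real website application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = block_bad_bots(hello_app)


def call(remote_addr, user_agent):
    """Send a fake request through the wrapped app; return (status, body)."""
    environ = {"REMOTE_ADDR": remote_addr, "HTTP_USER_AGENT": user_agent}
    statuses = []
    body = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(body)


if __name__ == "__main__":
    print(call("192.0.2.44", "Mozilla/5.0"))
    print(call("203.0.113.10", "Mozilla/5.0 (compatible; AhrefsBot/7.0)"))
    print(call("203.0.113.10", "Mozilla/5.0 (Windows NT 10.0)"))
```

Behind a reverse proxy or CDN, REMOTE_ADDR is the proxy's address rather than the crawler's, so the IP check would need the forwarded client address instead; the user-agent check still works but is easy for a misbehaving bot to fake.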
