Web crawlers are generally thought of as internet bots that index web pages for search engines and bring visitors to your website, but this is not always the case. Some web crawlers crawl your site to harvest its secrets for their customers, such as top-performing keywords, topics and trends. Your website receives no visitors from them, only extra load and resource consumption on the website and server. Here are some that I have had bad experiences with:
AhrefsBot
AhrefsBot maintains an index of trillions of links so that its customers can analyze competitors, but this crawler will never send you visitors the way Bing, DuckDuckGo, Google or Yahoo do. Consider it a load or burden on your website and the hosting server.
BLEXBot
BLEXBot is the crawling bot of WebMeUp, a backlink tool that maintains an index of backlinks in order to scrutinize links. As per WebMeUp's (BLEXBot's) own claim, the tool lets you:
"scrutinize links to your site, clean-up your backlink profile, perform a link audit or steal a competitor's backlink secrets"
Here, "steal a competitor's backlink secrets" is alarming. This crawler definitely serves customers who may be interested in competing with you and capturing your business.
SEMrushBot
SEMrushBot crawls data for its webmaster customers to help them analyze competitor websites, for example their keywords. For site owners SEMrushBot is a useless crawler: it sends no visitors but consumes website resources to dig up data for its customers.
How to Disallow Bad Crawling bots in robots.txt
The following is a sample robots.txt file that disallows bad crawlers and denies their useless activities.
User-agent: Adsbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: AspiegelBot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: MauiBot
Disallow: /

User-agent: MJ12Bot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: SEMrushBot
Disallow: /

User-agent: BLEXBot
Disallow: /
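Before uploading rules like these, it can help to check that they behave as intended. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed, the example.com URL and the shortened three-bot rule set are assumptions made for illustration, not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the sample rules (three bots only, for brevity).
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: SEMrushBot
Disallow: /

User-agent: BLEXBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if the rules permit `user_agent` to fetch `url`.

    `url` defaults to a placeholder address (an assumption for this sketch).
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("AhrefsBot", "SEMrushBot", "BLEXBot", "Googlebot"):
        print(bot, "allowed" if is_allowed(bot) else "blocked")
```

Note that this standard-library parser compares rules against the part of the user agent string before the first "/", so the check is best done with the bare bot name rather than a full browser-style user agent string. It only tells you what a well-behaved crawler should do; it does not prove that a given bot actually obeys the file.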
The crawlers above, which are mostly not fruitful for website owners and consume website resources, may ignore robots.txt. In that case you can block their IPs with .htaccess on Apache or with web.config for .NET websites. Bots that ignore robots.txt may be involved in suspicious activity, including stealing secrets for phishing.
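To decide which IPs to block, you first need to know which addresses the unwanted bots are coming from. The sketch below scans access log lines and collects the IPs of listed bots that requested anything other than /robots.txt itself. It assumes the Apache "combined" log format; the function name offending_ips, the bot list and the sample log lines (which use documentation-only IP ranges) are all hypothetical, so adjust them to your own server.

```python
import re
from collections import defaultdict

# Bot names to look for (assumption: taken from the robots.txt sample).
BLOCKED_BOTS = ["AhrefsBot", "BLEXBot", "SEMrushBot"]

# Assumes Apache "combined" log format:
# IP ident user [time] "METHOD path protocol" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def offending_ips(log_lines):
    """Map each listed bot to the sorted IPs it used for non-robots.txt requests."""
    hits = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed format
        if match.group("path") == "/robots.txt":
            continue  # fetching robots.txt itself is expected behaviour
        agent = match.group("agent").lower()
        for bot in BLOCKED_BOTS:
            if bot.lower() in agent:
                hits[bot].add(match.group("ip"))
    return {bot: sorted(ips) for bot, ips in hits.items()}


# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 120 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '203.0.113.5 - - [10/Oct/2023:13:55:40 +0000] "GET /blog/post HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
    '198.51.100.7 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

if __name__ == "__main__":
    print(offending_ips(SAMPLE_LOG))
```

User agent strings can be faked, so treat the result as a list of candidates to review rather than a definitive block list.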