Blocking bad bots and search engines using .htaccess

Date posted: 2022-10-26
Last updated: 2026-05-10

Take control of your server resources by learning how to block aggressive bots and unwanted search engines using .htaccess and mod_rewrite. This guide provides copy-paste ready configurations to prevent bandwidth theft, stop scrapers, and ensure that only relevant search engines crawl your website.



Protecting your website’s resources and bandwidth is a crucial task for any sysadmin. While most search engines are beneficial, aggressive crawlers and “bad bots” can quickly drain your server’s performance and skew your analytics.

Using the .htaccess file on Apache-based servers (like those used for WordPress) is one of the most effective ways to stop these visitors before they even reach your CMS.

Why block bots?

  • Bandwidth Savings: Prevent scrapers from downloading your entire site.
  • Server Performance: Reduce CPU and RAM usage by denying access to non-essential crawlers.
  • SEO Management: Prevent “junk” search engines from indexing development areas or duplicate content.
  • Security: Many bots are actually vulnerability scanners looking for exploits.

Note for IIS Users: If you are running on Windows Server with IIS, you should use the web.config file instead. Check out our guide on blocking bad bots in IIS using web.config.

Using mod_rewrite to block User-Agents

The most effective method is filtering by the User-Agent string. Add the following code to your .htaccess file. Ensure RewriteEngine On is present at the top of the block.

RewriteEngine On
# Block specific aggressive bots
RewriteCond %{HTTP_USER_AGENT} ^.*(Baiduspider|YandexBot|AhrefsBot|MJ12bot).*$ [NC]
RewriteRule .* - [F,L]

Looking to block just the BaiduSpider User-Agent? The same rule works with a single name in the condition.

Breakdown of the flags:

  • [NC]: Stands for No Case, making the check case-insensitive.
  • [F]: Returns a 403 Forbidden response. This is more efficient than a redirect because Apache answers with a short error response instead of serving any content or sending the client elsewhere.
  • [L]: This is the Last rule, meaning Apache stops processing further rules if this one matches.
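
If you would rather tell crawlers that the content is permanently gone instead of forbidden, mod_rewrite also supports the [G] flag, which returns a 410 Gone response instead of a 403. A minimal variant of the block above, using the same example bot names:

RewriteEngine On
# Answer known bad bots with 410 Gone instead of 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} (Baiduspider|YandexBot|AhrefsBot|MJ12bot) [NC]
RewriteRule .* - [G,L]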

Blocking entire categories of bots

Sometimes you want to block all bots from a specific region or known scraper networks. You can expand the list by pipe-separating (|) the names:

RewriteCond %{HTTP_USER_AGENT} ^.*(Sogou|Exabot|DotBot|SemrushBot|Rogerbot).*$ [NC]
RewriteRule .* - [F,L]

Caution: Be careful not to block Googlebot or Bingbot unless you specifically want to remove your site from all major search results.
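
If you rely on broad patterns, you can add a negated condition as a safety net so that the major crawlers are never caught, even if one of them happens to match a name on your list. A minimal sketch, reusing the example names from above:

# Safety net: never block Googlebot or Bingbot, whatever the second condition matches
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|Bingbot) [NC]
RewriteCond %{HTTP_USER_AGENT} (Sogou|Exabot|DotBot|SemrushBot|Rogerbot) [NC]
RewriteRule .* - [F,L]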

Blocking bots from specific directories

If you only want to protect sensitive areas like /wp-admin/ or /backups/, you can add a request URI condition. Stacked RewriteCond lines must all match (a logical AND) before the rule fires, so this only blocks bot-like User-Agents inside those paths:

RewriteCond %{REQUEST_URI} ^/(wp-admin|backups)/ [NC]
RewriteCond %{HTTP_USER_AGENT} ^.*(Bot|Spider|Crawler).*$ [NC]
RewriteRule .* - [F,L]
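
For directories that should never be reachable over the web at all (such as /backups/), an alternative is a dedicated .htaccess file placed inside that directory. A minimal sketch, assuming the Apache 2.4 mod_authz_core syntax:

# /backups/.htaccess - deny every request to this directory (Apache 2.4+)
Require all denied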

Running on Windows Server? Check out how to block bad bots in web.config for IIS instead. While the native URL Rewrite Module is the recommended way to handle redirects and blocking on modern IIS servers, some legacy setups might still use Helicon Ape. This third-party module allows IIS to process .htaccess files directly by emulating Apache modules. However, for new installations, it is best to migrate these rules to native web.config format for better performance and long-term support.

Extras

There are many posts covering .htaccess and/or web.config usage on Apache and IIS web servers. Moving from Apache to IIS? Follow convert .htaccess to web.config to convert your rules. Still want to use .htaccess on Windows? Here are the options and why Helicon Ape is EOL.

Besides bots and search engines, you can also block IP addresses in .htaccess files. On Apache 2.4 and newer, access control is handled by the mod_authz_core module and its Require directives rather than the old Order/Allow/Deny syntax. See the Apache .htaccess security best practices.
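
A minimal sketch of what that looks like with mod_authz_core (the listed addresses are documentation placeholders, not real offenders):

# Apache 2.4+ (mod_authz_core): allow everyone except the listed addresses
<RequireAll>
    Require all granted
    Require not ip 203.0.113.42
    Require not ip 198.51.100.0/24
</RequireAll>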

Conclusion

Blocking bots via .htaccess is a proactive way to keep your server lean and secure. Regularly check your access logs to identify new aggressive user-agents that should be added to your blocklist. If you are noticing a high volume of hits from a single IP, you might also consider blocking IP addresses at the firewall level.

Please note this is an older post of mine, transferred from itfaq.nl, translated to English and brought up to date.

Summary

  • Blocking bots is essential for protecting your website’s resources, improving server performance, and managing SEO effectively.
  • Use the .htaccess file on Apache servers to keep non-essential crawlers out and reduce bandwidth theft.
  • Filter user-agents with specific rules in .htaccess, using flags like [F] for a 403 error and [L] to stop further processing.
  • You can also block entire categories of bots, but avoid blocking essential bots like Googlebot or Bingbot.
  • Regularly check access logs to identify aggressive user-agents and consider blocking IPs at the firewall level.

