The Baidu spider (Baiduspider user agent) can be a real pain to block, especially since it does not respect robots.txt as it should. This post shows you how to block the Baidu Spider bot in IIS, using the URL Rewrite Module, based on its User-Agent string.

A bot is often also called a spider.

Normally you would block a bot or spider using the following robots.txt:

User-agent: Baiduspider
Disallow: /

or

User-agent: *
Disallow: /

This doesn't work for blocking Baidu... :(

You can use the following IIS URL Rewrite rule to block the Baiduspider User-Agent on your website. The only access allowed is to robots.txt; all other requests are blocked with a 403 Forbidden response.

Expand the pattern= attribute with multiple user agent strings, separated by a pipe (|), to block more bots. For example pattern="Baiduspider|Bing" or pattern="Googlebot|Bing". A web.config sketch with an expanded pattern follows the rule below.

Hint: search for IIS URL Rewrite related posts on Saotn.org!

<!-- Block Baidu and BaiduSpider bot -->
<rule name="block_BaiduSpider" stopProcessing="true">
  <match url="(.*)" />
  <conditions trackAllCaptures="true">
    <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
    <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
  </conditions>
  <action type="CustomResponse"
          statusCode="403"
          statusReason="Forbidden: Access is denied."
          statusDescription="Access is denied!" />
</rule>
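
For reference, here is a minimal web.config sketch showing where such a rule lives (system.webServer > rewrite > rules), with the pattern expanded to match several bots. The rule name block_bad_bots and the extra bot names are placeholders; adjust them to the bots you actually want to block:

<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Example: block several bots with one rule (placeholder bot names) -->
        <rule name="block_bad_bots" stopProcessing="true">
          <match url="(.*)" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="Baiduspider|SomeBot|AnotherBot" ignoreCase="true" />
            <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
          </conditions>
          <action type="CustomResponse" statusCode="403"
                  statusReason="Forbidden: Access is denied."
                  statusDescription="Access is denied!" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>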

Verifying the rewrite rule to block Baidu

Using Fiddler's Composer option to compose an HTTP request with a Baiduspider User-Agent, you can easily verify the rewrite rule, as shown in the next two images.

Verifying BaiduSpider is blocked with Fiddler: the request
Verifying BaiduSpider is blocked with Fiddler: the response
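
If you prefer the command line over Fiddler, a quick check with PowerShell works too. This is only a sketch: replace example.com with your own hostname, and note that the User-Agent shown is one commonly seen Baiduspider string.

# Pretend to be Baiduspider; the site root should be blocked
$ua = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
try {
    (Invoke-WebRequest -Uri "https://example.com/" -UserAgent $ua -UseBasicParsing).StatusCode
} catch {
    # Blocked requests throw; print the HTTP status code (expect 403)
    [int]$_.Exception.Response.StatusCode
}

# robots.txt must stay reachable for the same user agent (expect 200)
(Invoke-WebRequest -Uri "https://example.com/robots.txt" -UserAgent $ua -UseBasicParsing).StatusCode

The first request should come back with 403, while the robots.txt request should still return 200.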

That's it!

