The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not respect a robots.txt as it should. This post shows you how to block Baidu Spider bot, using IIS URL Rewrite Module based on its User-Agent string.
A bot is often also called a spider.
Normally you would block a bot or spider using the following robots.txt
:
User-agent: BaiduSpider
Disallow: /
Code language: HTTP (http)
or
User-agent: *
Disallow: /
Code language: HTTP (http)
This doesn’t work for blocking Baidu… :(
You can use the following IIS URL Rewrite rule to block the BaiduSpider User-Agent on your website. The only access allowed is to robots.txt
, all other requests are blocked with a 403 Access Denied
.
Expand the pattern=
with multiple user agent strings, divided by a pipe (|), to block more bots. For example pattern="Baiduspider|Bing"
or pattern="Googlebot|Bing"
.
Hint, search IIS URL Rewrite related posts on Saotn.org!
<!--
Block Baidu and BaiduSpider bot
-->
<rule name="block_BaiduSpider" stopProcessing="true">
<match url="(.*)" />
<conditions trackAllCaptures="true">
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
<add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
</conditions>
<action type="CustomResponse"
statusCode="403"
statusReason="Forbidden: Access is denied."
statusDescription="Access is denied!" />
</rule>
Code language: HTML, XML (xml)
Verifying the rewrite rule to block Baidu
Using Fiddler‘s Composer option, to compose an HTTP request, you can easily verify the rewrite rule, as shown in the next two images.


That’s it!
See alternative example ; http://lokmanakkaya.com.tr/2013/07/18/baiduspider-bot-nasil-engellenir-baiduspider-engelleme-baiduspider-bloke/