How to block BaiduSpider bot User-Agent

The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not honor robots.txt as it should. The following IIS URL Rewrite snippet blocks the Baidu spider based on its User-Agent string.

Normally you would block a bot or spider using the following robots.txt:

User-agent: BaiduSpider
Disallow: /

or

User-agent: *
Disallow: /

This doesn’t work for Baidu…

You can use the following IIS URL Rewrite rule to block the BaiduSpider User-Agent on your website. Only robots.txt remains accessible; all other requests are rejected with an HTTP 403 Forbidden response.

To block more bots, extend the pattern attribute with additional user-agent strings, separated by a pipe (|). For example, pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".
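As a small sketch of that change, the user-agent condition of the rule below could be written as follows to cover two bots. The pattern is a regular expression, so the pipe means "or"; as far as I know it is matched anywhere in the User-Agent string, which would make a token like Bing also match "bingbot". Treat the exact tokens as examples and adjust them to the bots you see in your logs.

```xml
<!-- Matches any User-Agent containing "Baiduspider" or "Bing" (case-insensitive) -->
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider|Bing" ignoreCase="true" />
```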


<!--
  Block BaiduSpider: return 403 for every request whose User-Agent
  contains "Baiduspider", except requests for /robots.txt
-->
<rule name="block_BaiduSpider" stopProcessing="true">
  <match url="(.*)" />
  <conditions trackAllCaptures="true">
    <!-- User-Agent contains "Baiduspider" (case-insensitive) -->
    <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
    <!-- ...and the requested path is not exactly /robots.txt -->
    <add input="{URL}" pattern="^/robots\.txt$" negate="true" ignoreCase="true" />
  </conditions>
  <action type="CustomResponse"
    statusCode="403"
    statusReason="Forbidden: Access is denied."
    statusDescription="Access is denied!" />
</rule>
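
For context, a rule like this goes in the site's web.config, inside the rewrite/rules section under system.webServer. The following is a minimal sketch of that placement, assuming the IIS URL Rewrite module is installed and the site has no other rewrite rules yet. If you already have a web.config, merge the rule element into your existing rules section instead of replacing the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Block BaiduSpider everywhere except robots.txt -->
        <rule name="block_BaiduSpider" stopProcessing="true">
          <match url="(.*)" />
          <conditions trackAllCaptures="true">
            <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
            <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
          </conditions>
          <action type="CustomResponse"
                  statusCode="403"
                  statusReason="Forbidden: Access is denied."
                  statusDescription="Access is denied!" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Rules are evaluated in the order they appear, so placing this rule near the top of the rules section keeps blocked requests from reaching later rules.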

Verifying the rewrite rule to block Baidu

You can easily verify the rewrite rule with Fiddler's Composer option: compose an HTTP request with a Baiduspider User-Agent header and check that the response is a 403, as shown in the next two images.

Verifying BaiduSpider is blocked with Fiddler: request

Verifying BaiduSpider is blocked with Fiddler: response



About the Author Jan Reilink

My name is Jan. I am not a hacker, coder, developer, programmer or guru. I am merely a system administrator, doing my daily thing at Vevida in the Netherlands. With over 15 years of experience, my specialties include Windows Server, IIS, Linux (CentOS, Debian), security, PHP, websites & optimization.
