Sysadmins of the North

Technical blog, where topics include: computer, server, web, sysadmin, MySQL, database, virtualization, optimization and security

How to block BaiduSpider bot User-Agent

The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not respect robots.txt as it should. The following IIS URL Rewrite snippet blocks the Baidu spider based on its User-Agent string.

Normally you would block a bot or spider using the following robots.txt:

User-agent: BaiduSpider
Disallow: /

or

User-agent: *
Disallow: /

This doesn’t work for Baidu…

You can use the following IIS URL Rewrite rule to block the BaiduSpider User-Agent on your website. The only thing the Baidu spider may still fetch is /robots.txt. Every other request it makes is blocked with a 403 Forbidden (Access Denied) response.

To block more bots, extend the pattern attribute of the User-Agent condition with multiple user agent strings, separated by a pipe (|). For example, pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".
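As a sketch, here is the first condition of the rule widened to the pattern="Baiduspider|Bing" example above. The rest of the rule, including the robots.txt exception, stays unchanged. The pattern is a regular expression, so the pipe means "or". Because the pattern is unanchored and ignoreCase="true", the fragment Bing also matches a User-Agent containing bingbot.

```xml
<!-- Matches any User-Agent containing "Baiduspider" or "Bing", case-insensitive -->
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider|Bing" negate="false" ignoreCase="true" />
```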

Hint: search Saotn.org for more IIS URL Rewrite related posts!

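<!--
  Block BaiduSpider: answer 403 to every request whose User-Agent
  contains "Baiduspider" (case-insensitive), except /robots.txt.
-->
<rule name="block_BaiduSpider" stopProcessing="true">
  <match url="(.*)" />
  <!-- Both conditions must be true (default logicalGrouping is MatchAll) -->
  <conditions trackAllCaptures="true">
   <!-- 1. The User-Agent contains Baiduspider -->
   <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
   <!-- 2. The requested path is NOT /robots.txt -->
   <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
  </conditions>
  <!-- Stop here and send a 403 instead of serving the page -->
  <action type="CustomResponse"
   statusCode="403"
   statusReason="Forbidden: Access is denied."
   statusDescription="Access is denied!" />
</rule>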
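The snippet is a single <rule> element, not a complete file. Below is a minimal sketch of where it goes in a site's web.config. It assumes that the IIS URL Rewrite module is installed and that the file has no rewrite rules yet. Without the module, IIS typically rejects the unknown <rewrite> section with an HTTP 500.19 configuration error. If your web.config already contains a <rules> element, add the <rule> there rather than creating a second one. Inbound rules are evaluated in the order listed, so putting the block rule first prevents an earlier rule with stopProcessing="true" from handling Baidu's requests before it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Paste the block_BaiduSpider <rule> element here,
             ahead of any other inbound rules -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```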

Verifying the rewrite rule to block Baidu

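You can easily verify the rewrite rule by using Fiddler’s Composer option to compose an HTTP request. Send a request with a User-Agent header containing Baiduspider, and any page should return 403. A request for /robots.txt with the same header should still succeed. The next two images show a blocked request and its response.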

Image: Verifying BaiduSpider is blocked with Fiddler (request)

Image: Verifying BaiduSpider is blocked with Fiddler (response)
