The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not respect robots.txt as it should. The following IIS URL Rewrite snippet blocks the spider based on its User-Agent string instead.

Normally you would block a bot or spider using the following robots.txt:

User-agent: BaiduSpider
Disallow: /

or

User-agent: *
Disallow: /

This doesn’t work for Baidu…

You can use the following IIS URL Rewrite rule to block the BaiduSpider User-Agent on your website. Only requests for robots.txt are allowed; all other requests from this user agent are rejected with a 403 Forbidden response.

To block more bots, extend the pattern attribute with additional user agent strings, separated by a pipe (|). For example pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".
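As a rough sketch of that expansion, the User-Agent condition of the rule could look like the line below. It assumes you want to block both Baidu and Bing crawlers, using the example pattern from the text. The pattern is evaluated as a regular expression, so the pipe acts as an "or".

```xml
<!-- Matches any User-Agent containing "Baiduspider" or "Bing" (case-insensitive) -->
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider|Bing" negate="false" ignoreCase="true" />
```

Because the match is a substring match, a short token such as "Bing" also matches longer strings that contain it, so keep the tokens specific enough to avoid blocking unintended clients.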

Hint: search Saotn.org for more IIS URL Rewrite related posts!

<!--
  Block BaiduSpider: return 403 for every request except robots.txt
-->
<rule name="block_BaiduSpider" stopProcessing="true">
  <match url="(.*)" />
  <conditions trackAllCaptures="true">
    <!-- The User-Agent header contains "Baiduspider" (case-insensitive) -->
    <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
    <!-- The requested path is not /robots.txt -->
    <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
  </conditions>
  <action type="CustomResponse"
    statusCode="403"
    statusReason="Forbidden: Access is denied."
    statusDescription="Access is denied!" />
</rule>
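The rule above is a fragment. As a minimal sketch of where it might live, the web.config below places it inside the rewrite section of system.webServer. This assumes the IIS URL Rewrite module is installed and that the site has no other rewrite rules yet. If your web.config already contains a rules element, add only the rule to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Block BaiduSpider, but still serve robots.txt -->
        <rule name="block_BaiduSpider" stopProcessing="true">
          <match url="(.*)" />
          <conditions trackAllCaptures="true">
            <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
            <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
          </conditions>
          <action type="CustomResponse"
            statusCode="403"
            statusReason="Forbidden: Access is denied."
            statusDescription="Access is denied!" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Rules are evaluated in the order they appear, so placing a blocking rule like this one near the top means matching requests are rejected before any later rewrite rules run.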

Verifying the rewrite rule to block Baidu

Using Fiddler’s Composer option to compose an HTTP request with a Baiduspider User-Agent header, you can easily verify the rewrite rule: the response should be a 403. The request and response are shown in the next two images.

Verifying BaiduSpider is blocked with Fiddler: request

Verifying BaiduSpider is blocked with Fiddler: response


Jan Reilink

