How to block BaiduSpider bot User-Agent?


The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not respect robots.txt as it should. This post shows you how to block the Baidu Spider bot based on its User-Agent string, using the IIS URL Rewrite Module.

A bot is often also called a spider.

Normally you would block a bot or spider using the following robots.txt:

User-agent: BaiduSpider
Disallow: /

or

User-agent: *
Disallow: /

This doesn’t work for blocking Baidu… :(

You can use the following IIS URL Rewrite rule to block the BaiduSpider User-Agent on your website. The only access allowed is to robots.txt; all other requests are blocked with an HTTP 403 Forbidden ("Access is denied") response.

To block more bots, extend the pattern= attribute with additional user agent strings, separated by a pipe (|). For example pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".


<!-- Block Baidu and BaiduSpider bot -->
<rule name="block_BaiduSpider" stopProcessing="true">
    <match url="(.*)" />
    <conditions trackAllCaptures="true">
        <add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
        <add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
    </conditions>
    <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="Access is denied!" />
</rule>
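Before deploying an extended pattern, it can help to try it against a few User-Agent strings offline. The following is only a rough sketch in Python that mirrors the two conditions of the rule above; it is not how IIS evaluates the rule internally. The function name is_blocked and the sample pattern "Baiduspider|Bing" are illustrative assumptions, and Python's re module is used as an approximation of the regular expression syntax URL Rewrite uses, which should behave the same for simple patterns like these.

```python
import re

# Same alternation syntax as the rule's pattern= attribute (illustrative value).
BLOCKED_AGENTS = re.compile(r"Baiduspider|Bing", re.IGNORECASE)

# Mirrors the second, negated condition: robots.txt stays reachable.
ROBOTS_TXT = re.compile(r"^/robots\.txt", re.IGNORECASE)


def is_blocked(user_agent: str, url_path: str) -> bool:
    """Return True when both rule conditions would match (a 403 response)."""
    agent_matches = BLOCKED_AGENTS.search(user_agent) is not None
    is_robots = ROBOTS_TXT.search(url_path) is not None
    # Both conditions must hold: a blocked agent AND a URL other than robots.txt.
    return agent_matches and not is_robots
```

Because the match is case-insensitive and unanchored, a short pattern such as "Bing" also matches "bingbot" anywhere in the User-Agent header, so keep patterns specific enough to avoid blocking regular visitors.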

Verifying the rewrite rule to block Baidu

Using Fiddler's Composer option to compose an HTTP request, you can easily verify the rewrite rule, as shown in the next two images.

Verifying BaiduSpider is blocked with Fiddler: request
Verifying BaiduSpider is blocked with Fiddler: response
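If Fiddler is not at hand, the same check can be scripted. Below is a minimal sketch using only Python's standard library; the helper name fetch_status and the sample User-Agent string are assumptions for illustration, and the rule only needs the substring "Baiduspider" to match.

```python
import urllib.error
import urllib.request

# Sample Baiduspider User-Agent string (an assumption for illustration).
BAIDU_UA = (
    "Mozilla/5.0 (compatible; Baiduspider/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a GET request with the given User-Agent and return the status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return error.code
```

With the rule active, calling fetch_status with BAIDU_UA should return 403 for any page on your own site, while the same call for /robots.txt should not be blocked by this rule.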

That’s it!
