The Baidu spider (BaiduSpider user agent) can be a real pain to block, especially since it does not respect robots.txt as it should. This post shows you how to block the BaiduSpider bot in Windows Server IIS, using the IIS URL Rewrite Module, based on its User-Agent string.
Not all web crawlers or bots respect the robots.txt file, and BaiduSpider is one of them.
A bot is often also called a spider. Normally, you would block a bot or spider using the following robots.txt:
User-agent: BaiduSpider
Disallow: /
or
User-agent: *
Disallow: /
But this doesn’t work for blocking the Baidu spider… 🙁
You can use the following IIS URL Rewrite Module rule to block the BaiduSpider User-Agent on your website in Windows Server IIS. The only access allowed is to robots.txt; all other requests are blocked with a 403 Access Denied.
Expand the pattern= attribute with multiple user agent strings, separated by a pipe (|), to block more bots, for example pattern="Baiduspider|Bing" or pattern="Googlebot|Bing".
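Such a combined condition would then look roughly like the following sketch of the User-Agent condition used in the rule below, here with the Baiduspider|Bing example value from above:
<!-- Sketch: one condition matching multiple bots, separated by a pipe -->
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider|Bing" negate="false" ignoreCase="true" />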
Hint: search for IIS URL Rewrite related posts on Saotn.org!
<!-- Block Baidu spider -->
<rule name="block_BaiduSpider" stopProcessing="true">
<match url="(.*)" />
<conditions trackAllCaptures="true">
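<!-- Condition 1: the request's User-Agent contains "Baiduspider" (case-insensitive) -->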
<add input="{HTTP_USER_AGENT}" pattern="Baiduspider" negate="false" ignoreCase="true" />
<add input="{URL}" pattern="^/robots\.txt" negate="true" ignoreCase="true" />
</conditions>
<action type="CustomResponse"
statusCode="403"
statusReason="Forbidden: Access is denied."
statusDescription="Access is denied!" />
</rule>
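If you add the rule by hand rather than through the IIS Manager GUI, it belongs inside the <rewrite>/<rules> section of your site's web.config. A minimal sketch of that placement (the rest of your web.config is assumed to remain as it is):
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- the block_BaiduSpider rule from above goes here -->
      </rules>
    </rewrite>
  </system.webServer>
</configuration>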
Using Fiddler's Composer option to compose an HTTP request, you can easily verify the rewrite rule, as shown in the next two images.
That’s it!