Blocking Bot Requests

Websites can become unavailable when they are overwhelmed by a surge in traffic, which is often caused by aggressive bot activity. While a full Distributed Denial of Service (DDoS) attack requires specialized protection such as Cloudflare, ordinary bot traffic can be mitigated more simply: an .htaccess file can block the offending requests. Note that this only works with the Apache web server; Nginx does not support .htaccess files.

Checking the log files

If your site is unavailable, first take a closer look at your web server's logs. You can find them in the logs subdirectory of your home directory (use ls $HOME/logs). In the access log, the last column contains the User-Agent header.

The following excerpt shows a common culprit, facebookexternalhit, a crawler that frequently causes issues. Here, the bot makes multiple requests per second:

173.252.83.40 - - [15/Aug/2024:01:19:00 +0200] "GET /*** HTTP/2.0" 200 434490 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.116 - - [15/Aug/2024:01:19:00 +0200] "GET /***/***/*** HTTP/2.0" 200 248774 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.15 - - [15/Aug/2024:01:19:01 +0200] "GET /***/*** HTTP/2.0" 200 94504 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
173.252.87.11 - - [15/Aug/2024:01:19:02 +0200] "GET /***/*** HTTP/2.0" 200 262990 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
66.220.149.115 - - [15/Aug/2024:01:19:02 +0200] "GET /***/***/*** HTTP/2.0" 200 258529 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"
69.171.249.112 - - [15/Aug/2024:01:19:03 +0200] "GET /***/***/*** HTTP/2.0" 200 356111 "-" "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

Sustained at this rate, such traffic can impair the availability of the website for extended periods.
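To see which clients dominate the traffic, you can summarize requests per User-Agent with standard shell tools. The log file name access.log below is an example; substitute the actual file name found in $HOME/logs:

```shell
# Extract the User-Agent (the 6th field when splitting the combined
# log format on double quotes), then count and rank the distinct values.
awk -F'"' '{print $6}' "$HOME/logs/access.log" | sort | uniq -c | sort -rn | head
```

A single User-Agent accounting for a large share of recent requests is a strong hint about which bot to block.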

Creating the Rewrite Rule

To prevent the FacebookExternalHit crawler from accessing your site, you can create an .htaccess file in your document root with the following configuration:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
RewriteRule .* - [F,L]

If your .htaccess file already contains other rewrite rules, add this condition and rule to it. Place the block directly after RewriteEngine On, ahead of the existing rules, so that it is evaluated first.
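For illustration, here is how the placement might look in a file that already has rewrite rules. The existing rules shown are the standard WordPress front-controller block, used here purely as an example of pre-existing content:

```
RewriteEngine On

# Bot block: evaluated before the rules below
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit [NC]
RewriteRule .* - [F,L]

# Existing rules (example: WordPress front controller)
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```

The [F] flag returns 403 Forbidden, and [L] stops further rule processing for the matched request.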

To block multiple bots within the same rewrite condition, you can use the following syntax:

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (Amazonbot|Bytespider|facebookexternalhit) [NC]
RewriteRule .* - [F,L]

With this simple .htaccess rule, you can block excessive bot traffic and protect your website from downtime. Review your access logs regularly to confirm the rule is working as intended and to spot new problematic crawlers early.
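Blocked requests are answered with HTTP status 403, so you can verify the rule from the access log itself. As before, access.log is an example file name; adjust it to your actual log:

```shell
# Count 403 (Forbidden) responses per User-Agent; once the rule is
# active, the blocked bots should show up here.
awk -F'"' '$3 ~ /^ 403 / {print $6}' "$HOME/logs/access.log" | sort | uniq -c | sort -rn
```

You can also trigger a test request yourself, for example curl -A "facebookexternalhit/1.1" -I https://your-domain.example/, which should return 403 Forbidden once the rule is in place.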