
mod_rewrite: Examples


 wild
(@wild)

1. Creating a Permanent Redirect (PURL)

A common problem when you upgrade a site is that it's not possible to fix all the inbound links and people's personal bookmarks. The solution is to create a 301 Redirect that takes requests for the missing pages and redirects them to relevant content.

A 301 Redirect tells browsers and search engines that the requested page has 'Moved Permanently', so search engines should update their index with the new address. If you only want a temporary redirect then [R] will suffice: it sends a 302 Found response, which indicates that the resource is temporarily available at a different address and that the search engine doesn't need to update its index. See our article on HTTP Server Status Codes for a list of the most common values.
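None of the examples in this post will do anything until the rewrite engine is switched on. A minimal sketch of the start of a .htaccess file in the site's root directory, assuming mod_rewrite is installed and the server's AllowOverride setting lets .htaccess files use it. The rule is the temporary (302) form of the redirect; Example 1 below uses the permanent one:

# switch on mod_rewrite for this directory (needed once per .htaccess file)
RewriteEngine On

# [R] with no status code sends a 302 Found (temporary) redirect
RewriteRule ^media/ /media.html [R,L]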

Example 1: redirect all requests for pages in the "/media/" directory to a new page, media.html

RewriteRule ^media/ /media.html [R=301,L]

Note: The L (Last) flag tells mod_rewrite to stop processing at this rule, so the redirect is sent immediately and any subsequent rules are ignored. When the browser then requests the new address, all of the rules are evaluated again for that new request.

Translation:

  • IF the request is for a page in the /media/ directory;
  • THEN redirect the request to the new page: /media.html

The caret (^) is an important character in regular expressions as it represents the start of the string. Without the caret, the above example would match any directory whose name ends with 'media' (eg. /oldmedia/ or /archive/media/). With the caret in place, only the 'media' directory at the root level of the site will match (assuming the rules are in the .htaccess file in your site's root directory).
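The pattern has no leading slash because, in a .htaccess file, mod_rewrite strips the directory prefix (for the root directory, the leading /) before matching. If you put the rule in the main server configuration instead, the pattern is compared against the full URL-path, so the slash must be included. A sketch, assuming a virtual host for www.example.net:

<VirtualHost *:80>
    ServerName www.example.net
    RewriteEngine On
    # server context: the URL-path still starts with a slash
    RewriteRule ^/media/ /media.html [R=301,L]
</VirtualHost>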

You might sometimes see the regular expression written as ^media/.*$, but the trailing .*$ serves no purpose and can always be omitted. The same applies to a leading ^.* (as in ^.*media/), which is equivalent to leaving the pattern unanchored.

Example 2: Redirect requests for a specific page to a different address:

RewriteRule ^oldaddress\.html$ /newaddress.html [R=301,L]

A period (.) by itself is a wildcard that will match any single character, so ^oldaddress.html would also match 'oldaddressXhtml'. To match an actual period, it needs to be escaped with a backslash: \. The target (the right-hand side of the RewriteRule), however, is not a regular expression, so you don't need to escape anything there.
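Because the target is plain text, it can also reuse parts of the request: each parenthesised group in the pattern is available in the target as $1, $2 and so on. A sketch, assuming hypothetical /archive/ and /old-news/ directories; a request for /archive/2009/jan.html would be redirected to /old-news/2009/jan.html:

# hypothetical move: .html pages under /archive/ go to /old-news/, keeping the rest of the path
# the period is escaped in the pattern but not in the target
RewriteRule ^archive/(.+)\.html$ /old-news/$1.html [R=301,L]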

2. Disabling or enabling access for some users

Suppose you spot some traffic on your site that you don't want - someone stealing content or images or a competitor constantly monitoring your prices.

This code in your .htaccess file will block them from accessing your site at all. Everyone else can continue to access the site normally:

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$ [OR]
RewriteCond %{REMOTE_ADDR} ^87\.65\.43\.21$
RewriteCond %{REQUEST_URI} !^/sorry\.html$
RewriteRule .* /sorry.html

Translation:

  • IF their IP address is 12.34.56.78
  • OR their IP address is 87.65.43.21
  • AND the request is not for sorry.html
  • THEN display the sorry.html page

Note: if you have more than one address to deny, then every condition except the last needs to be followed by [OR] for the logic to work. Without [OR], conditions are combined with AND, so you would only block a visitor whose IP address matched every line at once, which is impossible.
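The same logic can also be written as a single condition by using the regular-expression alternation operator (|) in place of [OR]. A sketch using the two addresses above; note that %{REQUEST_URI} starts with a slash, so the exclusion is written as ^/sorry\.html$:

# either address matches this one condition, so no [OR] flag is needed
RewriteCond %{REMOTE_ADDR} ^(12\.34\.56\.78|87\.65\.43\.21)$
RewriteCond %{REQUEST_URI} !^/sorry\.html$
RewriteRule .* /sorry.html [L]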

The final RewriteCond is essential to prevent an infinite rewrite loop. This is because (at least in these examples) the target page, sorry.html, is itself within the scope of the rule: .* matches any request, including the request for /sorry.html.
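Another way to avoid the loop, sketched here as an alternative rather than the approach used above, is to let requests for the sorry page through with a rule of their own, placed before the blocking rule. This assumes, as before, that the rules are in the root .htaccess file. The - target means 'leave the request unchanged', and [L] stops any further rules from being applied:

# requests for the sorry page pass through untouched
RewriteRule ^sorry\.html$ - [L]

# everything else from these addresses is shown the sorry page
RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$ [OR]
RewriteCond %{REMOTE_ADDR} ^87\.65\.43\.21$
RewriteRule .* /sorry.html [L]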

Conversely, you may want to allow only certain IP addresses to access the site:

RewriteCond %{REMOTE_ADDR} !^12\.34\.56\.78$
RewriteCond %{REMOTE_ADDR} !^87\.65\.43\.21$
RewriteCond %{REQUEST_URI} !^/sorry\.html$
RewriteRule .* /sorry.html

Translation:

  • IF their IP address is not 12.34.56.78
  • AND their IP address is not 87.65.43.21
  • AND the request is not for sorry.html
  • THEN display the sorry.html page

In both cases, users who are blocked will see the contents of /sorry.html, but under the address of the page they tried to view. To redirect them to that page, so that the URL in the address bar changes, you need to add [R] or [R=301] to the RewriteRule. Those flags will generate a 302 Found or a 301 Moved Permanently response respectively.
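For example, a sketch of the first block rewritten as a visible (302) redirect. The REQUEST_URI condition matters even more here, as without it the browser would be redirected to /sorry.html over and over:

# blocked addresses are sent to /sorry.html and the address bar changes
RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$ [OR]
RewriteCond %{REMOTE_ADDR} ^87\.65\.43\.21$
RewriteCond %{REQUEST_URI} !^/sorry\.html$
RewriteRule .* /sorry.html [R,L]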

If you only wanted to wall off a directory rather than the entire website, then a couple of changes are necessary:

RewriteCond %{REMOTE_ADDR} ^12\.34\.56\.78$ [OR]
RewriteCond %{REMOTE_ADDR} ^87\.65\.43\.21$
RewriteRule ^private/ /sorry.html

Translation:

  • IF their IP address is 12.34.56.78
  • OR their IP address is 87.65.43.21
  • THEN any request for the /private/ directory will display sorry.html

This will prevent the listed IP addresses from accessing the contents of the private/ directory. The REQUEST_URI condition from the earlier examples is not needed in this case, as /sorry.html does not match the ^private/ pattern and so is outside the scope of the rule.

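If you don't want to go to the bother of setting up an alternative page then a 403 Forbidden response will suffice. On its own, the rule below blocks the /private/ directory for everyone; keep the two RewriteCond lines from the previous example in front of it if you only want to block those addresses: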

RewriteRule ^private/ - [F]

Another useful response is 410 Gone, which indicates that the requested content no longer exists on the server and is not expected to return. This sends a stronger message than a 301 Redirect and will encourage people to fix broken links to you from their sites.

RewriteRule ^media/ - [G]

It's good practice to issue a 301 Redirect for a while first, before switching to 410 Gone.
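If you'd rather show visitors a friendlier page than the server's default 410 message, the rule can be paired with Apache's ErrorDocument directive. A sketch, assuming a hypothetical /gone.html page exists and that the server allows ErrorDocument in .htaccess files (AllowOverride FileInfo):

# hypothetical /gone.html: explains that the old media pages have been removed
ErrorDocument 410 /gone.html
RewriteRule ^media/ - [G]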

3. How to prevent Hotlinking

'Hotlinking' is a commonly used term for when another website directly embeds your images in a web page or forum post. The simplest way to stop this practice is to send a 403 ('Forbidden') response when the referer is neither your own site nor one that you want to be able to display your images:

RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.net/ [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

Translation:

  • IF the browser passes a referer (at least one character);
  • AND the referer is not your domain (add a line, or a regular expression for each possibility);
  • THEN block requests for files ending in .jpg, .jpeg or .gif
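To allow more than one site, add one condition per domain. A sketch, assuming example.org is a hypothetical second site you're happy to let display your images. The (www\.)? group covers the address with and without www, https? covers both protocols, the trailing slash stops look-alike domains such as example.net.example.com from slipping through, and [NC] on the RewriteRule also catches upper-case extensions such as .JPG:

# allow example.net and (hypothetical) example.org to display the images
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.net/ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.org/ [NC]
RewriteRule \.(jpe?g|gif)$ - [NC,F]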

One limitation of this is that it can be bypassed by any browser that sends a blank referer. To get around that you can remove the first RewriteCond, but then you'll also be blocking the images for people who ARE browsing your site but have their browser set not to send a referer.
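The stricter version, with the first RewriteCond removed so that requests with a blank referer are blocked as well, would look something like this (a sketch, with example.net standing in for your own domain):

# no 'referer is not empty' check: blank referers are blocked too
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.net/ [NC]
RewriteRule \.(jpe?g|gif)$ - [F]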

The next change you can make is to let your images be seen when your pages are viewed from a 'webmail client', 'search engine cache' or 'web-based translator'. For example, to display images for people using Google's [Cached] feature, add the following condition:

RewriteCond %{HTTP_REFERER} !q=cache

Translation:

  • AND the referer doesn't contain the string 'q=cache';

Similar rules can be added for Yahoo!, MSN, Ask Jeeves and so on. The best way to work out which rules you need is to analyse your log files to see who's being blocked by the anti-hotlinking code (search for 403s). These conditions all need to be placed before the RewriteRule.
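Put together, a sketch of the anti-hotlinking block with the cache exception in place might look like this; example.net again stands in for your own domain, and any further exceptions would go alongside the q=cache line:

# every exception must be a RewriteCond placed before the RewriteRule
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.net/ [NC]
RewriteCond %{HTTP_REFERER} !q=cache
RewriteRule \.(jpe?g|gif)$ - [F]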

4. Blocking Spiders, Spambots and Other User Agents

There are a lot of weird and wonderful user agents roaming the web. Some are working for you, and others against you, stealing bandwidth or content. To block a well-behaved ('legitimate') web robot you can simply add its name to your robots.txt file. For the rest you have mod_rewrite.

Here's a short example to get you started:

RewriteCond %{HTTP_USER_AGENT} "Email\ Extractor" [OR]
RewriteCond %{HTTP_USER_AGENT} ^Email(Siphon|Smartz|Wolf)$ [OR]
RewriteCond %{HTTP_USER_AGENT} "^Franklin\ (Box|Locator)"
RewriteRule .* - [F]

This will block the following user agents:

  • ... Email Extractor ...
  • EmailSiphon
  • EmailSmartz
  • EmailWolf
  • Franklin Box ...
  • Franklin Locator ...

The ... in the above list indicates that the user agent string could continue in that direction and still be a match.

When creating a RewriteCond you need to pay attention to quotes, escaping characters (spaces, periods, ...) and whether the pattern needs to be anchored at the start (^) or end ($).
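For example, here are two ways of coping with a space in the pattern: quote the whole pattern, or escape the space with a backslash. A sketch, assuming two hypothetical user agents; the first condition matches 'Bad Bot' anywhere in the string, the second only strings that start with 'Evil Crawler/1.0' (note the escaped period), and [NC] makes both case-insensitive:

# hypothetical agents: "Bad Bot" anywhere, or a string starting "Evil Crawler/1.0"
RewriteCond %{HTTP_USER_AGENT} "Bad Bot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Evil\ Crawler/1\.0 [NC]
RewriteRule .* - [F]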


   