-1

We are removing two sections from our site.

/warehouse/
/clothing/

I'd like to send all the URLs beneath these two to a single (404) landing page saying the item has been removed. I'd like to clean up the query strings too, if possible.

Where do I start?

Andreas
Pete D
  • If you're looking to test regex against a sample string, I would suggest [regex101](https://regex101.com/) – quackenator May 12 '17 at 13:53
  • Is this Apache/nginx/IIS? Or in code? Do you want to redirect to a new target, or do you want to keep the URLs intact but just change the response to a 404 with a specific message? – Doqnach May 12 '17 at 14:19
  • Apache is running with nginx. Hmm - keeping the URLs intact but just changing the response to a 404 sounds good for handling search engines. – Pete D May 15 '17 at 11:21
  • You want to configure the server on the outermost layer to serve an HTTP 404 for these URLs. How you do that depends on which server it is. – Aaron May 19 '17 at 13:51
  • HTTP 410 (Gone) seems more appropriate than a 404 btw, as it acknowledges that a resource was previously here but has been removed – Aaron May 19 '17 at 14:03

2 Answers

1

First, I'd recommend that you respond with a 410 (Gone) rather than a 404, to acknowledge that the resource once existed.

In Apache, you'd do something like the following. Refer to this page for more information.

RedirectMatch permanent "^/(warehouse|clothing)/?.*" "http://www.example.com/404"

In IIS, your web.config would look something like the following. Note that the URL Rewrite module matches the url pattern against the request path only, without the leading slash and without the query string, so the pattern here starts with ^ rather than ^/. Refer to this page for more information.

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="404 Redirect" stopProcessing="true">
          <match url="^(warehouse|clothing)/" />
          <conditions trackAllCaptures="true"></conditions>
          <action type="Redirect" url="404" appendQueryString="true" redirectType="Permanent" />
        </rule>
      </rules>
    </rewrite>
    <httpProtocol allowKeepAlive="false" />
    <caching enabled="false" />
    <urlCompression doDynamicCompression="true" />
  </system.webServer>
</configuration>

Updated to anchor the regex at the start of the path, based on drdaeman's comment.

S. Hooley
  • I'm not sure about IIS, but I believe Apache uses PCRE, so it should be parentheses, not square brackets. Also, a leading `^/` should be present, see https://httpd.apache.org/docs/2.4/mod/core.html#locationmatch (quoting: "If the intent is that a URL **starts with** `/extra/data`, rather than merely **contains** `/extra/data`, prefix the regular expression with a `^` to require this.") (Yes, it's the documentation about `LocationMatch`, but IIRC the same principle still applies to `AliasMatch` and `RedirectMatch`) – drdaeman May 25 '17 at 17:33
1

If you're using nginx, you can just add a pair of location sections. They'll match as long as there aren't more specific locations. Check out the documentation for more detail.

location /warehouse/ {
    return 410;
}

location /clothing/ {
    return 410;
}

If there are too many locations, it could be cumbersome to list them separately, so you can use regex like this:

location ~* ^/(warehouse|clothing|something-else)/ {
    return 410;
}

If you want a customized 410 page, add configuration like this in your server block:

error_page 410 /410.html;
location = /410.html {
    root /var/www/error/;    # Put a file /var/www/error/410.html
    internal;
}

Replace 410 with 404 if you want to return that status code. I believe 410 "Gone" is the more appropriate answer, but YMMV.

I'd suggest doing this in whatever is closest to the client, so if nginx is in front of Apache, do it in nginx. That way you have fewer round trips.

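If you want to do this in Apache, you can do it with RedirectMatch. With the gone status the target URL argument must be omitted, so the custom page is configured through ErrorDocument instead:

# The trailing `.*$` is not required for the match and can be omitted.
RedirectMatch gone "^/(warehouse|clothing)/.*$"
ErrorDocument 410 /410.html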

Or I'd suggest using mod_rewrite as a somewhat more flexible option:

RewriteEngine on
RewriteRule ^/(warehouse|clothing)/ - [G,L]
ErrorDocument 410 /410.html

Here [G] means "gone" (410 status code). If you want a 404 response, do this instead:

RewriteEngine on
RewriteRule ^/(warehouse|clothing)/ - [R=404,L]

Note that you need ^/ in your regexes to indicate that the path doesn't merely contain /warehouse/ or /clothing/ but starts with one of them. Otherwise you'll get incorrect responses on addresses like /about/clothing/. (In a per-directory context such as .htaccess, mod_rewrite strips the leading slash before matching, so the pattern there would start with ^ instead of ^/.) I'm not exactly sure if you need the trailing .*$, but I believe you don't; I don't have Apache at hand to test this. Add it if the rules don't work for you (i.e. ^/(warehouse|clothing)/.*$).
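
To see what the leading ^/ changes, here is a minimal Python sketch. Python's re module is not the engine Apache or nginx use (both rely on PCRE), but for a pattern this simple the behaviour should be the same. The helper name is_removed and the sample paths are made up for illustration.

```python
import re

# Anchored pattern, as used in the rules above.
ANCHORED = re.compile(r"^/(warehouse|clothing)/")
# Unanchored variant, shown only to illustrate the problem.
UNANCHORED = re.compile(r"/(warehouse|clothing)/")


def is_removed(path, pattern=ANCHORED):
    """Return True if the URL path matches the given pattern."""
    return pattern.search(path) is not None


if __name__ == "__main__":
    # Hypothetical sample paths; the last column shows the unanchored result.
    for path in ["/warehouse/item-1", "/clothing/", "/about/clothing/", "/warehouse"]:
        print(path, is_removed(path), is_removed(path, UNANCHORED))
```

Only the unanchored pattern matches /about/clothing/, which is the false positive described above. Neither pattern matches /warehouse without a trailing slash, so decide separately whether that URL should be covered too.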

Or you can handle the logic in your application, which may be the only way if your base layout contains something user-dependent and you want consistency. No concrete answer can be written for that without knowing what language/framework/stack you use.
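
Purely as an illustration of the application-level approach, and assuming a Python WSGI stack (the question doesn't say what the site actually runs on), a sketch could look like this. The names app, REMOVED_PREFIXES and GONE_BODY are hypothetical, and the 200 branch is a placeholder for the real site.

```python
# Sections named in the question; anything below them is treated as gone.
REMOVED_PREFIXES = ("/warehouse/", "/clothing/")

GONE_BODY = b"<h1>410 Gone</h1><p>This item has been removed.</p>"


def app(environ, start_response):
    """WSGI app: answer 410 for removed sections, placeholder otherwise."""
    # PATH_INFO excludes the query string, so any ?foo=bar is ignored here.
    path = environ.get("PATH_INFO", "")
    if path.startswith(REMOVED_PREFIXES):
        start_response("410 Gone", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(GONE_BODY))),
        ])
        return [GONE_BODY]
    # Placeholder for whatever the rest of the application would do.
    body = b"placeholder for the rest of the site"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a quick local check, this can be served with the standard library's wsgiref.simple_server.make_server. Because the URL stays intact and only the status changes, the query string never needs rewriting in this variant.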

drdaeman