I have eCommerce categories with faceted navigation (filtering). Filtering can generate thousands of (useful) URLs. I would like to reduce the number of possible URLs by always serving the same content on the same URL, with the query string parameters in the same order.

From an SEO point of view I could use the canonical tag to logically eliminate duplicated URLs, but from a performance point of view it would be much better to solve it with RewriteRules.

Example URLs with the same content but different param order:

  • https://example.com/category/subcategory/?filter_manuf=grohe&filter_style=design&filter_family=bauedge&filter_warranty=5y
  • https://example.com/category/subcategory/?filter_style=design&filter_manuf=grohe&filter_warranty=5y&filter_family=bauedge

These URLs should be redirected to a URL in which the query params always appear in the same order, e.g.:

https://example.com/category/subcategory/?filter_manuf=grohe&filter_family=bauedge&filter_style=design&filter_warranty=5y

Note that:

  • I have more than 10 filtering criteria (query params)
  • Order of parameters changes according to the user's filter selection order. They can appear in any given order.
  • Only parameters which are used appear in the URL. Some pages have one or two parameters in their URL, some have up to ten or more.

Do you have any idea how this can be achieved?

I have found something promising in this question, but I can't make it work:
RewriteCond to match query string parameters in any order

MrWhite
nattila
  • "from performance point of view it would be much better to solve it with RewriteRules." - No, it wouldn't. This should be solved in your application at the time you construct the URL. – MrWhite Sep 18 '20 at 23:34
  • Hi MrWhite, do you mean when the links of the faceted navigation are constructed? Yes, this idea also came to my mind. Doing that and adding a canonical tag to all pages could also resolve this issue. I thought it was possible to do it with RewriteRules, but it doesn't seem to be feasible. Thanks. – nattila Sep 20 '20 at 07:13
  • Yes, at the time you construct the canonical tag, use this as the visible URL as well? It's technically possible but inefficient (and complex) to do this with mod_rewrite. You only want to "redirect" the user if they happen to follow an incorrect inbound link. That linked question admits that it's not a working example (it has a number of syntax errors for one). But it also omits some important details, like checking that the URL params are already in the correct order (it uses a different URL-path to avoid this issue). It's an interesting problem, but not particularly practical. – MrWhite Sep 23 '20 at 12:43

1 Answer

"but from a performance point of view it would be much better to solve it with RewriteRules."

From a performance point of view, it would be far better to resolve this in your application, not in .htaccess/mod_rewrite (i.e. RewriteRules). You want to always be linking to the canonical URL in the first place.

You certainly don't want to be externally redirecting the user as they apply filters in order to "correct" the URL parameter order. The URL parameters should be applied in the "correct" order to begin with by your application.
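As a rough illustration of what "applied in the correct order by your application" could look like, here is a minimal Python sketch. The language, the FILTER_ORDER list and the build_filter_url() helper are all assumptions for the sake of the example (only the filter names come from the question); the same idea applies in whatever language your shop is written in.

```python
from urllib.parse import urlencode

# Hypothetical canonical order of the filter parameters.
# The names are taken from the question; the order is whatever you decide on.
FILTER_ORDER = ["filter_manuf", "filter_family", "filter_style", "filter_warranty"]


def build_filter_url(path, selected):
    """Return `path` plus the selected filters, always in FILTER_ORDER.

    `selected` maps filter names to values in whatever order the user
    happened to click them. Unselected or empty filters are left out.
    """
    pairs = [(name, selected[name]) for name in FILTER_ORDER if selected.get(name)]
    query = urlencode(pairs)
    return f"{path}?{query}" if query else path
```

If every filter link on the page is generated through one helper like this, the order in which the user selected the filters never reaches the URL, so there is nothing to redirect.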

The only time it would be beneficial to "redirect" the user is if they have followed a third party non-canonical link (from another website or search engine) and you need to resolve potential SEO issues. But even then, the code to correct the URL parameter order should be far simpler (and easier to maintain) if implemented as part of your application logic, not .htaccess. The code to do this in .htaccess is comparatively more "complex" (read: messy, potentially harder to maintain, more prone to error, etc.)
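For the inbound-link case, the application-side check might look something like the following Python sketch. Again, the language, FILTER_ORDER and canonical_redirect_target() are assumptions made for illustration. It drops unknown and empty parameters and keeps only the first occurrence of a repeated parameter, which is a design choice you may not want (tracking parameters would be stripped, for example).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical canonical order of the filter parameters (names from the question).
FILTER_ORDER = ["filter_manuf", "filter_family", "filter_style", "filter_warranty"]


def canonical_redirect_target(url):
    """Return the canonical path and query to redirect to, or None.

    None means the requested URL is already canonical and should be
    served as-is (no redirect).
    """
    parts = urlsplit(url)
    seen = {}
    # parse_qsl() skips parameters with blank values by default.
    for name, value in parse_qsl(parts.query):
        seen.setdefault(name, value)  # keep the first occurrence only
    pairs = [(name, seen[name]) for name in FILTER_ORDER if name in seen]
    canonical_query = urlencode(pairs)
    if canonical_query == parts.query:
        return None
    return parts.path + ("?" + canonical_query if canonical_query else "")
```

When this returns a value, the application would respond with a redirect to it; when it returns None, the page is rendered normally. A redirected request is already canonical on the next round trip, so it should not loop.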

However, it is an interesting problem and there might be an occasion when it is preferable (or necessary) to code this in .htaccess (or Apache server config) when you are not able to do this easily in your application.

Solution using mod_rewrite in .htaccess (or server config)

(However, note the comments above - this may not be what you should be doing.)

This is a reasonably generic solution that works in .htaccess (or server config). As it stands, it works on any URL-path. To make it work on a single URL-path (eg. /category/subcategory/, as stated in the question) then modify the pattern in the final RewriteRule directive. For example:

RewriteRule ^category/subcategory/$ %{REQUEST_URI}?%{ENV:NEW_QUERY_STRING} [NE,R=302,L]

Or, you could write an exception at the top to skip these rules for certain URLs if you need to apply it to a group of URLs and not others. This might be more optimal as it avoids any unnecessary processing of the query string.

This block of code would need to go near the top of your .htaccess file. (Order matters.)

This code has the added "benefit" that it also "sanitizes" the query string by removing any URL parameters that are not defined (at the top of the script).

Since it's non-trivial to "simply" determine whether the original URL parameters are already in the correct order, the script goes through the process of constructing a new query string with the URL parameters in the correct order and then compares this to the original query string in order to determine whether a redirect is necessary.

Criteria:

  • Up to 10 URL parameters
  • Any number of URL parameters can appear in any order
  • Empty URL parameters should not be included
  • URL parameters are case-sensitive
  • Works for any URL-path
  • URL parameter names match the regex [\w-]+ (ie. a-z, A-Z, 0-9, _ and -)
  • URL parameter values cannot contain @ (unless URL encoded)
  • @@@ cannot appear anywhere in the query string
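The restrictions on names, values and the "@" character come from how each parameter is read: the configured name is appended to the query string after an "@", and a backreference then forces the captured parameter name to equal it. If it helps to experiment with that idea outside Apache, here is a small Python emulation. The read_param() helper is hypothetical, Python's re module stands in for Apache's PCRE, and the pattern has the same shape as the script's, anchored with a trailing "$" so that the backreference has to match the whole configured name.

```python
import re

# name=value somewhere in the query string, followed (eventually) by "@name"
# at the very end of the test string. \1 refers back to the captured name.
READ_PARAM = re.compile(r"(?:^|&)([\w-]+)=([^&@]+).*@\1$", re.ASCII)


def read_param(query_string, name):
    """Emulate one "read" step: return the value of `name`, or None.

    The test string mirrors %{QUERY_STRING}@%{ENV:VAR_NAME_nn}.
    """
    match = READ_PARAM.search(f"{query_string}@{name}")
    return match.group(2) if match else None
```

For example, read_param("filter_style=design&filter_manuf=grohe", "filter_manuf") picks out "grohe" regardless of where the parameter sits in the query string, and an empty parameter such as "filter_style=" is not matched at all.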

You simply need to define the URL parameter names at the top of the script, in the order you wish them to be. These are held in environment variables VAR_NAME_01, VAR_NAME_02, etc. The remainder of the script should work unaltered unless:

  • you need to add more URL parameters
  • OR, change the character used internally to delimit sections in the pattern matching (currently "@").
  • OR, limit the code to a specific URL-path.

Script:

# Define the "name" of each URL parameter
# The numeric order determines the order of the resulting URL parameter list.
# Comment out any URL parameters that are not required.
SetEnvIf ^ ^ VAR_NAME_01=one
SetEnvIf ^ ^ VAR_NAME_02=two
SetEnvIf ^ ^ VAR_NAME_03=three
SetEnvIf ^ ^ VAR_NAME_04=four
SetEnvIf ^ ^ VAR_NAME_05=five
SetEnvIf ^ ^ VAR_NAME_06=six
SetEnvIf ^ ^ VAR_NAME_07=seven
SetEnvIf ^ ^ VAR_NAME_08=eight
SetEnvIf ^ ^ VAR_NAME_09=nine
SetEnvIf ^ ^ VAR_NAME_10=ten

###############################################################################
# Shouldn't need to modify directives below here...

RewriteEngine on
Options +FollowSymLinks

# -----------------------------------------------------------------------------
# Read each URL parameter (if any) and store in corresponding env var

# The trailing "$" makes \1 match the whole configured name,
# not just a prefix of it (eg. "on" must not match "one").

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_01} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_01:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_02} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_02:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_03} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_03:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_04} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_04:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_05} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_05:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_06} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_06:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_07} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_07:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_08} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_08:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_09} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_09:%2]

RewriteCond %{QUERY_STRING}@%{ENV:VAR_NAME_10} (?:^|&)([\w-]+)=([^&@]+).*@\1$
RewriteRule ^ - [E=VAR_VALUE_10:%2]

# -----------------------------------------------------------------------------
# Construct new query string
# Only with URL parameters that are not empty

RewriteCond %{ENV:VAR_VALUE_01} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:VAR_NAME_01}=%{ENV:VAR_VALUE_01}]

RewriteCond %{ENV:VAR_VALUE_02} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_02}=%{ENV:VAR_VALUE_02}]

RewriteCond %{ENV:VAR_VALUE_03} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_03}=%{ENV:VAR_VALUE_03}]

RewriteCond %{ENV:VAR_VALUE_04} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_04}=%{ENV:VAR_VALUE_04}]

RewriteCond %{ENV:VAR_VALUE_05} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_05}=%{ENV:VAR_VALUE_05}]

RewriteCond %{ENV:VAR_VALUE_06} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_06}=%{ENV:VAR_VALUE_06}]

RewriteCond %{ENV:VAR_VALUE_07} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_07}=%{ENV:VAR_VALUE_07}]

RewriteCond %{ENV:VAR_VALUE_08} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_08}=%{ENV:VAR_VALUE_08}]

RewriteCond %{ENV:VAR_VALUE_09} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_09}=%{ENV:VAR_VALUE_09}]

RewriteCond %{ENV:VAR_VALUE_10} .
RewriteRule ^ - [E=NEW_QUERY_STRING:%{ENV:NEW_QUERY_STRING}&%{ENV:VAR_NAME_10}=%{ENV:VAR_VALUE_10}]

# -----------------------------------------------------------------------------
# Trim "&" prefix from the NEW_QUERY_STRING
RewriteCond %{ENV:NEW_QUERY_STRING} ^&(.+)
RewriteRule ^ - [E=NEW_QUERY_STRING:%1]

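# Compare with the existing QUERY_STRING to determine whether it's already in the correct order
# (.*) rather than (.+) so that an empty query string compares equal
# and a URL without any of the defined parameters is not redirected in a loop
# If different then redirect...
RewriteCond %{QUERY_STRING}@@@%{ENV:NEW_QUERY_STRING} !^(.*)@@@\1$
RewriteRule ^ %{REQUEST_URI}?%{ENV:NEW_QUERY_STRING} [NE,R=302,L]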

If you have any queries regarding specific parts of this script just say in comments...

MrWhite