I'm struggling to fix an issue with 301 redirects in .htaccess. I have moved a site from an old domain to a new domain, and the basic redirect works with a 301 redirect like so:

Redirect 301 / https://newdomain.com

On the old site, child category URLs look like this:

olddomain.com/product-category/parent-cat1/parent-cat2/child-cat

or

olddomain.com/product-category/parent-cat1/child-cat

or

olddomain.com/product-category/child-cat

Whereas on the new site they are:

newdomain.com/product-category/child-cat

Unfortunately, the redirects land on 404s at the new domain. Is there any way to remove the parent categories (which vary in both name and number) from the URL?

MrWhite
noelmcg
  • "`/parent-cat/parent-cat/`" - Are these two instances of `parent-cat` the same? Or is that really `/parent-cat1/parent-cat2/`? You say the number of `parent-cat` can vary... from 1 to how many? What characters are part of the `product-category` and `child-cat`? – MrWhite May 23 '17 at 23:23
  • Sorry for not being clearer. No, they would be different parent categories. I will edit the question to clarify this. There is no limit on how deeply the product categories can be nested, but practically speaking it isn't more than 5 or 6 levels. The characters are alphanumerics and hyphens. Thanks – noelmcg May 24 '17 at 11:25

1 Answer

Try including the following RedirectMatch directive before your existing Redirect directive:

RedirectMatch 302 ^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2

The RedirectMatch directive is complementary to the Redirect directive; both are part of mod_alias. The difference is that RedirectMatch uses a regex to match the URL-path, whereas Redirect uses simple prefix-matching.

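This assumes that the path segments (i.e. "product-category", "parent-cat" and "child-cat") consist of just the characters a-z, A-Z, 0-9, _ (all covered by \w) and - (hyphen). The pattern needs to be as specific as possible so as not to match "too much". At least one "parent-cat" segment is required for a match, so URLs that are already of the form /product-category/child-cat do not match and fall through to the existing Redirect.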
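If only the category URLs need flattening, the pattern can be narrowed further. The following is a sketch, assuming the first path segment is always the literal `product-category` (taken from the example URLs in the question). With the prefix hardcoded, only one captured group is needed:

```apache
# Sketch: only flatten URLs under /product-category/ (assumed literal prefix).
# $1 is now the last path segment, i.e. the child category.
RedirectMatch 302 ^/product-category/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/product-category/$1
```

Any other URL with three or more path segments is then left untouched by this rule and handled by the existing Redirect directive.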

$1 is a backreference to the first captured group in the pattern, i.e. the ([\w-]+) at the start, which matches the product-category. $2 is a backreference to the second captured group, i.e. the ([\w-]+) at the end of the pattern, which matches the child-cat. The (?:....) group in the middle is a non-capturing group, so no backreference applies to it.
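To make the groups concrete, here is the same directive annotated with the example URLs from the question. The comments are a walkthrough of how the pattern applies to each path, not output captured from a server:

```apache
# ^/([\w-]+)/      $1: first segment, e.g. "product-category"
# (?:[\w-]+/)+     one or more parent segments, matched but not captured
# ([\w-]+)$        $2: last segment, e.g. "child-cat"
#
# /product-category/parent-cat1/child-cat
#     matches, target: https://newdomain.com/product-category/child-cat
# /product-category/parent-cat1/parent-cat2/child-cat
#     matches, target: https://newdomain.com/product-category/child-cat
# /product-category/child-cat
#     no match (no parent segment present)
# /product-category/parent-cat1/child-cat/
#     no match (the pattern does not allow a trailing slash)
RedirectMatch 302 ^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2
```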

This is a 302 (temporary) redirect. Change it to a 301 only once it is working OK. It is easier to test with 302s since, unlike 301s, they are not cached by the browser by default. Because your existing 301 may already be cached, make sure your browser cache is clear before testing.
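Put together, the .htaccess file on the old domain might look like the sketch below. It assumes the site is served from the document root and that no other redirect directives are present. The Redirect line is copied unchanged from the question:

```apache
# .htaccess on olddomain.com (sketch)

# 1. Specific rule first: strip the parent categories.
#    Keep 302 while testing; change to 301 once it works.
RedirectMatch 302 ^/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2

# 2. Existing catch-all redirect for everything else (as given in the question).
Redirect 301 / https://newdomain.com
```

The order matters because mod_alias processes Redirect and RedirectMatch directives in the order they appear, and the first match wins.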

MrWhite
  • Thanks @user82217. Cheers for the heads up with regards to 302, never knew that. It appears to be working fine at the moment, just need to do a bit more testing. – noelmcg May 24 '17 at 11:36
  • To complicate matters, I never mentioned that the site was in a sub directory, but this appears to work: RedirectMatch 302 ^/sub-dir/([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2 – noelmcg May 24 '17 at 11:39
  • That should be sufficient if the site is in a subdirectory. So, presumably your existing `Redirect` directive is really something like: `Redirect 301 /subdir https://newdomain.com`? – MrWhite May 24 '17 at 11:48
  • Yes, that's correct. Apologies again, I should have been a lot clearer with my initial question. I think the bounty will be heading in your direction! – noelmcg May 24 '17 at 11:56
  • Yes, I have tested the above and it doesn't quite work for URLs with 2+ parent categories. It seems to work fine with 1 parent category, which it removes as hoped. – noelmcg May 26 '17 at 21:22
  • It seems to work OK for me (tried `/product-category/foo/bar/baz/zop/child-cat` and it successfully redirects to `/product-category/child-cat` at the newdomain). Make sure your browser cache is cleared. What happens exactly - literally nothing? Do you have any other directives in this `.htaccess` file on `olddomain.com`? What's the exact URL you are requesting? Maybe there's some "different" chars in the URL? – MrWhite May 26 '17 at 22:25
  • There are hyphens "-" in the parent and child categories, would this cause an issue? I've tried different browsers, incognito windows etc ... and it hasn't worked. The redirect is the first directive, so that should take complete precedence? – noelmcg May 26 '17 at 23:30
  • "The redirect is the first directive..." - Presumably you mean the `RedirectMatch` directive is _first_ and `Redirect` is second? "..., so that should take complete precedence I think?" - Not necessarily. Not if you have directives from different modules. Different _modules_ execute at different times, regardless of the apparent order of the directives in the config file. E.g. mod_rewrite (`RewriteRule`) executes _before_ mod_alias (`RedirectMatch`), wherever the directives appear in the file. Hyphens are OK and are included in the above patterns. – MrWhite May 26 '17 at 23:39
  • Sorry, yes, I meant the RedirectMatch, which is then followed by the Redirect. Oh right, I thought .htaccess instructions were performed one after the other, so once the first redirect is carried out that would be it. Yes, there are other directives on the olddomain. Could I move this directive into a conf file to give it precedence? – noelmcg May 27 '17 at 00:15
  • It depends what these other directives are. And whether there are any other `.htaccess` files along the filesystem path? Do you have access to the server config? Are you still serving content from the site root? Please add the contents of this `.htaccess` file to your question. – MrWhite May 27 '17 at 08:52
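On the point raised in the comments that mod_rewrite runs before mod_alias: if the other directives in the file turn out to be mod_rewrite rules, one option is to express the same redirect with mod_rewrite so that it runs alongside them. This is only a sketch under assumptions the thread does not confirm: that mod_rewrite is enabled, that the .htaccess file sits in the document root of olddomain.com, and that the rule is placed above any other RewriteRule directives. Note that in .htaccess a RewriteRule pattern is matched against the path without the leading slash:

```apache
# Hypothetical mod_rewrite equivalent of the RedirectMatch directive.
RewriteEngine On

# Same segments as before, but no leading slash in the pattern.
# If the site lives in /sub-dir/ and this file is in the document root,
# the pattern would start with ^sub-dir/ instead.
RewriteRule ^([\w-]+)/(?:[\w-]+/)+([\w-]+)$ https://newdomain.com/$1/$2 [R=302,L]
```

As with the RedirectMatch version, keep the 302 while testing and switch to `R=301` once the redirects behave as expected.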