
After reading about and trying regular expressions for hours now, I believe it's time to ask for some help.

We recently migrated a fairly large website by importing old articles into another CMS and new database. The URL structure is quite different.

The way the old URL was structured is http://wwww.myurl.com/categoryOLD/article_12345.fixed/this_is_the_title.html

The part that says /article_12345.fixed/ exists in all the old URLs, as does the .html part at the end. 12345 is the ID of the entry and is different for every URL. I believe we can use this to identify the URLs that need rewriting.

The old URL needs to be rewritten to http://wwww.myurl.com/categoryNEW/this-is-the-title

So the /article_12345.fixed/ part is gone, the old category is rewritten to a new category, the .html at the end is gone, and the underscores need to become hyphens.
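To make the requirement concrete, the transformation described above can be sketched outside Apache. This is only an illustration in Python, not a rewrite rule: the `CATEGORY_MAP` contents and the `convert` helper are hypothetical names, and the pattern assumes every old URL has exactly the shape /category/article_ID.fixed/title.html.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical old -> new category names; the real mapping is not given here.
CATEGORY_MAP = {"categoryOLD": "categoryNEW"}

# Assumed shape of an old path: /<category>/article_<digits>.fixed/<title>.html
OLD_PATH = re.compile(r"^/([^/]+)/article_\d+\.fixed/([^/]+)\.html$")

def convert(old_url):
    """Return the new URL, or None if the URL is not an old-style article URL."""
    parts = urlsplit(old_url)
    match = OLD_PATH.match(parts.path)
    if match is None:
        return None
    category, title = match.groups()
    # Fall back to the old category name when no mapping entry exists.
    new_category = CATEGORY_MAP.get(category, category)
    new_path = "/" + new_category + "/" + title.replace("_", "-")
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))

print(convert("http://wwww.myurl.com/categoryOLD/article_12345.fixed/this_is_the_title.html"))
# http://wwww.myurl.com/categoryNEW/this-is-the-title
```

A function like this is handy as a reference when checking whether a set of rewrite rules produces the expected target for a list of sample URLs.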

I have been reading and trying, but I can't even get the RewriteCond to match. Is there anyone out there who dreams in regular expressions and can help me out here?

Taeke

3 Answers


To rewrite the underscores to hyphens you could use the [N] flag of RewriteRule.

RewriteRule ^([^_]+)_(.*) $1-$2 [N,DPI]

Place this before your other rules, so they work with the new URL.
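The effect of the [N] flag (restart the rewriting from the top after a match) can be checked offline. The following is a rough Python simulation under the assumption that the rule sees the per-directory path without a leading slash; `replace_underscores` and `max_rounds` are illustrative names, not part of mod_rewrite.

```python
import re

# Same pattern as the RewriteRule above: everything up to the first
# underscore, the underscore itself, then the rest of the path.
UNDERSCORE_RULE = re.compile(r"^([^_]+)_(.*)")

def replace_underscores(path, max_rounds=100):
    """Apply the rule again and again, the way [N] re-runs it after each match."""
    for _ in range(max_rounds):
        match = UNDERSCORE_RULE.match(path)
        if match is None:
            return path  # no underscore left, so the loop ends
        path = match.group(1) + "-" + match.group(2)
    raise RuntimeError("rule still matches after max_rounds; possible loop")

print(replace_underscores("categoryOLD/article_12345.fixed/this_is_the_title.html"))
# categoryOLD/article-12345.fixed/this-is-the-title.html
```

Note that the underscore in article_12345 is replaced as well, so any later rule that tries to match the literal text article_ would no longer find it.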

Then the RewriteRule to rewrite the rest could look like this:

RewriteRule ^\/?(.*?)\/(.*?)\/(.*)\.html$ $1/$3

That covers only the basic rewriting. For ID-based rewriting, a RewriteMap is probably the best solution, as mentioned by Max Leske.
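The second pattern can be tried against a sample path in the same way. This Python sketch assumes the underscores have already been turned into hyphens by the earlier rule; `strip_id_and_extension` is a hypothetical helper name.

```python
import re

# Same pattern as the RewriteRule above: category, middle segment, title, ".html".
STRIP_RULE = re.compile(r"^\/?(.*?)\/(.*?)\/(.*)\.html$")

def strip_id_and_extension(path):
    """Mimic the substitution $1/$3: keep the category and title, drop the rest."""
    match = STRIP_RULE.match(path)
    if match is None:
        return path  # the rule does not apply, so the path is left alone
    return match.group(1) + "/" + match.group(3)

print(strip_id_and_extension("categoryOLD/article-12345.fixed/this-is-the-title.html"))
# categoryOLD/this-is-the-title
print(strip_id_and_extension("a/b/c.html"))
# a/c
```

As the second call shows, the pattern is not tied to the article ID: any .html path with at least three segments loses its middle segment, so it may be worth narrowing the middle group if the site has other URLs of that shape.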

Milananas
  • thanks! I have to go to meeting now but will use your and @max-leske tips to continue – Taeke May 26 '13 at 10:43
  • I tested this rule against URLs like the above example (with multiple URL parts in it), and it resulted in an endless rewrite loop. I started logging and reading and saw that it's a known bug in the rewrite engine. More about this bug is here: http://stackoverflow.com/questions/439218/mod-rewrite-add-path-info-postfix – Milananas May 26 '13 at 12:30
  • Sorry, I had little time to comment before. A more complete explanation: when testing with multipart URLs (/part_1/part_2/etc.), the last part was added to the new rewrite string every time. So /a/b/c resulted in /a/b/c/c. Because of the [N] flag, this process was repeated every 'iteration', which caused the endless rewrite loop. The DPI flag was created to solve this problem. It 'tells' the engine it has already appended the part. – Milananas May 26 '13 at 13:14

First: a RewriteCond is not required (see http://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond), so try rewriting directly first and add a condition later if you need finer control over which requests match.

category

I'd use a rewrite map to rewrite your categories, since you'll probably need one entry per category, unless you can come up with a transformation rule.
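As a rough picture of what such a map holds, a txt: rewrite map is a plain text file with one "key value" pair per line, and a lookup like ${examplemap:key|default} returns the value, the default, or an empty string. The Python sketch below mimics that behaviour; the category names, `parse_map`, and `lookup` are hypothetical and only illustrate the idea.

```python
def parse_map(text):
    """Parse 'key value' lines; blank lines and lines starting with # are skipped."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) == 2:
            mapping[fields[0]] = fields[1].strip()
    return mapping

def lookup(mapping, key, default=""):
    """Return the mapped value, else the default (empty string if none is given)."""
    return mapping.get(key, default)

# Hypothetical contents of map.txt: old category on the left, new on the right.
MAP_TEXT = """\
# old category   new category
cat1old   cat1new
cat2old   cat2new
"""

CATEGORIES = parse_map(MAP_TEXT)
print(lookup(CATEGORIES, "cat1old"))          # cat1new
print(lookup(CATEGORIES, "other", "other"))   # other
```

Passing the old name as the default keeps unmapped categories unchanged instead of producing an empty path segment.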

id

Include the id in a match group and simply don't use that group in the output:
RewriteRule ^/.*/(<regex for id>/)(.*)$ http://myurl.com/$2

(Only the second match group is used in the substitution.)

.html

Same as for id: don't use the match for .html

putting it together

So your server config could look something like this (RewriteMap is only allowed in server or virtual host context, not in .htaccess):

RewriteEngine On
RewriteMap examplemap txt:/path/to/file/map.txt
RewriteRule ^/.*/([\w\d]+/)(.*)$ ${examplemap:$2}

RewriteRule ^/.*/(article_\d+\.fixed/)(.*)$ http://myurl.com/$2
RewriteRule ^/(.*/)(\w+)_(\w+)(.*)$ http://myurl.com$1$2-$3 [N]
Max Leske
  • So from what I understand, the map.txt holds my old vs new categories, new line for every conversion. Then the rules would look like this? RewriteMap examplemap txt:/path/to/file/map.txt RewriteRule ^/.*/([a-zA-Z0-9]/)(.*)$ ${examplemap:$2} RewriteRule ^/.*/(article_[0-9].fix/)(.*)$ http://myurl.com$2 Not sure what to do with RewriteRule ^/(.*)()$ http://myurl.com$1 as all underscores need to be replaced by hyphens.. – Taeke May 26 '13 at 10:34
  • Looks about right. See this answer for a possible solution to replacing underscores with dashes: http://stackoverflow.com/questions/1279681/mod-rewrite-replace-underscores-with-dashes – Max Leske May 26 '13 at 11:59
  • I've added possible (untested) regex patterns, the last one's inspired by the link I posted above. – Max Leske May 26 '13 at 12:07

I ended up using the following:

RewriteEngine On
RewriteRule ^([^_]+)__(.*)\.html$ $1_$2.html [N,DPI]
RewriteRule ^([^_]+)_(.*)\.html$ $1-$2.html [N,DPI]
RewriteRule ^(.*)cat1old/(.*)\.html$ $1cat1new/$2.html
RewriteRule ^(.*)cat2old/(.*)\.html$ $1cat2new/$2.html
RewriteRule ^(.*)cat3old/(.*)\.html$ $1cat3new/$2.html
RewriteRule ^(.*)cat4old/(.*)\.html$ $1cat4new/$2.html
RewriteRule ^\/?(.*)\/(.*)\/(.*)(-?)\.html$ $1/$3 [R=301,L]

Works like a charm! Thanks for your help, guys.
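For anyone who wants to see how these rules interact before deploying them, here is a small Python simulation of the chain. It is a sketch under simplifying assumptions: only one of the four category rules is included, the path has no leading slash (per-directory context), [N] is modelled as "start again at the first rule", [L] as "stop", and the redirect itself (R=301) and the DPI flag are not modelled. `RULES` and `rewrite` are illustrative names.

```python
import re

# (pattern, replacement, flag) in the same order as the rules above.
RULES = [
    (r"^([^_]+)__(.*)\.html$", r"\1_\2.html", "N"),
    (r"^([^_]+)_(.*)\.html$", r"\1-\2.html", "N"),
    (r"^(.*)cat1old/(.*)\.html$", r"\1cat1new/\2.html", ""),
    (r"^\/?(.*)\/(.*)\/(.*)(-?)\.html$", r"\1/\3", "L"),
]

def rewrite(path, max_restarts=100):
    """Run the rules top to bottom, restarting on N and stopping on L."""
    restarts = 0
    index = 0
    while index < len(RULES):
        pattern, replacement, flag = RULES[index]
        match = re.match(pattern, path)
        if match:
            path = match.expand(replacement)
            if flag == "N":
                restarts += 1
                if restarts > max_restarts:
                    raise RuntimeError("too many restarts; possible loop")
                index = 0
                continue
            if flag == "L":
                break
        index += 1
    return path

print(rewrite("cat1old/article_12345.fixed/this_is_the_title.html"))
# cat1new/this-is-the-title
```

In this simulation the first rule only fires when the first underscore left in the path is part of a double underscore, so titles such as a__b end up as a-b rather than a--b.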

Taeke