
I'm trying to rewrite the below URL but the URLs just don't change, no errors.

Current URL:

https://example.com/test/news/?c=value1&s=value2&id=9876

Expected URL:

https://example.com/test/news/value1/value2

My .htaccess

```apache
RewriteEngine On
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
```
MrWhite
Seb
  • You don't use `.htaccess` to change the URL. The rule you posted allows the "expected URL" to work. You should be linking to `/test/news/value1/value2`. You need to actually change the URL in your application, i.e. the URLs you are linking to. (OK, you can redirect the URLs in `.htaccess` to help with SEO if you are changing an existing URL structure, but that is secondary to getting your app working, otherwise it will be bad for SEO.) See the following answer/question: https://stackoverflow.com/a/67254596/369434 – MrWhite Apr 23 '22 at 17:25
  • How is `/test/news/?c=value1&s=value2&id=9876` currently being routed? What file actually handles the request? That URL does not look like a valid end-point? – MrWhite Apr 23 '22 at 17:29
  • Thanks for your reply, but I've seen many articles where a URL such as http://www.example.com/display_article.php?articleId=my-article can be rewritten as http://www.example.com/articles/my-article/, for example with .htaccess. – Seb Apr 24 '22 at 07:49

1 Answer

> but I've seen many articles where a url such as example.com/display_article.php?articleId=my-article can be rewritten as example.com/articles/my-article for example with .htaccess

But the important point here (that I think you are missing) is that the URL must already have been changed internally in your application - in all your internal links. It is a common misconception that .htaccess alone can be used to change the format of the URL. Whilst .htaccess is an important part of this, it is only part of it.

Yes, you can implement a redirect in .htaccess to redirect from the old to new URL - and this is essential to preserve SEO (see below), but it is not critical to your application working. If you don't first change the URL in your internal links then:

  1. The "old" URL is still exposed in the HTML source. When a user hovers over or copies the link, they are seeing and copying the "old" URL.

  2. Every time a user clicks one of your internal links they are externally redirected to the "new" URL. This is slow for your users, bad for SEO (you should never link to a URL that is redirected) and bad for your server, as it potentially doubles the number of requests hitting your server (OK, 301s are cached locally).

To quote from @IMSoP's answer to this reference question on the subject:

> Rewrite rules don't make ugly URLs pretty, they make pretty URLs ugly


So, once you have changed your internal links to the "new" (expected) format, eg. /test/news/value1/value2 (or should that be /test/news/value1/value2/id or even /test/news/id/value1/value2? See below), then you can do as follows...

```apache
RewriteRule ^test/news/([^/]*)/([^/]*)$ /test/news/?c=$1&s=$2&id=1 [L]
```

This internally rewrites a request from /test/news/<value1>/<value2> to /test/news/?c=<value1>&s=<value2>&id=1. However, there are a couple of issues with this:

  1. /test/news/ is not itself a valid endpoint. This requires further rewriting. Perhaps you are serving a DirectoryIndex document (eg. index.php)? This might appear seamless to you, but this requires an additional internal subrequest and makes the rule dependent on other elements of the config. You should rewrite directly to the file that handles the request. eg. /test/news/index.php?c=<value1>&s=<value2>&id=1 (remember, this is entirely hidden from the user).

  2. You are hardcoding the id=1 parameter? Should every URL have the same id? Or should this be passed in the "new" URL (which is what I would expect)? What does the id represent? If this is critical to the routing of the URL then the id should appear earlier in the URL-path, in case the URL gets accidentally truncated when copy/pasted/shared.

    If the id is required then it needs to be passed in the "new" URL. We only have the "new" URL to route the request, so the information can't be hidden.

So, if the "new" URL is now /test/news/<id>/<value1>/<value2> then the rewrite would need to be like this instead:

```apache
# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
```
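As a sketch of point 1 above, the same rewrite can target the handling file directly rather than relying on the DirectoryIndex. This assumes the file that handles the request is /test/news/index.php, which is a guess until the actual endpoint is confirmed:

```apache
# Rewrite new URLs directly to the file that handles the request
# ASSUMPTION: /test/news/index.php is the real endpoint (adjust if not)
# "/test/news/<id>/<value1>/<value2>" to "/test/news/index.php?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/index.php?c=$2&s=$3&id=$1 [L]
```

With this form the rule no longer depends on mod_dir issuing a further internal subrequest for the directory index.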

Then (optionally*1) you can implement an external redirect in order to preserve SEO. This is for search engines that have indexed the "old" URLs or third party inbound links that cannot be updated - these need to be corrected to inform search engines of the change and get the user on the "new" canonical URL having followed an out-of-date inbound link.

(*1 It's not "optional" if you are changing an existing URL, but optional with regards to your application being functional.)

This "redirect" goes before the above rewrite:

```apache
# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]
```

The $0 backreference contains the full match from the RewriteRule pattern, ie. test/news/ in this case - this simply saves repetition.

The %1, %2 and %3 backreferences contain the values captured from the preceding condition. ie. the values of the c, s and id URL parameters respectively.

Note that the URL parameters / path segments should not be optional as in your original directive (ie. ([^/]*)). If they are optional and they are omitted, then the resulting URL becomes ambiguous. eg. <value2> becomes <value1> if <value1> is omitted.

Note that the URL parameters must be in the order as stated. If you have a mix of "old" URLs with these params in a different order (or even intermixed with other params) then this can be accounted for, at the cost of additional complexity. (It may be easier to perform this redirect in your server-side script, instead of .htaccess.)
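One way that additional complexity might look is sketched below. It is an untested illustration, not a drop-in rule: each condition matches one parameter wherever it appears in the query string and carries the values captured so far forward by prefixing them to the next TestString, since %N backreferences only refer to the last matched condition. The `#` separator is an arbitrary choice, picked because a literal `#` cannot appear in a query string received by the server.

```apache
# Order-independent variant of the redirect (sketch, test with 302 first)
RewriteCond %{ENV:REDIRECT_STATUS} ^$
# Capture "c" anywhere in the query string: %1 = c
RewriteCond %{QUERY_STRING} (?:^|&)c=([^&]+)
# Carry "c" forward and capture "s": %1 = c, %2 = s
RewriteCond %1#%{QUERY_STRING} ^([^#]+)#(?:.*&)?s=([^&]+)
# Carry "c/s" forward and capture "id": %1 = c/s, %2 = id
RewriteCond %1/%2#%{QUERY_STRING} ^([^#]+)#(?:.*&)?id=(\d+)(?:&|$)
RewriteRule ^test/news/$ /$0%2/%1 [QSD,R=302,L]
```

Any other parameters in the original query string are dropped by the QSD flag.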

The first condition, which checks the REDIRECT_STATUS environment variable, ensures that we only redirect direct requests and not requests already rewritten by the later rewrite (which would otherwise result in a redirect loop). REDIRECT_STATUS is empty on the initial request and is set to 200 after the first successful rewrite. An alternative on Apache 2.4 is to use the END flag (instead of L) on the later rewrite's RewriteRule.
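The END alternative could be sketched as follows, assuming Apache 2.4 and the /test/news/<id>/<value1>/<value2> format discussed above. The REDIRECT_STATUS condition is dropped from the redirect, and END on the internal rewrite stops any further rewrite processing for that request:

```apache
# Redirect old URLs to the new "canonical" URL (no REDIRECT_STATUS check)
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]

# Rewrite new URLs to old/actual URL
# END (Apache 2.4+) prevents the rewritten request being processed by the redirect above
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [END]
```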

The QSD flag (Apache 2.4) discards the original query string from the request.

You should test first with a 302 (temporary) redirect to avoid potential caching issues and only change to a 301 (permanent) redirect once you have tested that everything works as intended. 301s are cached persistently by the browser so can make testing problematic.
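For testing, the redirect shown earlier would differ only in the status code on the R flag:

```apache
# Temporary (302) redirect while testing; change to R=301 once confirmed working
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=302,L]
```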


Summary

Your complete .htaccess file should look something like this:

```apache
Options -MultiViews +FollowSymLinks

# If relying on the DirectoryIndex to handle the request
DirectoryIndex index.php

RewriteEngine On

# Redirect old URLs to the new "canonical" URL
# "/test/news/?c=<value1>&s=<value2>&id=<id>" to "/test/news/<id>/<value1>/<value2>"
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{QUERY_STRING} ^c=([^&]+)&s=([^&]+)&id=(\d+)
RewriteRule ^test/news/$ /$0%3/%1/%2 [QSD,R=301,L]

# Rewrite new URLs to old/actual URL
# "/test/news/<id>/<value1>/<value2>" to "/test/news/?c=<value1>&s=<value2>&id=<id>"
RewriteRule ^test/news/(\d+)/([^/]+)/([^/]+)$ /test/news/?c=$2&s=$3&id=$1 [L]
```
MrWhite
  • Thanks for the explanations! The ids come from a database and are different for each URL. If the ids can't be hidden, how can I put the id at the end of the URL? e.g. "/test/news/<value1>/<value2>/<id>". I've tried this: ^crypto/news/([^/]+)/([^/]+)/(\d+)$ but it's not working. That's right, the endpoint is /test/news/index.php. – Seb Apr 26 '22 at 06:11
  • ok got it. I'll add the index.php – Seb Apr 26 '22 at 06:52