
I've searched several SO posts and haven't found what I'm looking for. It might exist but be old enough that it doesn't show up for me. I found a post (Nginx rewrite: add trailing slash, preserve anchors and query strings) that is very close to what I need, but its regex solution does not work with URL Rewrite for IIS, unless I'm doing it wrong.

Problem

I'm trying to add a forward slash / to the end of my URL paths while also preserving any existing query strings ? and anchors #.

Desired Solution

Basically, here are the desired results for each case:

Entry: https://my.site.com/about
Result: https://my.site.com/about/

Entry: https://my.site.com/about?query=string
Result: https://my.site.com/about/?query=string

Entry: https://my.site.com/about#TestAnchor
Result: https://my.site.com/about/#TestAnchor

Entry: https://my.site.com/about?query=string#TestAnchor
Result: https://my.site.com/about/?query=string#TestAnchor

Current Tests

Our current regex ignores query strings and anchors, but I would like to take them into consideration now.

<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="^([^.?]+[^.?/])$" />
  <action type="Redirect" url="{R:1}/" redirectType="Permanent" />
</rule>
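
To see what the current pattern accepts, here is a minimal Python sketch. It assumes that Python's `re` module behaves like the ECMAScript flavor for a pattern this simple, and that IIS evaluates the `url` pattern against the request path only (no leading slash, no query string), so the input containing `?` is included purely for illustration. The helper name `redirect_target` is hypothetical.

```python
import re
from typing import Optional

# Pattern copied verbatim from the AddTrailingSlash rule above.
CURRENT_RULE = re.compile(r"^([^.?]+[^.?/])$")


def redirect_target(path: str) -> Optional[str]:
    """Return the "{R:1}/" target if the rule matches, or None if it does not."""
    match = CURRENT_RULE.match(path)
    return f"{match.group(1)}/" if match else None


if __name__ == "__main__":
    for path in ["about", "about/", "about.php", "blog/post", "about?query=string"]:
        print(f"{path!r:22} -> {redirect_target(path)!r}")
```

Any input that contains a `.` or a `?`, or that already ends in `/`, is left alone, which is why this pattern cannot capture a query string on its own.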

I've also tested another regex but it only works if the url contains both a query string AND an anchor.

<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="^(.*)(\?.*?)(\#.*?)$" />
  <action type="Redirect" url="{R:1}/{R:2}{R:3}" redirectType="Permanent" />
</rule>

NOTE: I just tested this last one (^(.*)(\?.*?)(\#.*?)$) and it actually doesn't work. If the URL already contains a / before the ?, the pattern still matches, which it should not, so I have more work to do here.
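
Both problems described here can be reproduced with a small Python sketch. It assumes Python's `re` module as a stand-in for the ECMAScript flavor, and it feeds the pattern full URL strings the same way the examples in this question do. The helper name `rewrite` is hypothetical.

```python
import re
from typing import Optional

# Pattern copied verbatim from the second rule above.
SECOND_RULE = re.compile(r"^(.*)(\?.*?)(\#.*?)$")


def rewrite(url: str) -> Optional[str]:
    """Return "{R:1}/{R:2}{R:3}" if the rule matches, or None if it does not."""
    match = SECOND_RULE.match(url)
    if match is None:
        return None
    return f"{match.group(1)}/{match.group(2)}{match.group(3)}"


if __name__ == "__main__":
    samples = [
        "https://my.site.com/about?query=string#TestAnchor",   # both parts present
        "https://my.site.com/about?query=string",              # query only
        "https://my.site.com/about#TestAnchor",                # anchor only
        "https://my.site.com/about/?query=string#TestAnchor",  # slash already there
    ]
    for url in samples:
        print(url, "->", rewrite(url))
```

Because neither the query group nor the anchor group is optional, URLs with only one of them never match, and because `(.*)` accepts a trailing `/`, an already correct URL ends up with a doubled slash.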

Question

Is there a single regex that I can use to solve this or do I need to use multiple rules?

RoLYroLLs

2 Answers


TL;DR

IIS Rewrite (ALL) URIs with Trailing Slash & preserve Fragment and Query Strings
<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)" />
  <action type="Redirect" url="{R:1}/{R:2}" redirectType="Permanent" />
</rule>

IIS uses the ECMAScript regex syntax, so you can try it here: https://regexr.com/6ele7
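
As a rough offline check of the "ALL" pattern, here is a minimal Python sketch. It assumes Python's `re` module agrees with the ECMAScript flavor for this pattern and that the pattern is applied to full URL strings, as in the regexr sample. The names `ALL_URIS`, `matches` and `normalize` are hypothetical.

```python
import re

# Pattern copied verbatim from the TL;DR rule above.
ALL_URIS = re.compile(
    r"^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)"
)


def matches(url: str) -> bool:
    """True when the rule's pattern matches the URL at all."""
    return ALL_URIS.match(url) is not None


def normalize(url: str) -> str:
    """Build "{R:1}/{R:2}" when the pattern matches; otherwise return the URL unchanged."""
    match = ALL_URIS.match(url)
    return f"{match.group(1)}/{match.group(2)}" if match else url


if __name__ == "__main__":
    for url in [
        "https://my.site.com/about",
        "https://my.site.com/about/",
        "https://my.site.com/about.php?query",
        "https://my.site.com?",
    ]:
        print(matches(url), url, "->", normalize(url))
```

In this sketch the pattern also matches URLs that are already in the desired form and rebuilds them unchanged, so the rewritten value is stable even though the rule itself fires for every input.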


Update

IIS Rewrite (Considered) URIs with Trailing Slash & preserve Fragment and Query Strings
<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$" />
  <action type="Redirect" url="{R:1}/{R:2}" redirectType="Permanent" />
</rule>

Try it here : https://regexr.com/6fk3g


http://127.0.0.1  -->  http://127.0.0.1/
https://localhost  -->  https://localhost/
https://localhost?  -->  https://localhost/?
https://localhost/  -->  https://localhost/
https://my.site.com  -->  https://my.site.com/
https://my.site.com:443?  -->  https://my.site.com:443/?
https://my.site.com/  -->  https://my.site.com/
https://my.site.com/about.php  -->  https://my.site.com/about.php
https://my.site.com/about.php?  -->  https://my.site.com/about.php?
https://my.site.com/about  -->  https://my.site.com/about/
https://my.site.com/about?  -->  https://my.site.com/about/?
https://my.site.com/about/  -->  https://my.site.com/about/
https://my.site.com/about/?  -->  https://my.site.com/about/?
https://my.site.com/about?query  -->  https://my.site.com/about/?query
https://my.site.com/about/?query  -->  https://my.site.com/about/?query
https://my.site.com/about.php?query  -->  https://my.site.com/about.php?query
https://my.site.com/about#hash  -->  https://my.site.com/about/#hash
https://my.site.com/about/#hash  -->  https://my.site.com/about/#hash
https://my.site.com/about.php#hash  -->  https://my.site.com/about.php#hash
https://my.site.com/about?query#hash  -->  https://my.site.com/about/?query#hash
https://my.site.com/about/?query#hash  -->  https://my.site.com/about/?query#hash
https://my.site.com/folder.name/about?query  -->  https://my.site.com/folder.name/about/?query
https://my.site.com/about?query#hash:http://test.com?q  -->  https://my.site.com/about/?query#hash:http://test.com?q
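
A selection of the rows above can be checked against the "Considered" pattern with a short Python sketch. It assumes Python's `re` module behaves like the ECMAScript flavor for this pattern, and that an unmatched optional group expands to an empty string, as `{R:2}` is assumed to do in IIS. The names `CONSIDERED`, `CASES` and `add_trailing_slash` are hypothetical.

```python
import re

# Pattern copied verbatim from the "Considered" rule above.
CONSIDERED = re.compile(r"^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$")

# (input, expected result) pairs taken from the list above.
CASES = [
    ("http://127.0.0.1", "http://127.0.0.1/"),
    ("https://localhost?", "https://localhost/?"),
    ("https://localhost/", "https://localhost/"),
    ("https://my.site.com:443?", "https://my.site.com:443/?"),
    ("https://my.site.com/about.php?", "https://my.site.com/about.php?"),
    ("https://my.site.com/about", "https://my.site.com/about/"),
    ("https://my.site.com/about/?query", "https://my.site.com/about/?query"),
    ("https://my.site.com/about#hash", "https://my.site.com/about/#hash"),
    ("https://my.site.com/about.php#hash", "https://my.site.com/about.php#hash"),
    ("https://my.site.com/about?query#hash", "https://my.site.com/about/?query#hash"),
    ("https://my.site.com/folder.name/about?query",
     "https://my.site.com/folder.name/about/?query"),
    ("https://my.site.com/about?query#hash:http://test.com?q",
     "https://my.site.com/about/?query#hash:http://test.com?q"),
]


def add_trailing_slash(url: str) -> str:
    """Apply "{R:1}/{R:2}" when the pattern matches; otherwise leave the URL alone."""
    match = CONSIDERED.match(url)
    if match is None:
        return url  # no match means no redirect
    return f"{match.group(1)}/{match.group(2) or ''}"


if __name__ == "__main__":
    for url, expected in CASES:
        result = add_trailing_slash(url)
        print("ok  " if result == expected else "FAIL", url, "->", result)
```

Unlike the first pattern, this one does not match URLs that already end in `/` or that point at a file name, so only the URLs that need a slash would be redirected.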

Explanation (All)

  • Level 1 - Let's just think about your examples:
^([^?#]+?)\/?([?#].*)?$

Group #1: ^ anchors at the start; [^?#] matches any character except ? and #; +? repeats it one or more times, lazily (it stops at the first position where the rest of the pattern can match)
Ignored: \/? consumes a trailing / if one exists
Group #2: [?#] matches ? or #, and .* matches everything after it up to the end $; the whole group (...)? is optional

It works well, but it mishandles this case:

https://my.site.com/about.php?query  -->  https://my.site.com/about.php/?query  !!!

So let's add an exception...
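
The Level 1 behavior, including the file-name problem, can be reproduced with a minimal Python sketch. It assumes Python's `re` module as a stand-in for the ECMAScript flavor; the names `LEVEL_1` and `rewrite_level_1` are hypothetical.

```python
import re

# Level 1 pattern copied verbatim from above.
LEVEL_1 = re.compile(r"^([^?#]+?)\/?([?#].*)?$")


def rewrite_level_1(url: str) -> str:
    """Apply "{R:1}/{R:2}", treating an unmatched optional group as an empty string."""
    match = LEVEL_1.match(url)
    if match is None:
        return url
    return f"{match.group(1)}/{match.group(2) or ''}"


if __name__ == "__main__":
    for url in [
        "https://my.site.com/about?query=string",
        "https://my.site.com/about/",
        "https://my.site.com/about.php?query",  # the problematic case
    ]:
        print(url, "->", rewrite_level_1(url))
```

The last sample gets a slash after `about.php`, which is the case the next level is meant to exclude.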

  • Level 2 - What if we capture a possible file name (Name.name.name.ext) as part of Group #2?
^([^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?)$

(?:...) is a non-capturing group
(?:[^/?#]+\.[^/?#]+)? looks for a possible file name, and (?:[?#].*)? for a possible query string or anchor

Now everything is OK, except this:

https://my.site.com?  -->  https://my.site.com?  !!!

So we need another exception in Group #1
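
A minimal Python sketch of Level 2 shows why the bare-domain case slips through. It assumes Python's `re` module as a stand-in for the ECMAScript flavor; the names `LEVEL_2` and `rewrite_level_2` are hypothetical.

```python
import re

# Level 2 pattern copied verbatim from above.
LEVEL_2 = re.compile(r"^([^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?)$")


def rewrite_level_2(url: str) -> str:
    """Apply "{R:1}/{R:2}" when the pattern matches; otherwise return the URL unchanged."""
    match = LEVEL_2.match(url)
    return f"{match.group(1)}/{match.group(2)}" if match else url


if __name__ == "__main__":
    # The file-name case from Level 1 is now left alone.
    print(rewrite_level_2("https://my.site.com/about.php?query"))
    # The host name contains dots, so it is captured as if it were a file name.
    match = LEVEL_2.match("https://my.site.com?")
    print(match.groups())
    print(rewrite_level_2("https://my.site.com?"))
```

In this sketch the lazy first group stops at `https:/`, the optional `/` consumes the second slash, and `my.site.com?` lands in Group #2, so no slash is inserted before the `?`.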

  • Level 3 - Match a bare domain URI as an alternative
^([^/]+:\/\/[^/#?]+|[^?#]+?)\/?((?:[^/?#]+\.[^/?#]+)?(?:[?#].*)?$)

(...|...) is an alternation. [^/]+:\/\/[^/#?]+ is tried first: it checks greedily (not lazily) for a pattern like ...://... that runs until the first /, # or ?.

Now it works great!


+ Explanation (Considered)

  • Level 4 - What if we add a character set that rejects . and / to the first group, so that only the URIs that need a change match and all others are ignored?
^([^/]+:\/\/[^/#?]+|[^?#]+\/[^/.?#]+)([?#].*)?$

\/[^/.?#]+ checks that the characters after the last / are none of / . ? #

Now it is even smaller and faster!


Analyzing the other method

As @károly-szabó answered well here, instead of looking for character sets that are not accepted, we can look for a pattern that matches.
So if we want to use that method in a simpler way (2 groups, plus some minor optimization), the regex will be:

^(https?:\/\/[\w.:-]+\/?(?:[\w.-]+\/)*[\w-]+(?!\/))([?#].*)?$

But the set of characters accepted in a URI path is larger.

So a wider version of that regex can be:

^(https?:\/\/[\w.:-]+\/?(?:[\w!$-)+-.;=@~]+\/)*[\w!$-);=@~+,-]+(?!\/))([?#].*)?$

Try it here: https://regexr.com/6elea

Note: multibyte Unicode is still allowed in domain names, but I ignored that in this method.
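
The simpler two-group pattern from this section can be exercised with a minimal Python sketch. It assumes Python's `re` module as a stand-in for the ECMAScript flavor, and the sample URLs with dashes are made up for illustration. The names `TWO_GROUPS` and `rewrite_two_groups` are hypothetical.

```python
import re
from typing import Optional

# The simpler two-group pattern (word characters, dots and dashes only), copied from above.
TWO_GROUPS = re.compile(
    r"^(https?:\/\/[\w.:-]+\/?(?:[\w.-]+\/)*[\w-]+(?!\/))([?#].*)?$"
)


def rewrite_two_groups(url: str) -> Optional[str]:
    """Return "{R:1}/{R:2}" if the pattern matches, or None when no redirect would happen."""
    match = TWO_GROUPS.match(url)
    if match is None:
        return None
    return f"{match.group(1)}/{match.group(2) or ''}"


if __name__ == "__main__":
    for url in [
        "https://my-site.com/about-us/team#x",
        "https://my.site.com/about?query",
        "https://my.site.com/about/",
        "https://my.site.com/about.php",
    ]:
        print(url, "->", rewrite_two_groups(url))
```

Since the last path segment may not contain a dot and must not be followed by `/`, file names and already slashed paths simply fail to match.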


P.S.

Actually, I don't think we should rewrite this in IIS, for the following reasons.

I mean:

https://my.site.com/  -->  (=Call root)
https://my.site.com/about  -->  (=Call root > Folder/File name about) 
https://my.site.com/about/  -->  (=Call root > Folder name about) 
https://my.site.com/about?query  -->  (=Call root > Folder/File name about + Query)
https://my.site.com/about/?query  -->  (=Call root > Folder name about + Query)
https://my.site.com/about.php?query  -->  (=Call root > File name about.php + Query)
[When the browser strips it:]
https://my.site.com/about#hash  -->  (=Call root > Folder/File name about + Anchor)
https://my.site.com/about/#hash  -->  (=Call root > Folder name about + Anchor)
https://my.site.com/about.php#hash  -->  (=Call root > File name about.php + Anchor)

[And if it does not?]
https://my.site.com/folder#name/?query#hash
https://my.site.com/folder.name/about.php?query=one/two
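
The way these URLs break down into components can be illustrated with Python's standard `urllib.parse` module. This is a different technique from the single regex above: the URL is split first and only the path is touched. It is a sketch under the assumption that a last path segment containing a dot is a file name; the helper name `add_slash_to_path` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_slash_to_path(url: str) -> str:
    """Append "/" to the path unless it already ends in "/" or looks like a file name."""
    parts = urlsplit(url)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a dot in the last segment means "file name", so leave it alone.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    print(add_slash_to_path("https://my.site.com/about?query=string#TestAnchor"))
    print(add_slash_to_path("https://my.site.com/about.php?query"))

    # Everything after the first "#" is the fragment, including "/" and "?".
    print(urlsplit("https://my.site.com/folder#name/?query#hash"))
    # A "/" inside the query string is not part of the path.
    print(urlsplit("https://my.site.com/folder.name/about.php?query=one/two"))
```

Splitting first keeps the query string and the fragment out of the slash decision entirely, which is the distinction the examples above are pointing at.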
MMMahdy-PAPION
  • This looks great and I love the extended explanations, caveats and links to resources. However, it seems to match every url. I'd like for it to not match any urls that are already properly formatted. @károly-szabó does that, but seems to not allow a dash in the url, so I'm looking to use that one if fixed. – RoLYroLLs Feb 02 '22 at 16:56
  • @RoLYroLLs thanks, that idea made the regex more simple, i updated the answer. – MMMahdy-PAPION Feb 03 '22 at 16:53
  • Thanks for the update and sorry for the delay as I was out of town. Now something is not right. It seems to not match one of the lines properly, as if it's matching 2 lines in one. https://imgur.com/a/MiMTLX6 – RoLYroLLs Feb 16 '22 at 15:46
  • @RoLYroLLs Nothing is wrong. In practice a URI is never multi-line input; I only use a multi-line sample so that you can see the examples next to each other. I could exclude newline characters by adding `\n` to the negated sets (``^([^/]+:\/\/[^/#?\n]+|[^?#\n]+\/[^/.?#\n]+)([?#].*)?$``), but I prefer to handle the view by rearranging the examples rather than making the regex dirty with a temporary `\n`. – MMMahdy-PAPION Feb 17 '22 at 03:54

You can try this regex: https://regex101.com/r/6TSqaP/2. It matches every provided example and does not match when the URL already ends with '/'.

^((?:https?:\/\/[\w\.\-]*)(?:[\w\-]+\/)*(?:[\w\-]+)(?!\/))(\?.*?)?(\#.*?)?$

I used your second example as the base for my regex, with the following logic. The parts of a URL are: scheme://authority/path?query#fragment

  1. The first capture group matches the scheme://authority/path part of the URL.
  2. The second capture group is optional and matches the ?query.
  3. The third capture group is also optional and matches the #fragment.

Regex explanation

^(                            # should start with this
    (?:https?:\/\/[\w\.\-]*)  # match the http or https protocol and the domain
    (?:[\w\-]+\/)*            # match the path except the last element of it (optional)
    (?:[\w\-]+)(?!\/)         # match the last path element, but only if it's not closed with '/'
)                             # {R:1}
(\?.*?)?                      # {R:2} query (optional)
(\#.*?)?                      # {R:3} fragment (optional)
$                             # string should end
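
The annotated pattern above can be run almost as written with Python's `re.VERBOSE` flag. This is a minimal sketch that assumes Python's `re` module is an acceptable stand-in for the ECMAScript flavor, and that `{R:2}` and `{R:3}` expand to empty strings when their groups do not take part in the match. The names `ANNOTATED` and `rewrite_annotated` are hypothetical.

```python
import re
from typing import Optional

# Same pattern as above; re.VERBOSE ignores the layout whitespace and the "#" comments.
ANNOTATED = re.compile(
    r"""
    ^(                            # should start with this
        (?:https?:\/\/[\w\.\-]*)  # the http or https protocol and the domain
        (?:[\w\-]+\/)*            # the path except its last element (optional)
        (?:[\w\-]+)(?!\/)         # the last path element, only if not closed with '/'
    )                             # {R:1}
    (\?.*?)?                      # {R:2} query (optional)
    (\#.*?)?                      # {R:3} fragment (optional)
    $                             # string should end
    """,
    re.VERBOSE,
)


def rewrite_annotated(url: str) -> Optional[str]:
    """Return "{R:1}/{R:2}{R:3}" if the pattern matches, or None when it does not."""
    match = ANNOTATED.match(url)
    if match is None:
        return None
    path, query, fragment = match.groups(default="")
    return f"{path}/{query}{fragment}"


if __name__ == "__main__":
    for url in [
        "https://my.site.com/about?query=string#TestAnchor",
        "https://my-site.com/about-us/team",
        "https://my.site.com/about/",
    ]:
        print(url, "->", rewrite_annotated(url))
```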

IIS

<rule name="AddTrailingSlash" stopProcessing="true">
  <match url="^((?:https?:\/\/[\w\.\-]*)(?:[\w\-]+\/)*(?:[\w\-]+)(?!\/))(\?.*?)?(\#.*?)?$" />
  <action type="Redirect" url="{R:1}/{R:2}{R:3}" redirectType="Permanent" />
</rule>

Edit: Updated regex to handle dashes (-) and multiple path elements

Károly Szabó
  • After several tests this looks like a great answer, but I need a little help. Our domain name contains a dash `-` in it. Can you help update the regex to allow the domain name to contain dashes? Thank you. – RoLYroLLs Feb 02 '22 at 16:47
  • I assume it would be this `^((?:https?:\/\/[\w\-\.]*)(?:\w+\/)?(?:\w+)(?!\/))(\?.*?)?(\#.*?)?$` (added `\-` near the beginning) – RoLYroLLs Feb 02 '22 at 16:59
  • I've also noticed this does not capture url's with more dashes, ie: `https://my.site.com/about-us` is not captured. – RoLYroLLs Feb 02 '22 at 20:22
  • Last comment: I also noticed this does not capture: `https://my.site.com/about/us` – RoLYroLLs Feb 02 '22 at 20:54
  • I updated my post with a new regex and new link, which matched those cases correctly. – Károly Szabó Feb 03 '22 at 08:14
  • This method is nice, i used this to improve my answer. Also i had some problems with some accepted characters or not necessary things in the pattern but comment limitation not allow me to explain here well, so as another method i analyse it in [my answer](https://stackoverflow.com/a/70941532/7514010). – MMMahdy-PAPION Feb 03 '22 at 17:04