I'm trying to write a regular expression for a mechanism that adds a trailing slash to URLs, even when the URL ends with a query string (introduced by ?) or a fragment (introduced by #).

For any of the following URLs, I'm looking for a match for segment as follows:

  1. https://example.com/what-not/segment matches segment
  2. https://example.com/what-not/segment?a=b matches segment
  3. https://example.com/what-not/segment#a matches segment

In case there is a match for segment, I'm going to replace it with segment/.

For any of the following URLs, there should be no match:

  1. https://example.com/what-not/segment/ no match
  2. https://example.com/what-not/segment/?a=b no match
  3. https://example.com/what-not/segment/#a no match

because here, there is already a trailing slash.

I've tried:

  1. This primitive regex and its variants: .*\/([^?#\/]+). With this approach, however, I could not prevent a match when there is already a trailing slash.
  2. Negative lookaheads, as follows: ([^\/\#\?]+)(?!(.*[\#\?].*))$. In this case, I could not exclude the ? or # part properly.

Thank you for your kind help!

Dyin
  • Try `(.*\/[^?#\/]+)([?#][^\/]*)?$` and replace with `$1/$2`, see https://regex101.com/r/M6mKAV/2. I added `\n` to the negated character classes since the example text is a multiline string. – Wiktor Stribiżew Sep 22 '22 at 13:55
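
For readers who want to try the comment's suggestion outside regex101, here is a minimal sketch in Python. The function name `add_trailing_slash` is made up for illustration, and the sketch assumes each input is a single URL with no surrounding text or newlines. Python writes the replacement `$1/$2` as `\g<1>/\g<2>`.

```python
import re

# Pattern from the comment above (slashes need no escaping in Python).
# Group 1: everything up to and including the last path segment.
# Group 2: an optional ?query or #fragment tail that contains no slash.
PATTERN = re.compile(r"(.*/[^?#/]+)([?#][^/]*)?$")

def add_trailing_slash(url: str) -> str:
    """Hypothetical helper: insert a slash after the last path segment."""
    # An unmatched group 2 is substituted as an empty string (Python 3.5+).
    return PATTERN.sub(r"\g<1>/\g<2>", url)

for url in (
    "https://example.com/what-not/segment",
    "https://example.com/what-not/segment?a=b",
    "https://example.com/what-not/segment/#a",
):
    print(url, "->", add_trailing_slash(url))
```

URLs that already end in a slash before the ? or # part do not match, so `sub` returns them unchanged.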

2 Answers

Lookahead and lookbehind conditionals are so powerful!

(?<=\/)[\w]+(?(?=[\?\#])|$)

P.S.: I used [\w]+, which here means [a-zA-Z0-9_]+.
Of course, URLs can contain many other characters, such as - or ~, but it works nicely for the examples provided.
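
The lookahead conditional `(?(?=...)...)` is not available in every engine. Python's built-in `re` module, for instance, only supports conditionals that test whether a group matched. As a rough sketch, the same idea can be written with a plain lookahead alternation, `(?=[?#]|$)`, which should behave the same way for the examples in the question. The helper name `add_trailing_slash` is an assumption for illustration only.

```python
import re

# (?<=/)       the segment must directly follow a slash
# \w+          the segment itself (letters, digits, underscore only)
# (?=[?#]|$)   it must be followed by ?, # or the end of the string
SEGMENT = re.compile(r"(?<=/)\w+(?=[?#]|$)")

def add_trailing_slash(url: str) -> str:
    """Hypothetical helper: append a slash to the matched segment."""
    # \g<0> is the whole match, i.e. the segment without its slash.
    return SEGMENT.sub(r"\g<0>/", url, count=1)

print(add_trailing_slash("https://example.com/what-not/segment?a=b"))
print(add_trailing_slash("https://example.com/what-not/segment/?a=b"))
```

As with the original pattern, segments containing characters outside `\w` (such as `-`) would need a wider character class.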

Carapace

If you want to match URLs, you might use

\b(https?://\S+/)[^\s?#/]+(?![^\s?#])

Explanation

  • \b A word boundary to prevent a partial word match
  • ( Capture group 1
    • https?://\S+/ Match the protocol, 1+ non whitespace chars and then the last occurrence of /
  • ) Close group 1
  • [^\s?#/]+ Match 1+ chars other than a whitespace char ? # /
  • (?![^\s?#]) Negative lookahead, assert that directly to the right is not a non whitespace char other than ? or #

See a regex demo.

In the replacement use group 1 followed by segment/
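
As a rough illustration of the capture-group pattern, here is a small Python sketch. It assumes the segment text is not known in advance, so instead of spelling out the segment in the replacement it appends `/` to the whole match via `\g<0>`. The function name `add_trailing_slashes` is hypothetical.

```python
import re

# Capture-group pattern from this answer, unchanged.
URL_SEGMENT = re.compile(r"\b(https?://\S+/)[^\s?#/]+(?![^\s?#])")

def add_trailing_slashes(text: str) -> str:
    """Hypothetical helper: add a slash after the last segment of each URL."""
    # \g<0> is the whole match: group 1 plus the final segment.
    return URL_SEGMENT.sub(r"\g<0>/", text)

sample = (
    "https://example.com/what-not/segment#a "
    "https://example.com/what-not/segment/?a=b"
)
print(add_trailing_slashes(sample))
```

Because the pattern relies on `\S` rather than anchors, it can be applied to text containing several whitespace-separated URLs.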


For a match only instead of a capture group:

(?<=\bhttps?://\S+/)[^\s?#/]+(?![^\s?#])

See another regex demo.

The fourth bird