
I am trying to replace the last part of a URL with redacted if the URL contains the path /ice/flag/. Example:

Input:

https://test.com/plants/ice/flag/237468372912873

Desired Output:

Because the above URL has /ice/flag/ in its path, I want the last part of the URL to be replaced with redacted.

https://test.com/plants/ice/flag/redacted

However, if the URL does not contain /ice/flag/ (e.g. https://test.com/plants/not_ice/flag/237468372912873), it should be left unchanged.


What I tried to do is to use the answer mentioned here to change the last part of the path:

var url = 'https://test.com/plants/ice/flag/237468372912873'
url = url.replace(/\/[^\/]*$/, '/redacted')

This works in doing the replacement, but I am unsure how to modify this so that it only matches if /ice/flag is in the path. I tried putting \/ice\/flag in certain parts of the regex to change the behavior to only replace if that is in the string, but nothing has been working. Any tips from those more experienced with regex on how to do this would be greatly appreciated, thank you!


Edit: The URL can be formed in different ways, so there may be other paths before or after /ice/flag/. So all of these are possibilities:

Input:

  • https://test.com/plants/ice/flag/237468372912873
  • https://test.com/plants/extra/ice/flag/237468372912873
  • https://test.com/plants/ice/flag/extra/237468372912873
  • https://test.com/plants/ice/flag/extra/extra/237468372912873
  • https://test.com/plants/ice/flag/extra/237468372912873?paramOne=1&paramTwo=2#someHash

Desired Output:

  • https://test.com/plants/ice/flag/redacted
  • https://test.com/plants/extra/ice/flag/redacted
  • https://test.com/plants/ice/flag/extra/redacted
  • https://test.com/plants/ice/flag/extra/extra/redacted
  • https://test.com/plants/ice/flag/extra/redacted?paramOne=1&paramTwo=2#someHash
inhwrbp

3 Answers


You may search for this regex:

(\/ice\/flag\/(?:[^?#]*\/)?)[^\/#?]+

and replace it with:

$1redacted


RegEx Breakup:

  • (: Start capture group #1
    • \/ice\/flag\/: Match /ice/flag/
    • (?:[^?#]*\/)?: Optionally match 0 or more of any char that is not ? or #, followed by a /
  • ): End capture group #1
  • [^\/#?]+: Match 1+ of any char that is not /, # or ?

Code:

var arr = [
    'https://test.com/plants/ice/flag/237468372912873', 
    'https://test.com/plants/ice/flag/a/b/237468372912873',
    'https://test.com/a/ice/flag/e/237468372912873?p=2/12#aHash',
    'https://test.com/plants/not_ice/flag/237468372912873'];

var rx = /(\/ice\/flag\/(?:[^?#\n]*\/)?)[^\/#?\n]+/;
var subst = '$1redacted';

arr.forEach(el => console.log(el.replace(rx, subst)));
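
The regex in the code adds \n to both character classes, while the one shown at the top of the answer does not. The answer doesn't say why, so this is only a guess: the difference shows up when several URLs sit in one multi-line string, because without \n the optional group can run across a line break. A minimal sketch, with a made-up two-line input and hypothetical variable names:

```javascript
// Two URLs in one string; only the first contains /ice/flag/.
const text = 'https://test.com/plants/ice/flag/111\nhttps://test.com/other/222';

// Variant without \n in the character classes (as in the regex shown at the top).
const withoutNewline = /(\/ice\/flag\/(?:[^?#]*\/)?)[^\/#?]+/g;
// Variant with \n excluded (as in the code above).
const withNewline = /(\/ice\/flag\/(?:[^?#\n]*\/)?)[^\/#?\n]+/g;

// Without \n, the group swallows the line break and redacts the wrong segment.
const loose = text.replace(withoutNewline, '$1redacted');
// With \n, the match stays on the first line.
const strict = text.replace(withNewline, '$1redacted');

console.log(loose);
console.log(strict);
```

For a single URL per string, as in the arr example, both variants should behave the same.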
anubhava
  • Sorry, I updated my post at the bottom. I think this is very close to what I was hoping to be able to do, but the issue is that sometimes the path can have stuff before or after the `/ice/flag` in which case this would not work (ex: `https://test.com/plants/ice/flag/hi/237468372912873`). Do you think something like this is still possible with regex? – inhwrbp Sep 08 '22 at 19:11
  • Yes, I was in the middle of editing my answer based on your recent edit. Please check it now, it should work. – anubhava Sep 08 '22 at 19:13
  • Sorry to comment again @anubhava, I think there's one last issue with this approach: the case where there are query params `?param` or a URL hash `#someHash` at the end of the URL (I updated my post). I am currently trying to find a way to handle this case with the implementation you've provided, and I can comment here if I find out how to do so – inhwrbp Sep 09 '22 at 00:04
  • Have a look at my updated answer now. By the way, you didn't need to unmark the answer to get an update from me. I always attend to all the comments posted on my answers. – anubhava Sep 09 '22 at 04:32
  • You're right @anubhava, and I apologize. Thanks so much for your help and I really appreciate you detailing out what the pieces of the regex do, I have learned a lot. – inhwrbp Sep 09 '22 at 16:19
  • One question I have about your solution – what exactly does the `$1` signify in the `subst`? I wasn't really able to find anything about that kind of syntax when Google searching – Saad Sep 09 '22 at 16:23
  • `$1` represents the substring captured by the 1st capture group, which is `(\/ice\/flag\/(?:[^?#\n]*\/)?)` in this regex. – anubhava Sep 09 '22 at 16:32
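
To illustrate the `$1` comment above, here is a small sketch showing three ways to reuse the captured prefix in String.prototype.replace: the `$1` placeholder, a replacer function, and a named group. The variable names and the group name prefix are made up for the example; the regex is the one from the answer.

```javascript
const rx = /(\/ice\/flag\/(?:[^?#\n]*\/)?)[^\/#?\n]+/;
const url = 'https://test.com/plants/ice/flag/extra/237468372912873?paramOne=1';

// "$1" in the replacement string expands to the text captured by group 1.
const withDollar = url.replace(rx, '$1redacted');

// Equivalent replacer function: it receives the whole match, then each group.
const withFunction = url.replace(rx, (match, prefix) => prefix + 'redacted');

// Named group variant: (?<prefix>...) is referenced as $<prefix>.
const rxNamed = /(?<prefix>\/ice\/flag\/(?:[^?#\n]*\/)?)[^\/#?\n]+/;
const withNamed = url.replace(rxNamed, '$<prefix>redacted');

console.log(withDollar);   // https://test.com/plants/ice/flag/extra/redacted?paramOne=1
console.log(withFunction); // same result
console.log(withNamed);    // same result
```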

Here is functional code with test input strings based on your requirements:

const input = [
  'https://test.com/plants/ice/flag/237468372912873',
  'https://test.com/plants/extra/ice/flag/237468372912873',
  'https://test.com/plants/ice/flag/extra/237468372912873',
  'https://test.com/plants/ice/flag/extra/extra/237468372912873',
  'https://test.com/plants/ice/flag/extra/237468372912873#someHash',
  'https://test.com/plants/ice/flag/extra/237468372912873?paramOne=1&paramTwo=2#someHash',
  'https://test.com/plants/not_ice/flag/237468372912873'
];
const re = /(\/ice\/flag\/([^\/#?]+\/)*)[^\/#?]+/;

input.forEach(str => {
  console.log('str: ' + str + '\n  => ' + str.replace(re, '$1redacted'));
});

Output:

str: https://test.com/plants/ice/flag/237468372912873
  => https://test.com/plants/ice/flag/redacted
str: https://test.com/plants/extra/ice/flag/237468372912873
  => https://test.com/plants/extra/ice/flag/redacted
str: https://test.com/plants/ice/flag/extra/237468372912873
  => https://test.com/plants/ice/flag/extra/redacted
str: https://test.com/plants/ice/flag/extra/extra/237468372912873
  => https://test.com/plants/ice/flag/extra/extra/redacted
str: https://test.com/plants/ice/flag/extra/237468372912873#someHash
  => https://test.com/plants/ice/flag/extra/redacted#someHash
str: https://test.com/plants/ice/flag/extra/237468372912873?paramOne=1&paramTwo=2#someHash
  => https://test.com/plants/ice/flag/extra/redacted?paramOne=1&paramTwo=2#someHash
str: https://test.com/plants/not_ice/flag/237468372912873
  => https://test.com/plants/not_ice/flag/237468372912873

Regex:

  • ( - capture group start
  • \/ice\/flag\/ - expect /ice/flag/
  • ([^\/#?]+\/)* - zero or more patterns of chars other than /, #, ?, followed by /
  • ) - capture group end
  • [^\/#?]+ - match one or more chars other than /, #, ? (this is the part that gets replaced; requiring at least one char forces a non-empty segment after the last /)
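
The breakdown above implies a few edge cases that are not in the test input. As a rough check, and assuming these extra URLs (which are not from the question) are representative, the same regex can be run like this; the redact helper is a hypothetical name:

```javascript
const re = /(\/ice\/flag\/([^\/#?]+\/)*)[^\/#?]+/;
const redact = (url) => url.replace(re, '$1redacted');

// Nothing after /ice/flag/: the final [^\/#?]+ has nothing to match, so no change.
const noSegment = redact('https://test.com/plants/ice/flag/');

// A query string directly after /ice/flag/: still no segment to replace.
const queryOnly = redact('https://test.com/plants/ice/flag/?x=1');

// Trailing slash after the id: the id is replaced and the slash is kept.
const trailingSlash = redact('https://test.com/plants/ice/flag/123/');

console.log(noSegment);     // https://test.com/plants/ice/flag/
console.log(queryOnly);     // https://test.com/plants/ice/flag/?x=1
console.log(trailingSlash); // https://test.com/plants/ice/flag/redacted/
```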
Peter Thoeny

You can use a ternary to check whether the URL includes /ice/flag with url.includes('/ice/flag'). If it does, apply url.replace(/\/[^\/]*$/, '/redacted'); otherwise return the URL as it is.

function replace(url) {
  return url.includes('/ice/flag') ? url.replace(/\/[^\/]*$/, '/redacted') : url;
}

console.log(replace("https://test.com/plants/ice/flag/237468372912873"));
console.log(replace("https://test.com/plants/not_ice/flag/237468372912873"));
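
One caveat with the function above: the pattern /\/[^\/]*$/ looks at the whole string, so a query string or hash at the end (as in the question's edit) becomes part of what is replaced. A possible variation, sketched here under the assumption that the input is always an absolute URL and that the global URL class is available (browsers and Node.js), is to apply the same check to the pathname only. The function name redactWithUrlApi is made up for the example.

```javascript
function redactWithUrlApi(input) {
  const u = new URL(input); // throws on strings that are not absolute URLs

  // Only touch URLs whose path contains /ice/flag/ followed by something.
  if (!u.pathname.includes('/ice/flag/')) return input;

  // Replace the last non-empty path segment; query and hash are kept separately.
  u.pathname = u.pathname.replace(/\/[^\/]+$/, '/redacted');
  return u.href;
}

const a = redactWithUrlApi('https://test.com/plants/ice/flag/extra/237468372912873?paramOne=1&paramTwo=2#someHash');
const b = redactWithUrlApi('https://test.com/a/ice/flag/e/237468372912873?p=2/12#aHash');
const c = redactWithUrlApi('https://test.com/plants/not_ice/flag/237468372912873');

console.log(a); // https://test.com/plants/ice/flag/extra/redacted?paramOne=1&paramTwo=2#someHash
console.log(b); // https://test.com/a/ice/flag/e/redacted?p=2/12#aHash
console.log(c); // https://test.com/plants/not_ice/flag/237468372912873
```

Note that URL may normalize the string it returns, so this is not a drop-in replacement if the exact original formatting matters.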
Mina