I have this simple regular expression substitution just to ensure that some URL ends in a slash character:

   import re
   url = re.sub("/*$", "/", "foo/")

...but when I run that code, the result is unexpectedly foo//.

After further experimentation, the explanation I have found is that the regular expression /*$ first matches the slash at the end of the string and replaces it with another slash, and then matches a second time, on the empty string just after that slash, which inserts one more.
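
The double match can be made visible with re.finditer, which reports every place the pattern matches. This is a minimal sketch, assuming Python 3.7 or later (where re.sub also replaces an empty match that directly follows a non-empty one); the variable names are only illustrative:

```python
import re

# List every match of the pattern in "foo/" as (start, end, matched text).
matches = [(m.start(), m.end(), m.group()) for m in re.finditer("/*$", "foo/")]
print(matches)

# Each match is replaced once, hence the doubled slash.
result = re.sub("/*$", "/", "foo/")
print(result)
```

Under that assumption there should be two matches: the slash itself at 3..4, and the empty string at 4..4.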

Is there any simple way to work around this issue?

Update: Well, it seems you can tell sub the maximum number of replacements to make:

   url = re.sub("/*$", "/", "foo/", count=1)
salva

2 Answers

For the sake of a regex answer:

The pattern /*$ matches any number of slashes at the end of the input, but since * allows zero repetitions, it also matches the empty string at the position after the last character, whether or not that character is a slash.

This is what lets you append a slash when there is none, but it also means a second slash gets added when one is already there.

And that's the key: the match must not be allowed to start right after a slash. That can be done with a negative look-behind:

(?<!/)/*$

This will replace any number of slashes at the end of the string with a single slash, but without the unnecessary duplication.
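
A quick way to check the look-behind pattern from Python, as a minimal sketch; the helper name ensure_trailing_slash is made up for this example:

```python
import re

# (?<!/) refuses to start a match right after a slash, so the empty
# match at the very end is rejected whenever a slash precedes it.
TRAILING_SLASHES = re.compile(r"(?<!/)/*$")

def ensure_trailing_slash(url):
    """Collapse any trailing slashes into exactly one (hypothetical helper)."""
    return TRAILING_SLASHES.sub("/", url)

for url in ["foo", "foo/", "foo//", "bar/foo///"]:
    print(repr(url), "->", repr(ensure_trailing_slash(url)))
```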

...but for all practical purposes, use .rstrip('/') + '/' as @James has suggested in the comments.

Tomalak

If you absolutely want or need to do it with a regex, match the last run of one or more non-slash characters together with any slashes that follow it, then replace the whole match with that run plus a single slash. That requires a capture group for the run:

import re
re.sub("([^/]+)/*$", r"\1/", "foo")
# 'foo/'
re.sub("([^/]+)/*$", r"\1/", "foo/")
# 'foo/'
re.sub("([^/]+)/*$", r"\1/", "foo//")
# 'foo/'
re.sub("([^/]+)/*$", r"\1/", "foo///////////")
# 'foo/'
re.sub("([^/]+)/*$", r"\1/", "bar/foo//")
# 'bar/foo/'

As mentioned by @James in the comments, you're better off using an easier approach:

'foo/'.rstrip('/') + '/'

which strips off trailing slashes (if there are any) and then adds a slash.
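
Wrapped in a small function, the non-regex approach might look like this; with_trailing_slash is a name chosen for this sketch, not something from a library:

```python
def with_trailing_slash(url):
    """Return url ending in exactly one slash (hypothetical helper)."""
    # rstrip("/") removes every trailing slash; then one is added back.
    return url.rstrip("/") + "/"

for url in ["foo", "foo/", "foo///////////", "bar/foo//"]:
    print(repr(url), "->", repr(with_trailing_slash(url)))
```

One difference from the capture-group regex: an empty string, or an input made only of slashes, comes out as "/" here, while ([^/]+)/*$ finds no match and leaves such input unchanged.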

aneroid