Regex: Smallest possible substring match

Question

I have url strings such as:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"

Now I need to capture the slide_3 part, or more specifically the start position of the digit 3, under the constraint that it is a single digit (neither preceded nor followed by another digit) and is not preceded by an "=". So pageid=2 shouldn't match, while slide_3 should.

I tried this with python regex:

import re

p = re.compile(r'/.*(?<!=)(?<!\d)\d(?!\d).*/')
s = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"

for m in p.finditer(s):
    print(m.start(), m.group())

and the result is

6 //facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/

I understand why I get this: the first and the last "/" satisfy the regex, but so does the substring "/slide_3/".

How do I make sure I get the smallest substring that matches the regex?

Why doesn't this work:

'/[^/](?<!=)(?<!\d)\d(?!\d).*/'

The non-greedy operator .*? does not seem to do the trick, since it does not guarantee the shortest possible match.
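To illustrate that last point, here is a minimal sketch that makes both wildcards in the first attempt lazy. The variable names `lazy` and `m` are only illustrative. A regex engine reports the leftmost match first, so a lazy quantifier can only move the right end of the match inward; it never moves the start to a later "/".

```python
import re

s = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"

# Same lookarounds as the first attempt, but with both wildcards made lazy.
lazy = re.compile(r'/.*?(?<!=)(?<!\d)\d(?!\d).*?/')

m = lazy.search(s)

# The match still begins at the first "/" (index 6), because the engine
# tries start positions from left to right and keeps the first one that works.
print(m.start(), m.group())
```

The first "/" is a valid starting point because `.*?` is allowed to grow across later slashes until it reaches the digit 3, so the result is the same long substring as with the greedy version.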

Strings that should match:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/" 
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/sno3/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/3/"

and the matches should be slide_3, sno3 and 3, respectively.

Strings which shouldn't:

"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_33/"
"https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/33/"

@OlvinRoght I have multiple urls like this one, which may or may not contain the said pattern. I need to find the ones which do and further manipulate them — Alzio, Aug 27 '19 at 12:08
If I got it right the pattern is `\w_(\d)` ? After an underscore and without any number after — Plopp, Aug 27 '19 at 12:09
Possible duplicate of [RegEx: Smallest possible match or nongreedy match](https://stackoverflow.com/questions/1919982/regex-smallest-possible-match-or-nongreedy-match) — Uli Sotschok, Aug 27 '19 at 12:09
@UliSotschok I had gone through it, but it didn't work for me. Would you be able to produce a regex using the non-greedy operator? — Alzio, Aug 27 '19 at 12:15
@MonkeyZeus I should probably add this to the question: the "slide" string isn't fixed here, it can be anything, say "s3" or simply "3" — Alzio, Aug 27 '19 at 12:16
Yes, you should edit your question to include various examples of things which should and should not match. I think `^.*?\/[^\d]*(\d)\/?$` is what you might be looking for — MonkeyZeus, Aug 27 '19 at 12:19

MonkeyZeus · Accepted Answer · 2019-08-27T12:41:05.323

0

If I understand your question correctly, you can use this to check whether a string matches your expected pattern:

(?:^.*\/)([^\d]*\d)(?:\/?$)

and \1 will contain:

slide_3
sno3
3

https://regex101.com/r/h0rNdC/4

This could be useful in getting the index of the match: Python Regex - How to Get Positions and Values of Matches
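As a rough sketch of how that might look in Python, the pattern above can be passed to `re.search`, with the span of group 1 giving the positions. The helper name `last_segment_digit` is hypothetical and not part of the original answer.

```python
import re

# Pattern from this answer; group 1 is the final path segment ending in a digit.
PATTERN = re.compile(r'(?:^.*\/)([^\d]*\d)(?:\/?$)')

def last_segment_digit(url):
    """Return (segment, index_of_digit) or None if the URL does not match."""
    m = PATTERN.search(url)
    if m is None:
        return None
    # The digit is the last character of group 1.
    return m.group(1), m.end(1) - 1

url = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/"
print(last_segment_digit(url))  # ('slide_3', 74)
```

Note that this pattern only inspects the last path segment and does not by itself reject a digit preceded by "=", so a URL ending in "/pageid=2/" would also match.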

edited Aug 27 '19 at 12:41

answered Aug 27 '19 at 12:28

I already mentioned I need the smallest substring that matches regex, your answer matches the whole string. – Alzio Aug 27 '19 at 12:36
@Alzio See my edit. You need to make sure to only pay attention to `\1` – MonkeyZeus Aug 27 '19 at 12:41
@Alzio If my answer helped then please feel free to accept it as the correct answer. Thanks. – MonkeyZeus Aug 27 '19 at 12:58

score 0 · Answer 2 · answered Aug 27 '19 at 17:41

You could match the forward slash, then match 0+ times any char except a digit, /, = or a newline.

Capture a single digit in a capturing group and match the trailing forward slash.

To get the start and end indices of the captured digit, you could, for example, use re.search, which returns a match object.

/[^\d/=\r\n]*(\d)/


For example

import re

regex = r"/[^\d/=\r\n]*(\d)/"
strings = [
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/sno3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/3/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/slide_33/",
    "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/33/"
]

for s in strings:
    matches = re.search(regex, s)
    if matches:
        print(f"Group 1 found at {matches.start(1)}-{matches.end(1)} value:{matches.group(1)}")

Result

Group 1 found at 74-75 value:3
Group 1 found at 71-72 value:3
Group 1 found at 68-69 value:3
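For comparison, the lookarounds from the question can be kept if the wildcards are replaced by `[^/]*`, so that a match cannot cross a path segment boundary. The following is a sketch under that assumption, not part of the answer above; the names `SEGMENT` and `find_single_digit` are hypothetical.

```python
import re

# Group 1: the whole path segment; group 2: the single digit inside it.
# (?<![=\d]) rejects a digit preceded by "=" or by another digit,
# (?!\d) rejects a digit followed by another digit.
SEGMENT = re.compile(r'/([^/]*(?<![=\d])(\d)(?!\d)[^/]*)/')

def find_single_digit(url):
    """Return (segment, start_index_of_digit) or None if nothing qualifies."""
    m = SEGMENT.search(url)
    if m is None:
        return None
    return m.group(1), m.start(2)

base = "https://facty.com/ailments/body/10-home-remedies-for-styes/pageid=2/"
for tail in ("slide_3/", "sno3/", "3/", "slide/", "slide_33/", "33/"):
    print(tail, find_single_digit(base + tail))
```

Unlike the pattern above, this variant also accepts characters after the digit within the same segment, which matches the constraint as worded in the question but may be broader than intended.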
