
A note of caution: don't try to read my string character by character. I know it is a mess, so I will bold the interesting parts.

I am trying to find all substrings in

"jashdgfkldsuvha sjdgjhayaluiegfasdgfguiasdgo"

which start with the character "a" and end with the character "l" (that is a lowercase L). Specifically, I want a non-greedy search.

If I run

re.findall("(a.*?l)", "jashdgfkldsuvha sjdgjhayaluiegfasdgfguiasdgo")

it returns:

['ashdgfkl', 'a sjdgjhayal']

but notice in the string we have the letters "al" in order:

"jashdgfkldsuvha sjdgjhay**al**uiegfasdgfguiasdgo"

I'm not an expert at regex, so I assumed that, even though `*` should allow a match of zero characters, the zero-length match wasn't working here. So let's add some characters between the "a" and the "l" to get

"jashdgfkldsuvha sjdgjhay**ajkdl**uiegfasdgfguiasdgo"

but it still only returns

['ashdgfkl', 'a sjdgjhayajkdl']

What is going on here? Why does regex have a vendetta against this particular "a" and "l"?

Kraigolas
  • You could try `re.findall("(?=(a.*?l))", "jashdgfkldsuvha sjdgjhayaluiegfasdgfguiasdgo")` which returns `['ashdgfkl', 'a sjdgjhayal', 'ayal', 'al']` – Nick Feb 12 '21 at 01:33
  • Or you can use the `regex` module, which does support overlapping matches. Both cases are covered in the duplicate. – Nick Feb 12 '21 at 01:39
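
To make the comments above concrete, here is a minimal sketch using only the standard `re` module. The variable names (`s`, `plain`, `overlapping`) are my own and not from the question. It assumes the cause is that `re.findall` returns non-overlapping matches: the "al" the question asks about sits inside the span already consumed by the second match, so the scan never gets a chance to start a new match there. Wrapping the pattern in a lookahead, as suggested in the first comment, makes each match zero-width, so every starting position is tried.

```python
import re

s = "jashdgfkldsuvha sjdgjhayaluiegfasdgfguiasdgo"

# Plain non-greedy search: each match consumes its characters, and the
# next search resumes where the previous match ended.
plain = [(m.span(), m.group()) for m in re.finditer(r"a.*?l", s)]
print(plain)
# [((1, 9), 'ashdgfkl'), ((14, 26), 'a sjdgjhayal')]
# The "al" at positions 24-25 lies inside the second span (14, 26).

# Lookahead variant from the comment: the lookahead itself matches the
# empty string, so nothing is consumed and the capture group can overlap
# with earlier results.
overlapping = re.findall(r"(?=(a.*?l))", s)
print(overlapping)
# ['ashdgfkl', 'a sjdgjhayal', 'ayal', 'al']
```

Note that non-greedy only means "stop at the first possible `l` after the chosen starting `a`". It does not mean "pick the shortest substring overall", which is why the second match starts at the earliest available "a" rather than the one right before the "l".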
