
I am currently learning Python and was playing around with regex. I can't see the point of a non-greedy quantifier unless it appears somewhere before the end of the pattern.

import re
x = "From someone.name@gmail.com Sat Jan  5 09:14:16 2008"
y = re.findall(r'\S+?@\S+', x)

This would give me:

someone.name@gmail.com

Whereas this:

x = "From someone.name@gmail.com Sat Jan  5 09:14:16 2008"
y = re.findall(r'\S+@\S+?', x)
or
y = re.findall(r'\S+?@\S+?', x)

Would be:

someone.name@g

So is there any point in using a non-greedy quantifier if it isn't somewhere before the end of the pattern?

Confucii
  • There is a difference. `\S+@` is greedy, so it matches up to the last occurrence of `@`; `\S+?@` is non-greedy, so it matches up to the first occurrence of `@`. The part after the `@`, `\S+?`, is also non-greedy: it matches one or more characters but as few as possible, which gives you only the `g`. See the difference in matches using both patterns here: https://regex101.com/r/qjp8rQ/1 Note that `\S` can also match an `@` itself. – The fourth bird Sep 16 '20 at 14:10
  • Thank you, now it makes perfect sense! – Confucii Sep 16 '20 at 14:42
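The behaviour described in the comment above can be checked with a short sketch. The variable names and the extra test string `"a@b@c"` are illustrative assumptions, not part of the original question; the email line is the one from the question.

```python
import re

x = "From someone.name@gmail.com Sat Jan  5 09:14:16 2008"

# Lazy before '@', greedy after it: the literal '@' forces the lazy part
# to grow until it reaches an '@', and the greedy tail runs to the next space.
lazy_then_greedy = re.findall(r'\S+?@\S+', x)

# Lazy at the very end: nothing follows the quantifier, so it stops
# after the minimum of one character.
greedy_then_lazy = re.findall(r'\S+@\S+?', x)
lazy_both = re.findall(r'\S+?@\S+?', x)

# \S also matches '@', so greedy and lazy differ when there are several '@'.
s = "a@b@c"  # hypothetical input, chosen only to show the difference
greedy_at = re.findall(r'\S+@', s)
lazy_at = re.findall(r'\S+?@', s)

print(lazy_then_greedy)  # ['someone.name@gmail.com']
print(greedy_then_lazy)  # ['someone.name@g']
print(lazy_both)         # ['someone.name@g']
print(greedy_at)         # ['a@b@']
print(lazy_at)           # ['a@', 'b@']
```

In the email example the lazy `\S+?` before the `@` gives the same result as a greedy one, because there is only one `@` in that run of non-whitespace characters.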

1 Answer


Non-greedy quantifiers make sense when something comes after them in the pattern. For example, compare

import re
p1 = re.compile(r'a.*?b')
p2 = re.compile(r'a.*b')

x = 'abb'
p1.match(x).group() # = 'ab'
p2.match(x).group() # = 'abb'

More concretely, they’re useful if you want to exclude a delimiter. For example, to match text between quotes you could write

pattern = r'"[^"]*"'

Or you could write

pattern = r'".*?"'
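As a rough illustration of the two quote patterns, the sketch below runs both on a sample string and adds the greedy `".*"` for contrast. The sample strings and variable names are assumptions made up for this example.

```python
import re

text = 'say "hello" and "world"'  # hypothetical input with two quoted parts

# Negated character class: everything between quotes that is not a quote.
negated = re.findall(r'"[^"]*"', text)

# Non-greedy dot: stop at the first closing quote.
lazy = re.findall(r'".*?"', text)

# Greedy dot, for contrast: runs on to the last quote in the string.
greedy = re.findall(r'".*"', text)

print(negated)  # ['"hello"', '"world"']
print(lazy)     # ['"hello"', '"world"']
print(greedy)   # ['"hello" and "world"']

# The two are not fully interchangeable: by default '.' does not match
# a newline, while [^"] does.
multiline = '"a\nb"'
print(re.findall(r'"[^"]*"', multiline))  # ['"a\nb"']
print(re.findall(r'".*?"', multiline))    # []
```

Passing `re.DOTALL` makes `.` match newlines as well, in which case the non-greedy version also finds the multi-line quoted string.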
Konrad Rudolph