
I ran the following script:

import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*]')
print(paaa.findall(a))

and got:

['[abc] [abc] [y78]']

Why is '[abc]' missing? '[abc]' clearly matches the pattern as well. Is there a bug in Python 3's re.findall function?

Clarification:

Sorry, paaa should be paaa = re.compile(r'\[ab.*\]'). What I am looking for is something that will return

['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]']

Basically, every substring that matches the pattern.
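re.findall only returns non-overlapping matches, so it will not produce this list directly. A minimal brute-force sketch is to test every slice of the string with Pattern.fullmatch(string, pos, endpos). The helper name all_matching_substrings is hypothetical (it is not part of re), and because it checks O(n²) slices it only suits short strings. Note that it also returns '[abc] [y78]', which matches \[ab.*\] too but is absent from the list above.

```python
import re

def all_matching_substrings(pattern, text):
    """Hypothetical helper: every substring of text that the pattern matches in full.

    Brute force over all (start, end) slices, so only use it on short strings.
    """
    regex = re.compile(pattern)
    spans = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch succeeds only if the pattern covers exactly text[start:end]
            if regex.fullmatch(text, start, end):
                spans.append((start, end))
    # Shortest first, then left to right, to mirror the ordering in the question
    spans.sort(key=lambda span: (span[1] - span[0], span[0]))
    return [text[s:e] for s, e in spans]

a = r'[abc] [abc] [y78]'
print(all_matching_substrings(r'\[ab.*\]', a))
# ['[abc]', '[abc]', '[abc] [abc]', '[abc] [y78]', '[abc] [abc] [y78]']
```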

Cœur
nimning

2 Answers


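The repetition .* in \[ab.*] is greedy: it matches as many characters as it can, as long as the rest of the pattern (the closing ]) can still match. So everything from the first [ to the last ] is matched, and because findall only returns non-overlapping matches, the shorter '[abc]' matches inside it are never reported.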
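One way to see this is to print each match's span with finditer (variable names follow the question). findall scans left to right and, after each match, resumes where that match ended; here the single greedy match ends at index 17, the end of the string, so nothing is left to search.

```python
import re

a = r'[abc] [abc] [y78]'
# Greedy: .* first runs to the end of the string, then backtracks to the last ']'
paaa = re.compile(r'\[ab.*]')

# finditer yields match objects, so we can see where each match starts and ends
for m in paaa.finditer(a):
    print(m.span(), m.group())
# (0, 17) [abc] [abc] [y78]
```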

Use lazy repetition instead, with .*?:

import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?]')
print(paaa.findall(a))
# ['[abc]', '[abc]']
CertainPerformance
  • Sorry the `paaa` should be `paaa = re.compile(r'\[ab.*\]')` What I am looking for is something which will return ['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]'] Basically, any substring matches the pattern. – nimning Aug 10 '18 at 06:17

You should escape the right square bracket as well, and use the non-greedy repeater *? in your regex:

import re
a = r'[abc] [abc] [y78]'
paaa = re.compile(r'\[ab.*?\]')
print(paaa.findall(a))

This outputs:

['[abc]', '[abc]']
blhsing
  • "You should escape the right square bracket as well" - no need. – DYZ Aug 02 '18 at 05:11
  • Sorry the `paaa` should be `paaa = re.compile(r'\[ab.*\]')` What I am looking for is something which will return ['[abc]', '[abc]', '[abc] [abc]', '[abc] [abc] [y78]'] Basically, any substring matches the pattern. – nimning Aug 10 '18 at 06:18
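DYZ's point can be checked directly: outside a character class, Python's re treats an unescaped ] as an ordinary literal, so both spellings give the same matches. The opening [ is different. Left unescaped, it starts a character class, so r'[ab.*?]' matches any single one of the characters a, b, ., * or ?.

```python
import re

a = r'[abc] [abc] [y78]'

# A lone ']' outside a character class is literal, so these behave the same
unescaped = re.compile(r'\[ab.*?]')
escaped = re.compile(r'\[ab.*?\]')
print(unescaped.findall(a) == escaped.findall(a))  # True

# Without escaping '[', the pattern becomes a one-character class [ab.*?]
print(re.findall(r'[ab.*?]', a))  # ['a', 'b', 'a', 'b']
```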