I have a string like this:

myStr = "abcd123[ 45][12] cd [67]"

I want to fetch all the substrings between the '[' and ']' markers. I am using findall for this, but all I get is a single match containing everything between the first '[' and the last ']'.

print(re.findall(r'\[(.+)\]', myStr))

What am I doing wrong here?

Aran-Fey
Dexter

2 Answers

This will probably be marked as a duplicate, but the simple fix here is to make your dot lazy:

print(re.findall(r'\[(.+?)\]', myStr))

[' 45', '12', '67']

Here .+? means "consume as few characters as possible", so the match stops at the first (nearest) closing square bracket. Your current pattern, .+, consumes everything up to the very last closing square bracket.

Another pattern that gives the same result here is \[([^\]]+)\], which matches any run of characters other than a closing bracket:

print(re.findall(r'\[([^\]]+)\]', myStr))
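
To see the three behaviours side by side, here is a minimal sketch in Python 3 using the string from the question. The variable names are only illustrative.

```python
import re

my_str = "abcd123[ 45][12] cd [67]"

# Greedy: .+ grabs as much as it can, so the match runs from the
# first '[' to the last ']' in the string.
greedy = re.findall(r'\[(.+)\]', my_str)

# Lazy: .+? grabs as little as it can, so each match stops at the
# nearest ']'.
lazy = re.findall(r'\[(.+?)\]', my_str)

# Negated character class: [^\]]+ cannot match a ']' at all,
# so it can never run past the closing bracket.
negated = re.findall(r'\[([^\]]+)\]', my_str)

print(greedy)   # [' 45][12] cd [67']
print(lazy)     # [' 45', '12', '67']
print(negated)  # [' 45', '12', '67']
```

One difference worth keeping in mind: by default the dot does not match a newline, while the negated class does, so the two patterns can diverge on multi-line input.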
Tim Biegeleisen

The .+ is greedy and matches as much as it can, including other [ and ] characters.

You have two options: make the quantifier non-greedy by using .+?, which matches as few characters as possible, or explicitly exclude square brackets from the match by using [^\[\]]+ instead of .+.

(Both of these options are about equally good in this case. The non-greedy option is preferable, though, if your ending delimiter is a longer string rather than a single character, since a longer string is more difficult to exclude with a character class.)
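
As an illustration of the longer-delimiter case, here is a small sketch in Python 3. The HTML-style comment markers and the sample text are made up for this example and are not from the question.

```python
import re

# Hypothetical input: the closing delimiter is the three-character
# string '-->', and one comment body contains a lone '-'.
text = "a <!-- one --> b <!-- two - three --> c"

# Non-greedy: stops at the nearest '-->', whatever the body contains.
lazy = re.findall(r'<!--(.+?)-->', text)

# Naive exclusion: [^-]+ forbids every '-', not just the full '-->',
# so the second comment is missed.
naive = re.findall(r'<!--([^-]+)-->', text)

print(lazy)   # [' one ', ' two - three ']
print(naive)  # [' one ']
```

A character class can only exclude single characters, so it has no direct way to express "anything except the sequence -->"; the lazy quantifier sidesteps the problem.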

Christoph Burschka
    Actually, your second suggestion is probably "better" in the sense that it should work across almost every regex engine, whereas lazy dot may not work in every engine. – Tim Biegeleisen Feb 14 '19 at 10:44