
I'm using Python 3 and working with title strings that have a bracketed tag with a pair of names separated by a +. Like this: [John+Alice] A title here.

I've been using re.search('\[(.+)\]', title) to get the tag [John+Alice], which works, but it breaks on a title with more than one bracketed tag:

[John+Alice] [Hayley + Serene] Another title.

That gives me [John+Alice] [Hayley + Serene], when I would prefer [John+Alice] and [Hayley + Serene].

How can I modify the regex to give me all bracketed tags that have + between [ and ]? Thanks.

John Smith

1 Answer


You need to make your regex non-greedy, like this:

import re

title = '[John+Alice] [Hayley + Serene] Another title.'

# Raw string avoids invalid-escape warnings for \[ and \]; .+? is non-greedy.
for t in re.findall(r'\[(.+?)\]', title):
    print(t)

Output

John+Alice
Hayley + Serene

If you must include the brackets, use finditer:

# group() with no argument returns the whole match, brackets included.
for t in re.finditer(r'\[(.+?)\]', title):
    print(t.group())

Output

[John+Alice]
[Hayley + Serene]

The non-greedy quantifiers *?, +? and ?? match as little text as possible. The Python re documentation covers greedy versus non-greedy matching in more detail.
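To make the difference concrete, here is a small side-by-side sketch using the title from the question. The variable names greedy and lazy are just for illustration.

```python
import re

title = '[John+Alice] [Hayley + Serene] Another title.'

# Greedy: .+ runs to the last ']' in the string, so both tags are swallowed.
greedy = re.findall(r'\[(.+)\]', title)

# Non-greedy: .+? stops at the first ']' it can reach, one match per tag.
lazy = re.findall(r'\[(.+?)\]', title)

print(greedy)  # ['John+Alice] [Hayley + Serene']
print(lazy)    # ['John+Alice', 'Hayley + Serene']
```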

Observation

In the question you mentioned that you are using '\[(.+)\]' to match all bracketed tags that have + between [ and ], but actually it does a little more than that. For instance, for the following example:

title = '[John+Alice] [Hayley + Serene] [No plus text] Another title.'
re.search(r'\[(.+)\]', title).group()

returns:

[John+Alice] [Hayley + Serene] [No plus text]

Consequently, my modification (using finditer) gives:

[John+Alice]
[Hayley + Serene]
[No plus text]

[No plus text] should not be matched, since it contains no +. To fix that, use something like:

title = '[John+Alice] [Hayley + Serene] [No plus text] Another title.'

for t in re.finditer(r'\[(.+?\+.+?)?\]', title):
    print(t.group())

Output

[John+Alice]
[Hayley + Serene]
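One caveat: . also matches ] and [, so .+? can run past a closing bracket while it looks for a +. A plus-less tag that appears before a plus tag then gets merged with it, and the optional group lets the pattern reach across an empty []. A possible alternative, sketched below, uses negated character classes so a match can never cross a bracket. The names TAG_WITH_PLUS and plus_tags are hypothetical, not part of the answer above.

```python
import re

# Hypothetical names for illustration.
# [^\]\[+]+ : one or more characters that are neither brackets nor '+'
# \+        : the literal plus separating the names
# [^\]\[]+  : one or more non-bracket characters before the closing ']'
TAG_WITH_PLUS = re.compile(r'\[[^\]\[+]+\+[^\]\[]+\]')

def plus_tags(title):
    """Return every bracketed tag in title that contains a '+'."""
    return TAG_WITH_PLUS.findall(title)

title = '[No plus text] [John+Alice] [] [Hayley + Serene] Another title.'

# The lazy pattern (non-capturing here so findall returns whole matches)
# merges neighbouring tags:
merged = re.findall(r'\[(?:.+?\+.+?)?\]', title)
print(merged)  # ['[No plus text] [John+Alice]', '[] [Hayley + Serene]']

# The negated-class pattern keeps each tag separate:
print(plus_tags(title))  # ['[John+Alice]', '[Hayley + Serene]']
```

For the example title in the answer both patterns give the same result; they only differ when a tag without + comes before one with +.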
Dani Mesejo