re.findall return separate non-overlapping results

Question

I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:

    text = '<tag>xyz</tag>\n<tag>abc</tag>'

Is it possible to get an output list with elements like:

    matches = ['<tag>xyz</tag>', '<tag>abc</tag>']

Right now I am using the following regex:

    matches = re.findall(r"<tag>[\w\W]*</tag>", text)

But instead of a list with two elements I am getting only one element with the whole input string like:

    matches = ['<tag>xyz</tag>\n<tag>abc</tag>']

Could someone please guide me? Thank you.

I am new to regex and wasn't really aware of the greedy and non-greedy search. Thank you for linking those answers. But `*?` is returning only the last occurrence. Is there a way to capture all occurrences? – Dec 03 '18 at 20:30
This is my exact code `matches = re.findall(r"[\w\W]*?", file1)` . Am I doing something wrong that is making it return only the last occurrence? I am trying to capture the data between multiple `` and `` tags – Dec 03 '18 at 20:59

Pushpesh Kumar Rajwanshi · Accepted Answer · 2018-12-04T04:35:13.980


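You just need to make the quantifier non-greedy. A greedy `*` matches as much as it can, so the match runs all the way to the last `</tag>` in the input; `*?` matches as little as it can, so each match stops at the first `</tag>` that follows.

Change this regex,

    <tag>[\w\W]*</tag>

to

    <tag>[\w\W]*?</tag>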


    import re
    text = '<tag>xyz</tag>\n<tag>abc</tag>'
    matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
    print(matches)

This prints:

    ['<tag>xyz</tag>', '<tag>abc</tag>']
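To make the difference easier to see, here is a small side-by-side sketch that runs the greedy and non-greedy patterns on the same input from the question. The third pattern, using `.` with `re.DOTALL`, is not from the original post; it is shown only as a commonly used equivalent of `[\w\W]`. The variable names `greedy`, `lazy`, and `dotall` are just illustrative.

```python
import re

text = '<tag>xyz</tag>\n<tag>abc</tag>'

# Greedy: [\w\W]* grabs as much as possible, so the match
# extends to the LAST </tag> in the string.
greedy = re.findall(r"<tag>[\w\W]*</tag>", text)

# Non-greedy: [\w\W]*? grabs as little as possible, so each
# match ends at the FIRST </tag> after its opening <tag>.
lazy = re.findall(r"<tag>[\w\W]*?</tag>", text)

# Alternative spelling (an assumption, not in the original answer):
# with re.DOTALL, "." also matches newlines, like [\w\W] does.
dotall = re.findall(r"<tag>.*?</tag>", text, flags=re.DOTALL)

print(greedy)  # ['<tag>xyz</tag>\n<tag>abc</tag>']
print(lazy)    # ['<tag>xyz</tag>', '<tag>abc</tag>']
print(dotall)  # ['<tag>xyz</tag>', '<tag>abc</tag>']
```

In this input the two tags sit on different lines, so `[\w\W]` (or `re.DOTALL`) matters mainly when the content between a single pair of tags itself spans several lines.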

edited Dec 04 '18 at 04:35

answered Dec 01 '18 at 21:00

Thank you for your answer. But doing that is giving me only the last occurrence of that pattern instead of all occurrences. Could you please suggest a way to capture all occurrences? – Dec 03 '18 at 20:28
@ApoorvaPatil: Try my python code in my answer. It prints exactly what I have written there. If it's not working for you, can you edit your post and share the code you are trying? – Pushpesh Kumar Rajwanshi Dec 04 '18 at 04:36
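Since the comments mention wanting the data between the tags rather than the tags themselves, here is a hedged sketch of one way to do that with a capturing group. It assumes the `<tag>` name from the question (the real tag names in the commenter's file are not shown), and `inner` and `spans` are illustrative names. When the pattern contains exactly one group, `re.findall` returns the group contents instead of the whole match.

```python
import re

text = '<tag>xyz</tag>\n<tag>abc</tag>'

# One capturing group: findall returns only what the group matched,
# i.e. the data between each pair of tags.
inner = re.findall(r"<tag>([\w\W]*?)</tag>", text)
print(inner)  # ['xyz', 'abc']

# finditer yields match objects, which is useful for checking
# where each occurrence was found in the input.
spans = [
    (m.start(), m.end(), m.group(1))
    for m in re.finditer(r"<tag>([\w\W]*?)</tag>", text)
]
print(spans)
```

If a call like this still returns a single item, it may be worth checking that the whole input is in one string and that the result list is not being overwritten inside a loop; that is only a guess, since the full code was not posted.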
