
I am new to Python and I am struggling a bit with regular expressions. If I have an input like this:

    text = '<tag>xyz</tag>\n<tag>abc</tag>'

Is it possible to get an output list with elements like:

    matches = ['<tag>xyz</tag>', '<tag>abc</tag>']

Right now I am using the following regex:

    matches = re.findall(r"<tag>[\w\W]*</tag>", text)

But instead of a list with two elements I am getting only one element with the whole input string like:

    matches = ['<tag>xyz</tag>\n<tag>abc</tag>']

Could someone please guide me? Thank you.

petezurich
  • Use lazy (or non-greedy) quantifier, replace `*` with `*?`. – Wiktor Stribiżew Dec 01 '18 at 21:06
  • I am new to regex and wasn't really aware of the greedy and non-greedy search. Thank you for linking those answers. But `*?` is returning only the last occurrence. Is there a way to capture all occurrences? –  Dec 03 '18 at 20:30
  • It returns all occurrences. – Wiktor Stribiżew Dec 03 '18 at 20:43
  • This is my exact code `matches = re.findall(r"[\w\W]*?", file1)` . Am I doing something wrong that is making it return only the last occurrence? I am trying to capture the data between multiple `` and `` tags –  Dec 03 '18 at 20:59

1 Answer


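You just need to make the quantifier non-greedy (lazy). The greedy `*` matches as much as possible, so it runs through to the last `</tag>` in the input, while `*?` stops at the first `</tag>` it reaches.

Change this regex,

    <tag>[\w\W]*</tag>

to

    <tag>[\w\W]*?</tag>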


    import re
    text = '<tag>xyz</tag>\n<tag>abc</tag>'
    matches = re.findall(r"<tag>[\w\W]*?</tag>", text)
    print(matches)

This prints:

    ['<tag>xyz</tag>', '<tag>abc</tag>']
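
To see the difference side by side, here is a minimal sketch that runs the greedy and the lazy pattern on the sample input from the question. The last pattern is an assumed variant that is not part of the original question: it uses a capture group with `.*?` and `re.DOTALL`, which may be useful if only the text between the tags is wanted.

```python
import re

text = '<tag>xyz</tag>\n<tag>abc</tag>'

# Greedy: [\w\W]* consumes the rest of the string, then backtracks
# only as far as the LAST </tag>, so everything ends up in one match.
greedy = re.findall(r"<tag>[\w\W]*</tag>", text)

# Lazy: [\w\W]*? expands one character at a time and stops at the
# FIRST </tag> after each <tag>, giving one match per tag pair.
lazy = re.findall(r"<tag>[\w\W]*?</tag>", text)

# Assumed variant: with a single capture group, findall returns only
# the group contents. re.DOTALL lets '.' match newlines as well.
inner = re.findall(r"<tag>(.*?)</tag>", text, flags=re.DOTALL)

print(greedy)  # ['<tag>xyz</tag>\n<tag>abc</tag>']
print(lazy)    # ['<tag>xyz</tag>', '<tag>abc</tag>']
print(inner)   # ['xyz', 'abc']
```

Note that `re.findall` returns every non-overlapping match, so if only one element comes back, the pattern or the input string is usually the thing to check.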
Pushpesh Kumar Rajwanshi
  • Thank you for your answer. But doing that is giving me only the last occurrence of that pattern instead of all occurrences. Could you please suggest a way to capture all occurrences? –  Dec 03 '18 at 20:28
  • @ApoorvaPatil: Try my python code in my answer. It prints exactly what I have written there. If it's not working for you, can you edit your post and share the code you are trying? – Pushpesh Kumar Rajwanshi Dec 04 '18 at 04:36