
 I'm confused about something in the re module.
 Suppose I have the following text:

<grp>    
  <i>i1</i>    
  <i>i2</i>    
  <i>i3</i>    
  ...    
</grp>    

 I use the following regex to extract the <i></i> parts of the text:

>>> t = "<grp>      <i>i1</i>      <i>i2</i>      <i>i3</i>      ...    </grp>"
>>> import re
>>> re.match("<grp>.*(<i>.*?</i>).*</grp>", t).group(1)
'<i>i3</i>'
>>>

 I only get the last matching item.
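This happens because the leading `.*` is greedy: it consumes as much as it can while still letting the rest of the pattern match, so the capturing group lands on the last `<i>...</i>`. A small sketch contrasting a greedy and a non-greedy prefix, using the same one-line `t` as above. Either way, a single group in `re.match` captures only one occurrence.

```python
import re

t = "<grp>      <i>i1</i>      <i>i2</i>      <i>i3</i>      ...    </grp>"

# Greedy prefix: .* grabs everything, then backtracks only until <i> fits,
# so the group captures the LAST <i>...</i>
greedy = re.match("<grp>.*(<i>.*?</i>).*</grp>", t).group(1)

# Non-greedy prefix: .*? expands one character at a time,
# so the group captures the FIRST <i>...</i>
lazy = re.match("<grp>.*?(<i>.*?</i>).*</grp>", t).group(1)

print(greedy)  # <i>i3</i>
print(lazy)    # <i>i1</i>
```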

 My question is: how can I extract all the matching items using only a regular expression? For example, extract <i>i1</i>, <i>i2</i> and <i>i3</i> into a list: ['<i>i1</i>', '<i>i2</i>', '<i>i3</i>']

  Thanks a lot!

alecxe
buf1024
  • Why can't you use two regular expressions for this specific case? There's not much point in having regular expressions that are too large to handle for yourself unless you need them for performance. Anyway, obligatory reading: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Qantas 94 Heavy Jul 02 '14 at 02:48
  • I fixed it, sorry to bother all. – buf1024 Jul 02 '14 at 02:51
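The two-step approach suggested in the comment above can be sketched as follows: one regex isolates the `<grp>` body, a second collects the `<i>` elements. This assumes a single, non-nested `<grp>` element; the variable names are illustrative.

```python
import re

t = "<grp>      <i>i1</i>      <i>i2</i>      <i>i3</i>      ...    </grp>"

# Step 1: pull out the body of <grp>...</grp>; re.DOTALL lets . also match
# newlines, in case the real input is spread over several lines
body = re.search(r"<grp>(.*?)</grp>", t, re.DOTALL).group(1)

# Step 2: collect every <i>...</i> element inside that body
items = re.findall(r"<i>.*?</i>", body)

print(items)  # ['<i>i1</i>', '<i>i2</i>', '<i>i3</i>']
```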

2 Answers


You can easily do that using re.findall():

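import re

t = "<grp>      <i>i1</i>      <i>i2</i>      <i>i3</i>      ...    </grp>"
# findall returns every non-overlapping match; the lazy .*? stops at the first </i>
result = re.findall("<i>.*?</i>", t)
print(result)  # ['<i>i1</i>', '<i>i2</i>', '<i>i3</i>']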
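A short sketch of how `re.findall` behaves with and without a capturing group, plus the equivalent `re.finditer` loop, run on a multi-line version of the question's input. Since each `<i>` element sits on a single line, `.` never needs to match a newline here.

```python
import re

# Multi-line version of the input from the question
t = """<grp>
  <i>i1</i>
  <i>i2</i>
  <i>i3</i>
</grp>"""

# No groups: findall returns the whole matched text
whole = re.findall(r"<i>(?:.*?)</i>", t)

# One group: findall returns only what the group captured
texts = re.findall(r"<i>(.*?)</i>", t)

# finditer yields match objects (with positions), here reduced to group(1)
texts_iter = [m.group(1) for m in re.finditer(r"<i>(.*?)</i>", t)]

print(whole)       # ['<i>i1</i>', '<i>i2</i>', '<i>i3</i>']
print(texts)       # ['i1', 'i2', 'i3']
print(texts_iter)  # ['i1', 'i2', 'i3']
```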
sshashank124

Why not use an XML parser instead? For example, xml.etree.ElementTree from the Python standard library:

import xml.etree.ElementTree as ET

data = """
<grp>
  <i>i1</i>
  <i>i2</i>
  <i>i3</i>
</grp>
"""

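root = ET.fromstring(data)  # fromstring returns the root <grp> Element
results = root.findall('.//i')
# encoding="unicode" makes tostring return str instead of bytes; strip() drops the tail whitespace
print([ET.tostring(el, encoding="unicode").strip() for el in results])
print([el.text for el in results])  # if you need just the text inside the tags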

Prints:

['<i>i1</i>', '<i>i2</i>', '<i>i3</i>']
['i1', 'i2', 'i3']
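To illustrate why a parser is the safer tool, here is a small sketch with a hypothetical variation of the input in which one `<i>` element carries an attribute. The literal regex silently skips it, while the parser still finds all three elements.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical variation of the input: one <i> carries an attribute
data = """<grp>
  <i>i1</i>
  <i class="x">i2</i>
  <i>i3</i>
</grp>"""

# The pattern only matches a bare "<i>" opening tag, so i2 is skipped
regex_texts = re.findall(r"<i>(.*?)</i>", data)

# The parser understands the markup, so attributes do not get in the way
root = ET.fromstring(data)
parser_texts = [el.text for el in root.iter("i")]

print(regex_texts)   # ['i1', 'i3']
print(parser_texts)  # ['i1', 'i2', 'i3']
```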
alecxe
  • @hwnd thanks, this is what I feel about it too - using specialized tools for specialized tasks, batteries are there, just import them :) – alecxe Jul 02 '14 at 03:10