
I'm new to Python and I have this string:

  row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

I need to get all the data inside the aa tags:

hello,great,later

My code is:

 import re

 allAA = []
 patternAA = "<aa>(.*)</aa>"
 allAA = '\''+(re.search(patternAA, str(row))).groups() +'\','

and I get this result: <aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa>

How can I get the data I need?

Damkulul

2 Answers


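You can use re.findall(), which returns a list of all non-overlapping matches of your regular expression. Because the pattern has one capture group, it returns only the captured text, and the non-greedy .*? stops each match at the nearest </aa>: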

import re

row =  "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
allAA = re.findall(r'<aa>(.*?)</aa>', row)

print(allAA) # ['hello', 'great', 'later']
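If you need the comma-separated text shown in the question rather than a list, a minimal sketch is to join the matches with str.join(). The name joined is only illustrative:

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# Capture the text between each <aa>...</aa> pair; .*? stops at the nearest </aa>
allAA = re.findall(r"<aa>(.*?)</aa>", row)

# Combine the list into one comma-separated string
joined = ",".join(allAA)
print(joined)  # hello,great,later
```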
Nace Kapus

There are two issues with your code:

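  1. You need a non-greedy quantifier, written .*? instead of .*, so each match stops at the nearest </aa> instead of running on to the last one in the string.
  2. You should use re.findall() to get all the captured groups, rather than re.search(), which returns only the first match.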
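To see what each fix changes on its own, here is a small illustrative comparison of the greedy and non-greedy patterns, and of re.search() against re.findall(), on the same string. The variable names are arbitrary:

```python
import re

row = "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"

# Greedy .*: the match runs from the first <aa> to the LAST </aa> in the string
greedy = re.search(r"<aa>(.*)</aa>", row)
print(greedy.group(0))  # <aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa>
print(greedy.group(1))  # hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later

# Non-greedy .*? with search(): stops at the nearest </aa>, but only the first match is returned
first_only = re.search(r"<aa>(.*?)</aa>", row).group(1)
print(first_only)  # hello

# Non-greedy .*? with findall(): every non-overlapping match, returning just the captured group
every_match = re.findall(r"<aa>(.*?)</aa>", row)
print(every_match)  # ['hello', 'great', 'later']
```

The full greedy match (group(0)) is exactly the text reported in the question.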

With these two fixes, we get the following:

import re
row =  "<aa>hello</aa><bb>bello</bb><aa>great</aa><cc>today</cc><aa>later</aa><bb>fine</bb>"
patternAA = re.compile(r"<aa>(.*?)</aa>")
result = patternAA.findall(row)

# Prints ['hello', 'great', 'later']
print(result)
BrokenBenchmark