I have a regular expression that I want to use to find all matches in a string. However, after calling findall(), only the last match is printed.
Here is the string I want to process:
'<inventor sequence="001" designation="us-only"><addressbook><last-name>Li</last-name><first-name>Shuo</first-name><address><city>Beijing</city><country>CN</country></address></addressbook></inventor><inventor sequence="002" designation="us-only"><addressbook><last-name>Liu</last-name><first-name>Xin Peng</first-name><address><city>Beijing</city><country>CN</country></address></addressbook></inventor><inventor sequence="003" designation="us-only"><addressbook><last-name>Sun</last-name><first-name>Sheng Yan</first-name><address><city>Beijing</city><country>CN</country></address></addressbook></inventor><inventor sequence="004" designation="us-only"><addressbook><last-name>Wang</last-name><first-name>Hua</first-name><address><city>Littleton</city><state>MA</state><country>US</country></address></addressbook></inventor><inventor sequence="005" designation="us-only"><addressbook><last-name>Wang</last-name><first-name>Jun</first-name><address><city>Littleton</city><state>MA</state><country>US</country></address></addressbook></inventor>'
I'm using the following code to extract all inventors from the string:
import re
INVENTORS_CONTENT_PATTERN = re.compile(r'<inventor sequence=".*" designation=".*">(.*?)</inventor>')
print(INVENTORS_CONTENT_PATTERN.findall(data))
Instead of all the inventors in data, the result contains only the content of the last one:
['<addressbook><last-name>Wang</last-name><first-name>Jun</first-name><address><city>Littleton</city><state>MA</state><country>US</country></address></addressbook>']
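A likely explanation, offered as a sketch rather than a definitive answer: `.*` is greedy, so the `.*` inside `sequence=".*"` and `designation=".*"` can run past the closing quote, across the following tags, and up to the last `" designation="` / `">` in the string. The whole input is consumed by a single match, and only the last inventor's content is captured. Restricting the attribute values to non-quote characters (`[^"]*`) keeps each match inside one opening tag. The snippet below assumes a shortened two-inventor `data` string with the same shape as the one above. It also shows, as an alternative, parsing the fragment with `xml.etree.ElementTree`. Because the fragment has several top-level elements, it is wrapped in a hypothetical `<root>` element first.

```python
import re
import xml.etree.ElementTree as ET

# Assumed sample: a shortened version of the question's string (two inventors only).
data = (
    '<inventor sequence="001" designation="us-only"><addressbook>'
    '<last-name>Li</last-name><first-name>Shuo</first-name>'
    '<address><city>Beijing</city><country>CN</country></address>'
    '</addressbook></inventor>'
    '<inventor sequence="002" designation="us-only"><addressbook>'
    '<last-name>Liu</last-name><first-name>Xin Peng</first-name>'
    '<address><city>Beijing</city><country>CN</country></address>'
    '</addressbook></inventor>'
)

# Original pattern: the greedy ".*" in the attribute values backtracks only as far
# as the LAST '" designation="' and the LAST '">', so one match spans the whole input.
GREEDY = re.compile(r'<inventor sequence=".*" designation=".*">(.*?)</inventor>')

# Attribute values limited to non-quote characters: each match stays in one tag.
FIXED = re.compile(r'<inventor sequence="[^"]*" designation="[^"]*">(.*?)</inventor>')

greedy_matches = GREEDY.findall(data)
fixed_matches = FIXED.findall(data)

print(len(greedy_matches))  # 1
print(len(fixed_matches))   # 2

# Alternative: parse as XML instead of using a regex.
# The <root> wrapper is a hypothetical addition so the fragment has one root element.
root = ET.fromstring('<root>' + data + '</root>')
names = [
    (inv.findtext('addressbook/first-name'), inv.findtext('addressbook/last-name'))
    for inv in root.findall('inventor')
]
print(names)  # [('Shuo', 'Li'), ('Xin Peng', 'Liu')]
```

A non-greedy `.*?` in the attribute values would also work for this input. `[^"]*` states the intent more directly, because an attribute value cannot contain an unescaped double quote. For anything beyond simple extraction, an XML parser is usually more robust than a regular expression.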