How to parse text from a single line of XML using Python

Question

I have a single line of XML and would like to extract all of its text parts into a list of strings.

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

My desired output:

desired_output = ['Finishing', '%d', 'percent.']

I tried a regular expression for this simple task:

import re
pattern = re.compile(r'>.+<')
match = re.findall(pattern, text)

match = ['>Finishing <xliff:g id="number">%d</xliff:g> percent.<']

The regular expression returns the whole span as a single match instead of my desired output.

Eliethesaiyan · Accepted Answer · 2017-04-06T03:59:02.617

-1

Update your regex to this:

pattern = re.compile(r'.*?>(.+?)<')
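To see why the original pattern returned one long match, and what a non-greedy capture group changes, here is a minimal sketch. It uses a slightly simplified pattern, `>(.+?)<`, rather than the exact one above, and the variable names `greedy`, `lazy` and `parts` are only illustrative. The captured pieces still carry their surrounding spaces, so the sketch strips them to reach the desired output.

```python
import re

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# Greedy: .+ runs from the first '>' to the last '<', so there is one long match.
greedy = re.findall(r'>.+<', text)

# Non-greedy with a capture group: each match stops at the next '<',
# and findall returns only the captured text between the tags.
lazy = re.findall(r'>(.+?)<', text)

# The captured pieces keep their surrounding spaces, so strip them.
parts = [piece.strip() for piece in lazy if piece.strip()]
print(parts)  # ['Finishing', '%d', 'percent.']
```

This works for a flat line like the one in the question, but a regex of this kind does not understand comments, CDATA or attribute values that contain `>`, which is where a real parser becomes the safer choice.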

If you are parsing XML or HTML, consider using BeautifulSoup; it will save you from writing many more regexes. If you want to learn regex, though, expect some trial and error.
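As a rough illustration of the parser-based route, here is a sketch that uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup, so it runs without installing anything. It rests on two assumptions that are not in the question: the `xliff` prefix is taken to map to the XLIFF 1.2 namespace (as in Android resource files), and the line is wrapped in a made-up `<root>` element that declares that prefix, because the parser rejects an undeclared one. The helper name `text_parts` is hypothetical.

```python
import xml.etree.ElementTree as ET

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# Assumption: the xliff prefix refers to the XLIFF 1.2 namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def text_parts(xml_line):
    """Return the non-empty, stripped text pieces of one XML line."""
    # Wrap the line in a root element (hypothetical) that declares the prefix,
    # otherwise the parser raises an "unbound prefix" error.
    wrapped = '<root xmlns:xliff="{}">{}</root>'.format(XLIFF_NS, xml_line)
    root = ET.fromstring(wrapped)
    # itertext() walks element text and tail text in document order.
    return [piece.strip() for piece in root.itertext() if piece.strip()]


parts = text_parts(text)
print(parts)  # ['Finishing', '%d', 'percent.']
```

Unlike the regex, this keeps working if the line gains nested elements or extra attributes, at the cost of needing the namespace declared.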

edited Apr 06 '17 at 03:59

answered Apr 06 '17 at 03:55


Thanks for the advice. I will look into BeautifulSoup – sheperdgirl Apr 06 '17 at 04:11
