I have a single line of XML and would like to extract all of its text parts into a list of strings.

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

My desired output:

desired_output = ['Finishing', '%d', 'percent.']

I tried a regular expression for this simple task:

import re
pattern = re.compile(r'>.+<')
match = re.findall(pattern, text)

match = ['>Finishing <xliff:g id="number">%d</xliff:g> percent.<']

The regular expression returns one long match instead of my desired output.

1 Answer

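Your `.+` is greedy, so it matches everything from the first `>` to the last `<`. Make it lazy and capture the text in a group:

 pattern = re.compile(r'.*?>(.+?)<')

Each captured piece keeps its surrounding spaces ('Finishing ', ' percent.'), so strip them afterwards.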
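A minimal sketch of the difference between the greedy and the lazy version, using the `text` from the question. The shortened pattern `>(.+?)<` and the final `strip()` step are my own additions, assumed here to drop the spaces around each piece:

```python
import re

text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

# Greedy: .+ runs from the first '>' to the last '<', giving one long match.
greedy = re.findall(r'>.+<', text)

# Lazy with a capture group: each match stops at the nearest '<'.
pieces = re.findall(r'>(.+?)<', text)

# Strip surrounding whitespace and drop pieces that are empty afterwards.
result = [p.strip() for p in pieces if p.strip()]
print(result)
```

The lazy quantifier is what splits the single match into one match per text fragment; the list comprehension only tidies up whitespace.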

If you are parsing XML/HTML, consider using BeautifulSoup: it will save you from writing ever more regexes. If your goal is to learn regex, though, expect some trial and error.
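To illustrate the parser route without installing anything, here is a rough sketch that swaps in the standard library's `html.parser` in place of BeautifulSoup. The `TextCollector` class name is hypothetical, and I am assuming the lenient HTML parser is acceptable for this snippet (a strict XML parser would likely reject it, since the `xliff` prefix is not declared in the single line):

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text fragments found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text between tags; skip whitespace-only runs.
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)


text = '<string name="status">Finishing <xliff:g id="number">%d</xliff:g> percent.</string>'

collector = TextCollector()
collector.feed(text)
collector.close()  # flush any text still buffered by the parser
print(collector.parts)
```

With BeautifulSoup the same idea should be shorter, since a parsed document exposes its whitespace-stripped text fragments through `stripped_strings`.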

Eliethesaiyan