Match everything inside multiple instances of a tag in a string in python

Question

Sample string:

str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"

Result should be a list:

res = ['John', 'Mary']

I should really have learned regex by now.

http://stackoverflow.com/questions/7361253/python-how-to-find-a-substring-in-another-string or you should really have learned to google ;) — sharpshadow, Aug 31 '14 at 10:50

Weafs.py · Answer 1 · 2014-09-13T18:35:20.543

1

Try this:

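import re

text = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"  # named text to avoid shadowing the built-in str
# Capture whatever sits between each <sec> ... </sec> pair
ext = re.findall(r'<sec>(\S+?)</sec>', text)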

This will return ['John', 'Mary'].

\S - matches any single non-whitespace character.

+? - repeats the preceding token one or more times, non-greedily (as few times as possible).

() - a capturing group; re.findall returns only the text matched inside the parentheses.
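One caveat with the pattern above: \S cannot match a space, so a tag whose content contains whitespace is silently skipped. A minimal sketch of a variant, assuming the tag content may contain spaces or newlines; the helper name extract_sec and the sample string with "John Smith" are made up for illustration:

```python
import re

def extract_sec(text):
    # .*? matches as little as possible up to the next </sec>;
    # re.DOTALL lets "." also match newlines inside a tag.
    return re.findall(r'<sec>(.*?)</sec>', text, flags=re.DOTALL)

sample = "<sec>John Smith</sec> said hi to a woman (named <sec>Mary</sec>)"

# The original pattern stops at the space and skips the first tag.
print(re.findall(r'<sec>(\S+?)</sec>', sample))  # ['Mary']
print(extract_sec(sample))                       # ['John Smith', 'Mary']
```

This is still plain pattern matching, so it will not cope with nested tags or attributes such as <sec id="1">.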

edited Sep 13 '14 at 18:35

answered Aug 31 '14 at 10:54

Weafs.py


2

Note that parsing `xml` format with `re` is highly error-prone. You should use libraries such as `xml` or `lxml` instead. And it's unlikely that OP wanted to search for only 4-letter strings. – qwm Aug 31 '14 at 11:11

score 0 · Accepted Answer · answered Aug 31 '14 at 11:35

You are dealing with (something like) XML. Use a parser.

import xml.etree.ElementTree as ET

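text = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"  # named text to avoid shadowing the built-in str

# XML needs a single root element, so wrap the fragment in a dummy <root>
doc = ET.fromstring("<root>" + text + "</root>")
# Collect the text of every <sec> element, at any depth
result = [x.text for x in doc.findall(".//sec")]

# >>> ['John', 'Mary']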
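As a rough illustration of the flexibility a parser gives you, here is a sketch that also copes with markup nested inside a tag, which x.text alone would cut short. The helper name sec_texts and the <b> sample are assumptions for the example, not part of the question:

```python
import xml.etree.ElementTree as ET

def sec_texts(fragment):
    # Wrap in a dummy root because the fragment has several top-level nodes.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # itertext() walks the element and its children, yielding every text piece.
    return ["".join(el.itertext()) for el in root.iter("sec")]

print(sec_texts("<sec>John <b>Smith</b></sec> met <sec>Mary</sec>"))
# ['John Smith', 'Mary']
```

Note that this only works when the string is well-formed XML; input with an unescaped & or an unclosed tag makes ET.fromstring raise ET.ParseError.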

answered Aug 31 '14 at 11:35

Tomalak


Ah, didn't see this! This also seems to work very nicely! Thanks! – SupsH Sep 01 '14 at 12:31
Not only does it work, it's also much more failure-resistant and a lot more flexible. – Tomalak Sep 01 '14 at 12:37
I recognize that! I have changed the accepted answer. – SupsH Sep 01 '14 at 12:38
I gave your question an up-vote to counter the down-votes, but for next time please show some effort of your own. This is generally received favorably around StackOverflow. – Tomalak Sep 01 '14 at 12:40
