Sample string:
str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
Result should be a list:
res = [John, Mary]
I should really have learned regex by now.
Sample string:
str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
Result should be a list:
res = [John, Mary]
I should really have learned regex by now.
Try this:
import re
str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
ext = re.findall(r'<sec>(\S+?)</sec>', str)
This will return ['John', 'Mary']
\S
- represents match any non-whitespace character.
+?
- represents repeat a character one or more time(non-greedy).
()
- represents extract everything that is inside of these parenthesis.
You are dealing with (something like) XML. Use a parser.
import xml.etree.ElementTree as ET
str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
doc = ET.fromstring("<root>" + str + "</root>")
result = [x.text for x in doc.findall(".//sec")]
# >>> ['John', 'Mary']