-2

Sample string:

str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"

Result should be a list:

res = [John, Mary]

I should really have learned regex by now.

SupsH
  • 41
  • 3

2 Answers2

1

Try this:

import re
str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
ext = re.findall(r'<sec>(\S+?)</sec>', str)

This will return ['John', 'Mary']

\S - represents match any non-whitespace character.

+? - represents repeat a character one or more time(non-greedy).

() - represents extract everything that is inside of these parenthesis.

Weafs.py
  • 22,731
  • 9
  • 56
  • 78
  • 2
    Note that parsing `xml` format with `re` is highly error-prone. You should use libraries such as `xml` or `lxml` instead. And it's unlikely that OP wanted to search for only 4-letter strings. – qwm Aug 31 '14 at 11:11
0

You are dealing with (something like) XML. Use a parser.

import xml.etree.ElementTree as ET

str = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"

doc = ET.fromstring("<root>" + str + "</root>")
result = [x.text for x in doc.findall(".//sec")]

# >>> ['John', 'Mary']
Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • Ah, didn't see this! This also seems to work very nicely! Thanks! – SupsH Sep 01 '14 at 12:31
  • Not only does it work, it's also much more failure-resistant and a lot more flexible. – Tomalak Sep 01 '14 at 12:37
  • I recognize that! I have changed the accepted answer. – SupsH Sep 01 '14 at 12:38
  • I gave your question an up-vote to counter the down-votes, but for next time please show some effort of your own. This is generally received favorably around StackOverflow. – Tomalak Sep 01 '14 at 12:40