I have an HTML file containing:

 ...<b>Breakfast</b><hr>...

I want to extract `Breakfast`, the text between `>` and `<`.

I tried:

...for test_string in line:
        if re.match(r'(>.*<$)',test_string):...

That didn't even give me `>Breakfast<`.

Thank you.

Vinayak Garg
He Drunh
  • Why did you include `$`? – Cameron Jan 22 '12 at 06:45
  • possible duplicate of [Whats the regular expression for finding string between " "](http://stackoverflow.com/questions/3066328/whats-the-regular-expression-for-finding-string-between) – Ken White Jan 22 '12 at 06:46
  • You should look at something like this: http://www.crummy.com/software/BeautifulSoup/ – Chris Cooper Jan 22 '12 at 06:46
  • As usual, with anything involving HTML and Regexes: http://stackoverflow.com/a/1732454/118068 – Marc B Jan 22 '12 at 06:51
  • possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – outis Jan 22 '12 at 22:44

3 Answers

In general, regular expressions can't reliably parse HTML. You could use an HTML parser instead:

from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """...<b>Breakfast</b><hr>..."""

soup = BeautifulSoup(html, "html.parser")
print(soup.find_all(string=True))  # get all text nodes
# -> ['...', 'Breakfast', '...']
print([b.get_text() for b in soup.find_all("b")])  # get the text of every <b> tag
# -> ['Breakfast']
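If installing a third-party package isn't an option, the standard library's `html.parser` module can do the same job. This is a minimal sketch, assuming you only want the text that sits inside `<b>` tags; the class name `BoldTextParser` is hypothetical and this is not part of BeautifulSoup:

```python
from html.parser import HTMLParser


class BoldTextParser(HTMLParser):
    """Collects the text that appears inside <b>...</b> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many <b> tags we are currently inside
        self.bold_text = []   # one entry per text chunk found inside <b>

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <B> also matches.
        if tag == "b":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while at least one <b> is open.
        if self.depth > 0:
            self.bold_text.append(data)


parser = BoldTextParser()
parser.feed("...<b>Breakfast</b><hr>...")
parser.close()
print(parser.bold_text)  # -> ['Breakfast']
```

Because the parser tracks open and close tags instead of matching characters, nested or unusual markup doesn't confuse it the way a regex can.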
jfs

The `$` means "end of input" and doesn't belong in this regex.

Instead, do the following:

import re

m = re.search(r'>([^<]*)<', test_string)
if m:
    print(m.group(1))

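This searches for `>`, then every following character that is not `<`, and then `<`. The characters between `>` and `<` are captured as a group, which you retrieve with `m.group(1)`. Note the use of `re.search` rather than `re.match`: `re.match` only matches at the start of the string.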
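A self-contained run of the same approach on the question's sample string. It also shows why the question's pattern returned nothing, and uses `re.findall` (not mentioned in the answer above) to collect every match at once:

```python
import re

html = "...<b>Breakfast</b><hr>..."

# re.search scans the whole string; re.match would only try position 0.
m = re.search(r'>([^<]*)<', html)
if m:
    print(m.group(1))  # -> Breakfast

# The question's pattern fails twice over: re.match anchors at the start
# (the string begins with '.'), and '$' requires '<' to be the last character.
print(re.match(r'(>.*<$)', html))   # -> None
print(re.search(r'(>.*<$)', html))  # -> None, because of the '$'

# re.findall returns the group from every match, including the empty
# text between adjacent tags such as '</b><hr>'.
print(re.findall(r'>([^<]*)<', html))  # -> ['Breakfast', '']
```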

dorsh

I think you want:

r'(>.*?<)'

Or maybe

r'<b(>.*?<)/b>'

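Both patterns use the non-greedy `.*?`, so they stop at the first `<` rather than the last one, and without the `$` they can match in the middle of a string (use `re.search`, not `re.match`, for that). Note that parsing HTML with regular expressions is not very robust.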
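A short demonstration on the question's sample string of the difference the non-greedy `?` makes, plus one made-up input with nested tags showing where the regex approach gets fragile:

```python
import re

html = "...<b>Breakfast</b><hr>..."

# Greedy: .* runs to the LAST '<' in the string, swallowing the closing tag.
greedy = re.search(r'>(.*)<', html).group(1)
print(greedy)  # -> Breakfast</b>

# Non-greedy: .*? stops at the FIRST '<' after the '>'.
lazy = re.search(r'>(.*?)<', html).group(1)
print(lazy)  # -> Breakfast

# Anchored on the tag name, like the second pattern above.
# Its capture group still includes the '>' and '<'.
tagged = re.search(r'<b(>.*?<)/b>', html).group(1)
print(tagged)  # -> >Breakfast<

# Where it stops being robust: nested markup ends up inside the capture.
nested = "<b>Bread <i>and</i> butter</b>"
inner = re.search(r'<b>(.*?)</b>', nested).group(1)
print(inner)  # -> Bread <i>and</i> butter
```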

Community
Cameron