Finding the second of the same two words in a line

Question

I am using line.rfind() to find a certain line in an html page and then I am splitting the line to pull out individual numbers. For example:

position1 = line.rfind('Wed')

This finds this particular line of html code:

 <strong class="temp">79<span>&deg;</span></strong><span class="low"><span>Lo</span> 56<span>&deg;</span></span>

First I want to pull out the '79', which is done with the following code:

if position1 != -1:  # rfind returns -1 when the substring is not found
    self.high0 = lines[line_number + 4].split('<span>')[0].split('">')[-1]

This works perfectly. The problem I am encountering is extracting the '56' from that line of HTML. I can't split it between '<span>' and '</span>', since the first '<span>' found in the line comes after the '79'. Is there a way to tell the script to look for the second occurrence of '<span>'?

Thanks for your help!

score 2 · Accepted Answer · edited May 23 '17 at 12:28

Concerns about parsing HTML with regex aside, I've found that regex tends to be fairly useful for grabbing information from limited, machine-generated HTML.

You can pull out both values with a regex like this:

import re
matches = re.findall(r'<strong class="temp">(\d+).*?<span>Lo</span> (\d+)', lines[line_number+4])
if matches:
    high, low = matches[0]

Consider this quick-and-dirty: if you rely on it for a job, you may want to use a real parser like BeautifulSoup.
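
On the literal question of finding the second occurrence: str.find accepts a start index, so you can search again from just past the previous hit. Below is a minimal sketch of that idea, assuming the line has exactly the shape shown in the question. The helper name find_nth is hypothetical, not something from the original code.

```python
def find_nth(haystack, needle, n):
    """Return the index of the n-th (1-based) occurrence of needle, or -1."""
    pos = -1
    for _ in range(n):
        # Resume the search one character past the previous match.
        pos = haystack.find(needle, pos + 1)
        if pos == -1:
            return -1
    return pos


line = ('<strong class="temp">79<span>&deg;</span></strong>'
        '<span class="low"><span>Lo</span> 56<span>&deg;</span></span>')

low = None
# The low temperature sits between the second '</span>' and the next '<span>'.
second_close = find_nth(line, '</span>', 2)
if second_close != -1:
    start = second_close + len('</span>')
    end = line.find('<span>', start)
    if end != -1:
        low = line[start:end].strip()

print(low)
```

This is just as fragile as the split approach if the page layout changes, so treat it as a quick fix rather than a robust parser.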

answered Sep 11 '13 at 03:54

nneonneo

Awesome. Thank you. This is just for my own purposes, nothing important. Though I may check out BeautifulSoup anyway. Thanks again. – hunter21188 Sep 11 '13 at 04:13

7stud · Answer 2 · 2013-09-11T16:48:55.603

1

import re

html = """
 <strong class="temp">79<span>&deg;</span></strong><span class="low"><span>Lo</span> 56<span>&deg;</span></span>
"""

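# \d+ matches every run of digits, so this relies on the line containing no other numbers.
# The re.X, re.M and re.S flags have no effect on this pattern and are dropped.
numbers = re.findall(r"\d+", html)
print(numbers)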

--output:--
['79', '56']
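
Since this pattern picks up any digits on the line, it may be worth checking the count before unpacking. A small sketch, assuming the line should contain exactly two numbers (high first, then low). Falling back to None is an illustrative choice, not part of the original answer.

```python
import re

html = ('<strong class="temp">79<span>&deg;</span></strong>'
        '<span class="low"><span>Lo</span> 56<span>&deg;</span></span>')

# Convert every digit run to an int, then unpack only if the count is as expected.
numbers = [int(n) for n in re.findall(r"\d+", html)]
if len(numbers) == 2:
    high, low = numbers
else:
    high = low = None  # unexpected layout; leave both values unset

print(high, low)
```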

With BeautifulSoup:

from bs4 import BeautifulSoup

html = """
<strong class="temp">
    79
    <span>&deg;</span>
</strong>
<span class="low">
   <span>Lo</span> 
   56
   <span>&deg;</span>
</span>
"""

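soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly to avoid a warning
low_span = soup.find('span', class_="low")

# stripped_strings yields each text node with surrounding whitespace removed
for string in low_span.stripped_strings:
    print(string)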

--output:--
Lo
56
°

edited Sep 11 '13 at 16:48

answered Sep 11 '13 at 03:54

7stud

Thanks 7stud. This will be helpful if I decide to use BeautifulSoup. – hunter21188 Sep 11 '13 at 05:01
