Python code for searching through a url

Question

Hey, so I'm trying to get the current oil price and then do some math on it for a homework assignment. I'm having trouble getting my code to find the numbers I need on the website. Here is my code:

# Module oilcost.py to compute the delivery cost for home heating oil.
# Assume your delivery company charges a 10% fee on top of the price 
# per gallon.  The module should take one command line argument 
# indicating the number of gallons needed and should output the 
# total cost.

import sys
import re
import urllib



def getOilPrice(url):
    f = urllib.urlopen(url)
    html=f.read()
    f.close()
    match = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', html)
    return match.group(1) if match else '0'

def outputPrice(oilprice, gallons, total):
    print 'The current oil price is $ %s' %oilprice


def main():
    url = 'http://www.indexmundi.com/commodities/?commodity=heating-oil'
    oilprice = float(getOilPrice(url))     # Create this method
    gallons = float(sys.argv[1])                      # Get from command line
    total = (gallons * 1.1) * oilprice
    outputPrice(oilprice, gallons, total)  # Create this method
if __name__ == '__main__':
    main()

Can anyone let me know what I'm doing wrong?

My output after I get it working would be print 'The current oil price is $ %s \nThe total price of %s gallons is $ %s.' % (oilprice,gallons,total) – matture Mar 02 '12 at 19:08
I think what Ignacio is saying is that you aren't having a problem with the URL, but rather with the *HTML resource* at that address. Specifically, you're trying to deal with HTML, which is a different beast entirely from URLs. It's immaterial that you happen to get that HTML by downloading it from some URL. It could just as easily have been a file on disk or a literal string in your Python script and you'd have the same problem. – SingleNegationElimination Mar 02 '12 at 19:16

score 2 · Accepted Answer · edited May 23 '17 at 12:21

Parsing HTML is notoriously fraught with peril, but for the purposes of homework that might not be so important. This is a pretty good chance to learn about regular expressions.

On the line:

match = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', html)
#                                              ^    ^

You have some d's, which will match the literal letter d. Could you possibly have meant \d (that's a backslash followed by d), which matches a digit?
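
To see the difference, here is a minimal sketch in Python 3 syntax. The `sample` string is made up for the demonstration and only assumes the page wraps the price in the span from the question; the second pattern also drops the spaces inside the group.

```python
import re

# Hypothetical snippet standing in for the downloaded page.
sample = '<span class="dailyPrice">3.23</span>'

# Original pattern: ' d+' means a space followed by one or more letter d's.
broken = re.search(r'<span class="dailyPrice">( d+.? d+)</span>', sample)

# With backslashes and without the stray spaces, \d matches a digit.
fixed = re.search(r'<span class="dailyPrice">(\d+\.?\d+)</span>', sample)

print(broken)          # None
print(fixed.group(1))  # 3.23
```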

edited May 23 '17 at 12:21

Community

answered Mar 02 '12 at 19:14

SingleNegationElimination

And do you really want the space before the first digit, and another before the digits after the (optional) decimal point? – Pierce Mar 02 '12 at 19:16

score 1 · Answer 2 · answered Mar 02 '12 at 19:15

Your regex doesn't match the page content. You have:

( d+.? d+)

But the page has:

3.23

Your regex matches: a space, followed by one or more d characters, followed by one optional character of any kind, followed by a space, followed by one or more d characters. This might work better:

(\d+(\.\d+)?)

Which is: one or more digits, followed by an optional group consisting of a literal . character and one or more digits.
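
As a rough sketch of how that pattern could slot into the rest of the script (Python 3 syntax; `PRICE_RE`, `parse_oil_price` and the `sample` string are hypothetical names and data, and I'm assuming the page really does use the `dailyPrice` span from the question):

```python
import re

# Pattern from above, wrapped in the span markup used in the question.
PRICE_RE = re.compile(r'<span class="dailyPrice">(\d+(\.\d+)?)</span>')


def parse_oil_price(html):
    """Return the price found in html as a float, or None if there is none."""
    match = PRICE_RE.search(html)
    # group(1) is the whole number; group(2) is only the optional '.23' part.
    return float(match.group(1)) if match else None


# Made-up HTML standing in for the downloaded page.
sample = '<p>Heating oil: <span class="dailyPrice">3.23</span></p>'

price = parse_oil_price(sample)
gallons = 100.0
total = gallons * 1.1 * price  # 10% delivery fee, as in the question

print('The current oil price is $ %s' % price)
print('The total price of %s gallons is $ %.2f.' % (gallons, total))
```

Returning None instead of '0' when nothing matches makes a failed lookup obvious rather than silently producing a total of zero.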

answered Mar 02 '12 at 19:15

beerbajay

I wish I could accept both your answers, but tokenmacguy was first. Thanks for your help, though! – matture Mar 02 '12 at 19:25
