Reading data from a website

Question

I'm trying to read data from a web page that contains only plain text. I'd like to extract only the data that follows "&values". I've been able to read the entire page, but I don't know how to strip out the extraneous data, and I don't know any HTML. Any help would be much appreciated.

`http://www.crummy.com/software/BeautifulSoup/` – pogo Oct 27 '12 at 01:58

score 3 · Accepted Answer · answered Oct 27 '12 at 01:59

The contents of that URL look like URL query parameters. In Python 2 you could use urlparse.parse_qs to parse them into a dict:

import urllib2
import urlparse

url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
response = urllib2.urlopen(url)
content = response.read()  # the whole page body as a single string
# parse_qs returns a dict that maps each key to a list of strings
params = urlparse.parse_qs(content)
print(params['values'])
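
In Python 3, urllib2 and urlparse were reorganized into urllib.request and urllib.parse, so the same idea might look like the sketch below. It assumes the page is UTF-8 encoded, and it uses a hypothetical sample body because the thread doesn't show the page's exact contents; parse_values and fetch_values are illustrative names, not part of any library.

```python
from typing import List
from urllib.parse import parse_qs
from urllib.request import urlopen


def parse_values(content: str) -> List[int]:
    """Return the numbers stored under the 'values' key of a query-string-like body."""
    # parse_qs maps every key to a list of strings, e.g. {'values': ['1,2,3']}
    params = parse_qs(content)
    # 'values' holds one comma-separated string: split it on commas and skip
    # empty pieces (e.g. from a trailing comma); int() tolerates surrounding spaces.
    raw = params["values"][0]
    return [int(piece) for piece in raw.split(",") if piece.strip()]


def fetch_values(url: str) -> List[int]:
    """Download the page (needs network access) and parse its body."""
    with urlopen(url) as response:
        # In Python 3, read() returns bytes; decoding as UTF-8 is an assumption.
        content = response.read().decode("utf-8")
    return parse_values(content)


# Hypothetical sample body standing in for the real page contents.
sample = "avg=1&start=1327715574&values=33900000,33900000,34100000"
print(parse_values(sample))  # [33900000, 33900000, 34100000]
```

Note that parse_qs does not split on commas, which is why the sketch splits the single 'values' string itself. If the body has no 'values' key, params["values"] raises KeyError.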

score 2 · Answer 2 · edited May 23 '17 at 12:23

You may want to look into the re module (although if you eventually need to parse HTML, regular expressions are not the best tool). Here is a basic example that finds &values and captures the run of digits, commas and whitespace that follows it:

>>> import re
>>> import urllib2
>>> url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
>>> contents = urllib2.urlopen(url).read()
>>> values = re.findall(r'&values=([\d,\s]*)', contents)
>>> values[0].split(',')
['33900000', '33900000', '33900000', ...]
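
In Python 3, urlopen(...).read() returns bytes, and a str pattern like the one above raises TypeError when matched against bytes, so you would decode the body first (or use a bytes pattern such as rb'&values=([\d,\s]*)'). A minimal sketch under that assumption, using search() to take the first match and trimming the whitespace that [\d,\s]* also captures; the sample body is hypothetical, and extract_values and VALUES_RE are illustrative names.

```python
import re
from typing import List

# Capture the run of digits, commas and whitespace right after "&values=",
# the same pattern used in the session above.
VALUES_RE = re.compile(r"&values=([\d,\s]*)")


def extract_values(contents: str) -> List[str]:
    """Return the comma-separated items that follow '&values=', or [] if absent."""
    match = VALUES_RE.search(contents)
    if match is None:
        return []
    # Split on commas, trim whitespace (including newlines) and drop empty items.
    return [item.strip() for item in match.group(1).split(",") if item.strip()]


# Hypothetical response body; in Python 3 urlopen(...).read() gives bytes like this.
raw = b"avg=1&start=1327715574&values=33900000, 33900000,34100000\n"
contents = raw.decode("utf-8")
print(extract_values(contents))  # ['33900000', '33900000', '34100000']
```

Unlike the original split(','), this version strips stray spaces and newlines from each item, which matters because \s in the pattern also matches line breaks.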

answered Oct 27 '12 at 01:57 by RocketDonkey · edited May 23 '17 at 12:23 by Community

@user1709173 Happy it helps! However, @unutbu's answer is pretty clever :) – RocketDonkey Oct 27 '12 at 02:00