I'm trying to read data from a website that contains only text. I'd like to read only the data that follows "&values". I've been able to read the whole page, but I don't know how to strip out the extraneous data, and I don't know any HTML. Any help would be much appreciated.
`http://www.crummy.com/software/BeautifulSoup/` – pogo Oct 27 '12 at 01:58
2 Answers
The contents of that URL look like URL query parameters. You could use urlparse.parse_qs to parse them into a dict:
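# Python 2 (the Python 3 equivalents are urllib.request.urlopen and urllib.parse.parse_qs)
import urllib2
import urlparse

url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
response = urllib2.urlopen(url)
content = response.read()            # the whole page body as one string
params = urlparse.parse_qs(content)  # dict mapping each key to a list of strings
print(params['values'])              # a list holding the comma-separated values string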
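For Python 3, where urllib2 and urlparse no longer exist, the same parse_qs idea can be sketched as below. This is a minimal sketch, not tested against the live tip.it page: it assumes the body is UTF-8 text shaped like a query string, and the helper names fetch_body and extract_values, as well as the sample string, are hypothetical and made up for illustration.

```python
from urllib.parse import parse_qs
from urllib.request import urlopen


def fetch_body(url):
    """Download the page and decode it (UTF-8 is an assumption about the site)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8')


def extract_values(body):
    """Return the comma-separated 'values' field of a query-string-like body as ints."""
    params = parse_qs(body)  # e.g. 'avg=1&values=1,2,3' -> {'avg': ['1'], 'values': ['1,2,3']}
    # parse_qs maps every key to a list; assume 'values' appears at most once
    raw = params.get('values', [''])[0]
    # skip empty pieces (e.g. from a trailing comma); int() ignores surrounding whitespace
    return [int(v) for v in raw.split(',') if v.strip()]


# Hypothetical sample body, reusing the numbers shown in the other answer
sample = 'avg=1&values=33900000,33900000,33900000'
print(extract_values(sample))
```

parse_qs returns lists because a key may legitimately repeat in a query string; taking the first entry assumes the page contains a single values field.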

unutbu
You may want to look into the re module (although if you eventually move to parsing HTML, regex is not the best solution). Here is a basic example that captures the run of digits, commas and whitespace following &values= and splits it on commas:
>>> import re
>>> import urllib2
>>> url = 'http://www.tip.it/runescape/gec/price_graph.php?avg=1&start=1327715574&mainitem=10350&item=10350'
>>> contents = urllib2.urlopen(url).read()
>>> values = re.findall(r'&values=([\d,\s]*)', contents)
>>> values[0].split(',')
['33900000', '33900000', '33900000', ...]
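A Python 3 version of the same regex approach needs one extra step: urlopen returns bytes, and searching bytes with a str pattern raises TypeError, so the body has to be decoded first. The sketch below is a minimal adaptation under that assumption; values_after_marker is a hypothetical helper name, and it uses re.search rather than re.findall because it assumes &values occurs only once.

```python
import re

# Same pattern as in the answer: digits, commas and whitespace after '&values='
VALUES_RE = re.compile(r'&values=([\d,\s]*)')


def values_after_marker(text):
    """Return the numbers following '&values=' in text as ints, or [] if absent."""
    match = VALUES_RE.search(text)  # first occurrence only
    if match is None:
        return []
    # \s in the pattern can capture a trailing newline; int() tolerates it,
    # and empty pieces (e.g. after a trailing comma) are skipped
    return [int(v) for v in match.group(1).split(',') if v.strip()]


# With a live page (Python 3), decode the bytes first; UTF-8 is an assumption:
#   from urllib.request import urlopen
#   text = urlopen(url).read().decode('utf-8')
sample = 'avg=1&values=33900000,33900000,33900000\n'  # hypothetical sample body
print(values_after_marker(sample))
```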

RocketDonkey
@user1709173 Happy it helps! However, @unutbu's answer is pretty clever :) – RocketDonkey Oct 27 '12 at 02:00