BeautifulSoup unable to extract data using attrs=class

Question

I am extracting data for a research project and have successfully used findAll('div', attrs={'class':'someClassName'}) on many websites, but this particular website,

WebSite Link

doesn't return any values when I use the attrs option. When I don't use the attrs option, I get the entire HTML DOM.

Here is the simple code that I started with to test it out:

from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup as bs

soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
    print div

If retrieved via urllib2, there is no element with class "data". – Ansari, Jul 30 '12 at 00:21
I have used it on other websites and it works fine with urllib2. – add-semi-colons, Jul 30 '12 at 01:40
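
One way to act on Ansari's comment is to check the fetched HTML for the class before involving BeautifulSoup at all. The sketch below is written for Python 3 (the question's code is Python 2) and uses only the standard library's html.parser. ClassCounter and count_class are hypothetical names introduced here, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags of one kind that carry a given CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_class(html, tag="div", css_class="data"):
    """Return how many <tag> elements in html have css_class."""
    counter = ClassCounter(tag, css_class)
    counter.feed(html)
    counter.close()
    return counter.count


# Invented sample markup, standing in for whatever the server returned.
sample = '<div class="data">a</div><div class="result data">b</div><div>c</div>'
print(count_class(sample))
```

If this returns 0 for the page as fetched with urlopen, the element is missing from the response itself, so no attrs filter will find it.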

score 2 · Accepted Answer · answered Jul 30 '12 at 00:24

This code works fine for me when I fetch the page with requests:

import requests
from BeautifulSoup import BeautifulSoup as bs

# grab the HTML
r = requests.get(r'http://www.amazon.com/s/ref=sr_pg_1?rh=n:172282,k%3adigital%20camera&keywords=digital%20camera&ie=UTF8&qid=1343600585')
html = r.text

# parse the HTML
soup = bs(html)

results = soup.findAll('div', attrs={'class': 'data'})

print results

answered Jul 30 '12 at 00:24

TankorSmash


Instead of requests, I want to use urlopen; strangely enough, with urlopen I am unable to get the DOM. – add-semi-colons Jul 30 '12 at 01:33
Sorry, I don't know anything about `urlopen`. Why not use requests instead? http://docs.python-requests.org/en/latest/index.html – TankorSmash Jul 30 '12 at 01:55

score 2 · Answer 2 · answered Jul 27 '14 at 03:17

For anyone wondering why the code in the question (copied below) could not find elements by their attrs value:

soup = bs(urlopen(url))
for div in soup.findAll('div', attrs={'class':'data'}):
    print div

The issue lies in how the BeautifulSoup object is created: in soup = bs(urlopen(url)), the value of urlopen(url) is a response object, not the DOM.

Any issues you encountered could likely have been resolved more easily by using bs(urlopen(url).read()) instead.
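
A rough Python 3 sketch of that suggestion follows: read the response body and decode it to text, so that what gets handed to the parser is explicitly a string. fetch_html is a hypothetical helper introduced here, and falling back to UTF-8 when the server declares no charset is an assumption, not something the question or answers specify.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as text rather than the response object."""
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes of the response body
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

The returned string can then be passed to the parser, for example bs(fetch_html(url)).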
