I am web scraping a Newegg product page and want to get the customer rating of the product. I am using this code:

import requests
from bs4 import BeautifulSoup as bs

page = requests.get('https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632?Description=gpu&cm_re=gpu-_-14-137-632-_-Product').text
soup = bs(page, 'lxml')
the_rating = soup.find_all(class_='rating rating-4')
print(the_rating)

It returns only this one element, even though I am using find_all:

[<i class="rating rating-4"></i>]

1 Answer

I get [] with your code. The reason shows up if you look at the text content of the response, or if you print the response status and URL:

r = requests.get('https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632?Description=gpu&cm_re=gpu-_-14-137-632-_-Product')
print(f'<{r.status_code} {r.reason}> from {r.url}')
# soup = bs(r.content , 'lxml')

output:

<200 OK> from https://www.newegg.com/areyouahuman?referer=/areyouahuman?referer=https%3A%2F%2Fwww.newegg.com%2Fmsi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc%2Fp%2FN82E16814137632%3FDescription%3Dgpu%26cm_re%3Dgpu-_-14-137-632-_-Product&why=8&cm_re=gpu-_-14-137-632-_-Product&Description=gpu

It's been redirected to a CAPTCHA...
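To catch this before parsing, you could check the final URL for the CAPTCHA path. This is only a minimal sketch: the helper name `redirected_to_captcha` and the `CAPTCHA_MARKER` constant are my own, and I'm assuming the `areyouahuman` segment from the output above is a stable enough signal.

```python
from urllib.parse import urlparse

# Assumption: the CAPTCHA page is identified by this path segment,
# as seen in the redirected URL printed above.
CAPTCHA_MARKER = "areyouahuman"


def redirected_to_captcha(final_url: str) -> bool:
    """Return True if the final response URL points at the CAPTCHA page."""
    return CAPTCHA_MARKER in urlparse(final_url).path.lower()


# The product URL without its query string, and a shortened, illustrative
# version of the redirected URL (not the full one from the output).
product_url = "https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632"
captcha_url = "https://www.newegg.com/areyouahuman?referer=%2Fsome-product&why=8"

print(redirected_to_captcha(product_url))
print(redirected_to_captcha(captcha_url))
```

With requests you would pass `r.url` (the URL after redirects) to the helper and skip the parsing step when it returns True, since the status code alone is 200 either way.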


Anyway, even if you get past that (I couldn't, so I pasted the response from my browser's network logs and parsed that to test), all you get in page is the source HTML, which does not contain any elements with class="rating rating-4". Using selenium and waiting for the page to finish loading yielded a bit more, but even then there weren't any exact matches.

[There were some matches when I inspected in browser, but only if I wasn't in incognito mode, which is likely why selenium didn't find them either.]

So, the site probably adds or removes some classes based on the source of the request. If you just want to get all elements with both the rating and rating-4 classes (that will include the elements with class="rating is-large rating-4"), you can use .find... with lambda (or define a separate function) or use .select with CSS selectors like

the_rating = soup.select('.rating.rating-4') # shorter than 
# .find_all(lambda t: {'rating', 'rating-4'}.issubset(set(t.get('class', []))))

[Just make sure you have the full/correct HTML.]
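To see the difference between the exact class string and the two-class match, here is a small sketch on made-up markup (the HTML below is my own sample, not the real Newegg source, and I use the built-in html.parser so it runs without lxml):

```python
from bs4 import BeautifulSoup

# Made-up sample markup, NOT the real Newegg HTML.
html = """
<i class="rating rating-4"></i>
<i class="rating is-large rating-4"></i>
<i class="rating rating-5"></i>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string only matches that exact class attribute value.
exact = soup.find_all(class_="rating rating-4")

# The CSS selector matches any element carrying both classes.
both = soup.select(".rating.rating-4")

# Same idea with find_all and a function filter.
via_lambda = soup.find_all(
    lambda t: {"rating", "rating-4"}.issubset(t.get("class", []))
)

print(len(exact), len(both), len(via_lambda))
```

On this sample it prints 1 2 2: the exact string misses the is-large element, while the selector and the lambda both pick it up.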

Driftr95