find_all in bs4 returns one element when there are more in the web page

Question

I am scraping a Newegg product page and want to get the customers' ratings of the product. I am using this code:

import requests
from bs4 import BeautifulSoup as bs

page = requests.get('https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632?Description=gpu&cm_re=gpu-_-14-137-632-_-Product').text
soup = bs(page, 'lxml')
the_rating = soup.find_all(class_='rating rating-4')
print(the_rating)

It returns only this one element, even though I am using find_all:

[<i class="rating rating-4"></i>]

There's only that one element that has a class attribute containing *both* rating and rating-4 though... — Jon Clements, Jan 17 '23 at 13:17
This is just an example. All of the ratings return one element, even the rating rating-5 element; that is the thing I don't seem to understand. — majduddin alboon, Jan 17 '23 at 13:26
@majduddinalboon I got the reCAPTCHA with your code, but if you manage to get the data, check [this answer](https://stackoverflow.com/a/35465898/12511801) — Marco Aurelio Fernandez Reyes, Jan 17 '23 at 15:44

score 0 · Answer 1 · answered Jan 18 '23 at 04:01

I get [] with your code. The reason shows up in the text content of the response, or when I break the request out and print the response status and URL:

r = requests.get('https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632?Description=gpu&cm_re=gpu-_-14-137-632-_-Product')
print(f'<{r.status_code} {r.reason}> from {r.url}')
# soup = bs(r.content , 'lxml')

output:

<200 OK> from https://www.newegg.com/areyouahuman?referer=/areyouahuman?referer=https%3A%2F%2Fwww.newegg.com%2Fmsi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc%2Fp%2FN82E16814137632%3FDescription%3Dgpu%26cm_re%3Dgpu-_-14-137-632-_-Product&why=8&cm_re=gpu-_-14-137-632-_-Product&Description=gpu

It's been redirected to a CAPTCHA...
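A check like this can be automated before any parsing happens. The following is only a minimal sketch: the function name landed_on_captcha and the shortened blocked_url are made up for illustration, and treating the /areyouahuman path (taken from the output above) as a stable marker is an assumption, since the site may change it.

```python
from urllib.parse import urlsplit

# Path of the bot-check page seen in the output above. Treating it as a
# stable marker is an assumption; the site may change it at any time.
CAPTCHA_PATH = "/areyouahuman"


def landed_on_captcha(final_url: str) -> bool:
    """Return True if the final (post-redirect) URL points at the bot-check page."""
    return urlsplit(final_url).path.startswith(CAPTCHA_PATH)


# The product URL without its query string, and a shortened, illustrative
# version of the redirect target (not the exact URL from the output).
product_url = "https://www.newegg.com/msi-geforce-rtx-3060-rtx-3060-ventus-2x-12g-oc/p/N82E16814137632"
blocked_url = "https://www.newegg.com/areyouahuman?referer=%2Fmsi-geforce-rtx-3060&why=8"

print(landed_on_captcha(product_url))  # False
print(landed_on_captcha(blocked_url))  # True
```

With requests, you would pass r.url (the URL after redirects) to this check before handing r.content to BeautifulSoup, so an empty result is not mistaken for a selector problem.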

Anyway, even if you get past that (I couldn't, so I pasted and parsed the response from my browser's network logs to test), all you can get from page is the source HTML, which does not contain any elements with class="rating rating-4". Using selenium and waiting for the page to finish loading yielded a bit more, but even then there weren't any exact matches.

[There were some matches when I inspected in the browser, but only if I wasn't in incognito mode, which is likely why selenium didn't find them either.]

So, the site probably adds or removes some classes based on the source of the request. If you just want to get all elements with both the rating and rating-4 classes (which will include elements with class="rating is-large rating-4"), you can use .find_all with a lambda (or a separately defined function), or use .select with a CSS selector, like

the_rating = soup.select('.rating.rating-4') # shorter than 
# .find_all(lambda t: {'rating', 'rating-4'}.issubset(set(t.get('class', []))))

[Just make sure you have the full/correct HTML.]
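To see the difference without hitting the site, here is a small sketch on a hand-written HTML snippet. The markup is invented for illustration and is not Newegg's actual HTML; it uses html.parser rather than lxml only to avoid the extra dependency. Passing a string with a space to class_ matches the exact value of the class attribute, while the CSS selector and the lambda match any element that carries both classes.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup (an assumption for illustration, not the real page).
html = """
<i class="rating rating-4"></i>
<i class="rating is-large rating-4"></i>
<i class="rating-4 rating"></i>
<i class="rating rating-5"></i>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: only the first <i> qualifies.
exact = soup.find_all(class_="rating rating-4")

# Both classes present, in any order and with other classes allowed.
both = soup.select(".rating.rating-4")

# The same condition written as a filter function for find_all.
via_lambda = soup.find_all(
    lambda t: {"rating", "rating-4"}.issubset(t.get("class", []))
)

print(len(exact), len(both), len(via_lambda))  # 1 3 3
```

So a single result from find_all(class_='rating rating-4') does not by itself mean there is only one rated element; it means only one element has exactly that class string.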
