I'm using Beautiful Soup to scrape product search results from Amazon.

Below is my code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

amazon_url = "https://www.amazon.co.uk/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=chromebook"

# Download the raw HTML of the search results page
uClient = uReq(amazon_url)
page_html = uClient.read()
uClient.close()

# Parse the HTML and collect every product container
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("div", {"class": "s-item-container"})
print(len(containers))

However, this prints 16, even though the page shows 30 items in my browser.

Why is that?

Any help would be appreciated.

baduker
Deps
  • The html you get as a result of a urllib request is likely going to be different from what you see in your browser (thanks to different header info / cookies etc.). Are you logged into Amazon in your browser? – Garrett Gutierrez Mar 12 '18 at 18:57
  • No, I'm not signed in currently. – Deps Mar 12 '18 at 19:00
  • If you are using Chrome, open the developer tools and check the Network tab to find your GET request and see which headers your browser sent. Alternatively, you can do a `print(page_soup.prettify())` and compare the output against the HTML source of the page in your browser. You can also save the HTML, load it in Chrome, and see whether it looks anything like the actual page you got as a response to the GET. – Garrett Gutierrez Mar 12 '18 at 19:09
  • I'm looking in the developer tools and I'm seeing lots of GET requests, and I'm not sure what I'm looking for, to be honest. I'm also using `print(page_soup.prettify())` and getting lots of JavaScript rather than the HTML code. – Deps Mar 12 '18 at 19:47
  • It seems the other 14 are added with [AJAX](https://stackoverflow.com/q/1510011/9348376). I couldn't personally figure out which requests add them to the page; it seems Amazon has (rightfully so) obfuscated the requests. This is still possible to do using [selenium](http://selenium-python.readthedocs.io/installation.html#introduction); let me know if you want me to expand on that. – Sean Breckenridge Mar 12 '18 at 22:48
  • Yes please, could you expand on that? – Deps Mar 13 '18 at 12:18
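
The two suggestions in the comments above (send the same headers as the browser, or let a real browser run the page's JavaScript through Selenium) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed fix: the helper names (`build_request`, `fetch_static_html`, `fetch_rendered_html`, `count_containers`) and the header values are made up for this example, the Selenium part assumes `selenium` and a matching Chrome driver are installed, and the AJAX-loaded items may need an explicit wait before they appear in the page source.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumption: example browser-like headers. Copy the real values from the
# Network tab of your browser's developer tools.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/65.0 Safari/537.36",
    "Accept-Language": "en-GB,en;q=0.9",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a urllib Request that carries browser-like headers."""
    return Request(url, headers=headers)


def fetch_static_html(url):
    """Download the HTML the server sends, before any JavaScript runs."""
    with urlopen(build_request(url)) as response:
        return response.read()


def fetch_rendered_html(url):
    """Load the page in Chrome via Selenium and return the DOM as HTML.

    Assumes selenium and a Chrome driver are installed. Content added late
    by AJAX may require an explicit wait before reading page_source.
    """
    from selenium import webdriver  # imported lazily; only needed here

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def count_containers(html, css_class="s-item-container"):
    """Count <div> elements whose class list includes css_class."""
    page_soup = BeautifulSoup(html, "html.parser")
    return len(page_soup.find_all("div", class_=css_class))


# Offline demonstration on a small, made-up HTML snippet.
sample_html = """
<div class="s-item-container">A</div>
<div class="s-item-container extra">B</div>
<div class="other">C</div>
"""
print(count_containers(sample_html))
```

Calling `count_containers(fetch_static_html(amazon_url))` and `count_containers(fetch_rendered_html(amazon_url))` and comparing the two numbers would show whether the missing items are only present after the browser has executed the page's scripts.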

0 Answers