Scraping Amazon product names

Question

I am trying to collect the product names from the first two pages of an Amazon seller's listings. The response contains all the elements I need; however, when I parse it with BeautifulSoup, they are not listed. Here is my code:

import requests
from bs4 import BeautifulSoup
headers = {'User-Agent':'Mozilla/5.0'}
res = requests.get("https://www.amazon.com/s?me=A3WE363L17WQR&marketplaceID=ATVPDKIKX0DER", headers=headers)
#print(res.text)
soup = BeautifulSoup(res.text, "html.parser")
soup.find_all("a",href=True)

The product links are not listed. If the Amazon API provides this information, I am open to using it (please provide some examples of its usage). Thanks a lot in advance.

Can you provide some detail on what you are seeing and what you expected to see? — Eliot K, Mar 20 '19 at 19:57
they may be dynamically loaded and require a method like selenium — QHarr, Mar 20 '19 at 20:55
@QHarr that is what I thought at the beginning, but they are present in `res.text`, which is weird! However, they are not present in soup — hadesfv, Mar 20 '19 at 21:06
@EliotK what I want is to get the product titles (names), which are present in `res.text` as stated in the question but not in soup — hadesfv, Mar 20 '19 at 21:08
E.g. would be Dr. Elsey's Cat Ultra Premium Clumping Cat Litter (Pack May Vary) ? — QHarr, Mar 20 '19 at 21:08

score 0 · Accepted Answer · answered Mar 20 '19 at 21:13

I have extracted the product names from the alt attribute. Is this as intended?

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.amazon.com/s?me=A3WE363L17WQR&marketplaceID=ATVPDKIKX0DER')
soup = bs(r.content, 'lxml')
items = [item['alt'] for item in soup.select('.a-link-normal [alt]')]
print(items)

Over two pages:

import requests
from bs4 import BeautifulSoup as bs

# The page number appears twice in the URL: in `page` and in the `ref` suffix
url = 'https://www.amazon.com/s?i=merchant-items&me=A3WE363L17WQR&page={}&marketplaceID=ATVPDKIKX0DER&qid=1553116056&ref=sr_pg_{}'
for page in range(1, 3):  # pages 1 and 2
    r = requests.get(url.format(page, page))
    soup = bs(r.content, 'lxml')
    # Product names are taken from the alt text of elements inside the product links
    items = [item['alt'] for item in soup.select('.a-link-normal [alt]')]
    print(items)
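
The extraction step can also be separated from the network request so it can be checked offline. The following is only a sketch under assumptions: the helper names `make_soup` and `product_names`, the fallback to `html.parser` when lxml is not installed, the de-duplication, and the `SAMPLE_HTML` snippet are all illustrative and not part of the answer above. The selector is the one used in the answer.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html, preferred="lxml"):
    """Parse with the preferred parser; fall back to the stdlib parser if it is missing."""
    try:
        return BeautifulSoup(html, preferred)
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")


def product_names(html):
    """Return unique, non-empty alt texts found inside '.a-link-normal' links."""
    soup = make_soup(html)
    names = []
    for tag in soup.select(".a-link-normal [alt]"):
        alt = tag["alt"].strip()
        if alt and alt not in names:  # keep first occurrence, preserve order
            names.append(alt)
    return names


# Hypothetical markup for illustration only; real Amazon pages are far larger.
SAMPLE_HTML = """
<div>
  <a class="a-link-normal" href="/dp/EXAMPLE1"><img src="a.jpg" alt="Example Cat Litter"></a>
  <a class="a-link-normal" href="/dp/EXAMPLE1"><img src="a.jpg" alt="Example Cat Litter"></a>
  <a class="a-link-normal" href="/dp/EXAMPLE2"><img src="b.jpg" alt="Example Pet Brush"></a>
  <a class="other" href="/x"><img src="c.jpg" alt="Not a product"></a>
</div>
"""

print(product_names(SAMPLE_HTML))
```

With a live page, `product_names(r.content)` could replace the list comprehension inside the loop, assuming the page still uses the same class names.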

I have solved this problem using the `lxml.html` module, but I wonder why soup(html) couldn't read it — hadesfv, Mar 20 '19 at 21:32
So the above worked as expected? I think lxml is possibly better with poorly formed HTML; see https://stackoverflow.com/questions/25714417/beautiful-soup-and-table-scraping-lxml-vs-html-parser — QHarr, Mar 20 '19 at 21:33
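
One way to check whether the parser is responsible is to parse the same markup with each parser that is installed and compare how many links each one finds. This is a minimal sketch, not a diagnosis of the original page: the function name `count_links_by_parser` and the `SNIPPET` string are made up for illustration, and parsers that are not installed are simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_links_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <a> tags with an href} for each available parser."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            continue  # this parser is not installed; skip it
        counts[name] = len(soup.find_all("a", href=True))
    return counts


# Small well-formed snippet for illustration; the third anchor has no href.
SNIPPET = '<p><a href="/a">A</a><a href="/b">B</a><a name="x">no href</a></p>'

print(count_links_by_parser(SNIPPET))
```

Running it on `res.text` from the question instead of `SNIPPET` would show whether `html.parser` and `lxml` disagree for that particular response.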
