I am trying to collect the product names from the first two pages of an Amazon seller's listings. When I request the page, the response contains all the elements I need; however, when I parse it with BeautifulSoup, they are not listed. Here is my code:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
res = requests.get("https://www.amazon.com/s?me=A3WE363L17WQR&marketplaceID=ATVPDKIKX0DER", headers=headers)
# print(res.text)  # the raw response does contain the product links
soup = BeautifulSoup(res.text, "html.parser")
# The product links were expected in this list, but they are missing
print(soup.find_all("a", href=True))

The product links are not listed. If the Amazon API provides this information, I am open to using it (please provide some examples of its usage). Thanks a lot in advance.

QHarr
hadesfv
  • Can you provide some detail on what you are seeing and what you expected to see? – Eliot K Mar 20 '19 at 19:57
  • They may be dynamically loaded and require a tool like Selenium. – QHarr Mar 20 '19 at 20:55
  • @QHarr That is what I thought at first, but they are present in `res.text`, which is weird! They are not present in the soup, however. – hadesfv Mar 20 '19 at 21:06
  • @EliotK What I want is the product titles (names), which are present in `res.text` as stated in the question, but not in the soup. – hadesfv Mar 20 '19 at 21:08
  • Would an example be Dr. Elsey's Cat Ultra Premium Clumping Cat Litter (Pack May Vary)? – QHarr Mar 20 '19 at 21:08

1 Answer

I extracted the product names from the `alt` attributes. Is this what you intended?

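import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.amazon.com/s?me=A3WE363L17WQR&marketplaceID=ATVPDKIKX0DER')
# Parse with lxml instead of html.parser
soup = bs(r.content, 'lxml')
# Take the alt text of every element with an alt attribute nested inside a .a-link-normal element
items = [item['alt'] for item in soup.select('.a-link-normal [alt]')]
print(items)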

Over two pages:

import requests
from bs4 import BeautifulSoup as bs

# Both {} placeholders receive the page number
url = 'https://www.amazon.com/s?i=merchant-items&me=A3WE363L17WQR&page={}&marketplaceID=ATVPDKIKX0DER&qid=1553116056&ref=sr_pg_{}'
for page in range(1, 3):  # pages 1 and 2
    r = requests.get(url.format(page, page))
    soup = bs(r.content, 'lxml')
    items = [item['alt'] for item in soup.select('.a-link-normal [alt]')]
    print(items)
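
To check the selector without sending a request, the same extraction can be run against a small inline snippet. This is only a sketch: `SAMPLE_HTML` is made-up markup, not Amazon's real page structure, and `extract_alt_texts` is a hypothetical helper name. It uses the built-in `html.parser` so that lxml is not required for the check.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a results page (an assumption, not Amazon's real HTML).
SAMPLE_HTML = """
<div>
  <a class="a-link-normal" href="/dp/1"><img src="1.jpg" alt="Product One"></a>
  <a class="a-link-normal" href="/dp/2"><img src="2.jpg" alt="Product Two"></a>
  <a class="other" href="/help"><img src="h.jpg" alt="Help"></a>
</div>
"""


def extract_alt_texts(html, parser="html.parser"):
    """Return the alt text of elements nested inside .a-link-normal elements."""
    soup = BeautifulSoup(html, parser)
    return [tag["alt"] for tag in soup.select(".a-link-normal [alt]")]


if __name__ == "__main__":
    print(extract_alt_texts(SAMPLE_HTML))
```

Passing `parser="lxml"` to the helper runs the same selector through the parser used in the answer, provided lxml is installed.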
QHarr
  • I have solved this problem using the `lxml.html` module, but I wonder why soup(html) couldn't read it. – hadesfv Mar 20 '19 at 21:32
  • So the above worked as expected? See https://stackoverflow.com/questions/25714417/beautiful-soup-and-table-scraping-lxml-vs-html-parser. I think lxml is possibly better with poorly formed HTML. – QHarr Mar 20 '19 at 21:33
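
One way to see whether the parser is what makes the difference is to parse the same markup with each installed parser and compare how many links each one finds. The following is a rough diagnostic sketch; `count_links_by_parser` is a hypothetical helper, and it simply skips any parser that is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_links_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Map each available parser name to the number of <a href=...> tags it finds."""
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser is not installed, so skip it.
            continue
        counts[name] = len(soup.find_all("a", href=True))
    return counts


if __name__ == "__main__":
    # Well-formed sample markup; substitute res.text from the question to compare on the real page.
    sample = '<html><body><a href="/a">A</a><a href="/b">B</a><a>no href</a></body></html>'
    print(count_links_by_parser(sample))
```

If the counts differ noticeably for the real response, that would point to the parser's handling of the markup rather than to dynamically loaded content.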