How to open links in Amazon using BeautifulSoup4?

Question

Issue: the “linkElems” list appears to be empty

Suspected cause: I think the tags I’m telling it to grab are wrong

Function of Program:

Search Amazon.com for the command-line arguments and download the results page into the variable “res”
Select the URLs for the links of the results of the search and store them into a list called “linkElems”
Open new browser tabs for the first 5 results

Context: I’ve finished Chapter 11 of Automate the Boring Stuff and am using the same code from the first project, except I’ve tweaked it a little to search Amazon search results instead of Google.

What Tags I’ve tried:

'a'
‘h2. a’
'a.a-link-normal a-text-normal'
'.h2 a'

#! python3
# Shop on Amazon - searches Amazon and opens the top 5 results

import sys,requests,bs4,webbrowser,logging

print ('Searching')

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'
}

res = requests.get('https://www.amazon.com/s?k=' + ''.join(sys.argv[1:]))
res.raise_for_status

soup = bs4.BeautifulSoup(res.text,features = 'html.parser')

linkElems = soup.select('a.a-link-normal a-text-normal')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('https://amazon.com' + linkElems[i].get('href'))

HTML example of link I'm trying to grab using the tags:

Sample HTML That I'm searching

Example of me running the program and its output

Check out the answer on this post ! It may help. The problem is similar. https://stackoverflow.com/questions/11465555/can-we-use-xpath-with-beautifulsoup — francovici, Apr 28 '19 at 23:35
@francovici Thanks. I checked out that link but it doesn't quite solve my problem. In the previous project of the book it shows how the links from a Google search are siblings of a "div r" class, so I tried to do something similar with the Amazon search, but it doesn't appear to be grabbing them, so I figure I'm not understanding how to tell which tag to use. — bholwerda, Apr 29 '19 at 01:03

isaactfa · Answer 1 · 2019-04-30T02:20:29.603

1

Your problem is your CSS selector 'a.a-link-normal a-text-normal'. The space is a descendant combinator, so this looks for an element with the tag name a-text-normal inside an a tag with class a-link-normal.

a-link-normal and a-text-normal are both classes of the relevant a tag. You can express this in a CSS selector by chaining them without a space: 'a.a-link-normal.a-text-normal'. This matches an a tag that has both the class a-link-normal and the class a-text-normal.
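To make the difference concrete, here is a minimal sketch that runs both selectors against a small hand-written snippet. The HTML and the href values are made up for illustration and are not Amazon's real markup; only the two selectors come from the question and this answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real Amazon pages differ.
SAMPLE_HTML = """
<div>
  <h2><a class="a-link-normal a-text-normal" href="/example-product">Cup</a></h2>
  <a class="a-link-normal" href="/example-other">Other link</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# With a space: looks for an <a-text-normal> element nested inside
# an <a class="a-link-normal">. No such element exists in the snippet.
descendant_matches = soup.select("a.a-link-normal a-text-normal")

# Without a space: looks for a single <a> element carrying both classes.
chained_matches = soup.select("a.a-link-normal.a-text-normal")

print(len(descendant_matches))
print([a["href"] for a in chained_matches])
```

On this snippet the first selector matches nothing and the second matches only the link that has both classes.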

This script, for example, will search Amazon for your command-line input, collect all the links (links = soup.select('a.a-link-normal.a-text-normal')) and then print out the href attribute of each link it found. At this point, all I can say is, it works on my machine.

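from bs4 import BeautifulSoup
import requests
from sys import argv


# Join the command-line arguments with '+' to build the search query
r = requests.get("https://www.amazon.com/s?k=" + '+'.join(argv[1:]))
r.raise_for_status()  # raises an exception on a 4xx/5xx response

soup = BeautifulSoup(r.content, 'html.parser')
# Select <a> tags that carry both classes
links = soup.select('a.a-link-normal.a-text-normal')

# Print the href attribute of every matching link
for tag in links:
    print(tag.attrs['href'])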

edited Apr 30 '19 at 02:20

answered Apr 29 '19 at 01:41

isaactfa


Now that you mention it, the selector does look like it wasn't gonna work the way it was expressed. Looks like the space should be a period, just like you mentioned. ( a.a-link-normal a-text-normal -> a.a-link-normal.a-text-normal ) – francovici Apr 29 '19 at 03:15
@isaactfa I see what you're saying. I took some time to try it and experiment but i still can't get it. I'll edit my post to include one of the pieces of HTML i'm attempting to grab using the tag. – bholwerda Apr 29 '19 at 21:09
I have amended an example script to my answer. If this doesn't do it for you, there may be something wrong with the way you're getting your HTML. – isaactfa Apr 30 '19 at 02:21
interesting. I copy and pasted your code into a test program and it still doesn't do anything. I will continue to experiment and research. i'll leave this open in case i find the answer so that the information can be out there. – bholwerda Apr 30 '19 at 03:04
Are you getting an error? Are you getting nothing? What input are you giving? – isaactfa Apr 30 '19 at 03:05
I don't get an error, just no output when I ran your version of the code. In command line I type in "SoA cup" (SoA is the name of my program and cup is what I'm searching for) I have it print "Searching" in there so that I know it ran. I'm adding a screenshot so you can see it. You'll see I ran your version that i named "test". – bholwerda Apr 30 '19 at 04:04
This is confusing to me. Why aren't you running something like `> python test.py cup`? – isaactfa Apr 30 '19 at 04:09
This is how the book "Automate the Boring Stuff" tells me how to run it. It uses a .bat file to run it from command line. You can see that it runs. I tried it your way and its the same result. – bholwerda Apr 30 '19 at 04:16
And what's inside the .bat file? My Python code? Because that wouldn't work. – isaactfa Apr 30 '19 at 04:17
Last suggestion: paste my code into a file called `test.py`. From the command line navigate to where `test.py` is located on your system. Once you're in the same directory as `test.py` run `python test.py cup`. – isaactfa Apr 30 '19 at 04:20
'at'py.exe C:\MyPythonScripts\test.py 'at'pause (wouldn't let me use the actual 'at' sign in the comments here) This is just so that i can use "test XXXX" to run the program from a regular command line. – bholwerda Apr 30 '19 at 04:21
Those two first 'at' lines are on separate lines but the comment box messed up the formatting. – bholwerda Apr 30 '19 at 04:22
Okay, just go to `C:\MyPythonScripts` from the command line, and run `python test.py cup`. – isaactfa Apr 30 '19 at 04:23
I did and it's the same result as my version. – bholwerda Apr 30 '19 at 04:24
But it works. I'm literally running it right now and it works exactly as expected. You gotta be doing something wrong. I can't help you. – isaactfa Apr 30 '19 at 04:28
Dang it. This is so frustrating. Thank you for trying to help though. – bholwerda Apr 30 '19 at 04:31
I think I figured out the problem. I printed the raise for status result and I'm getting a 503 error so I think Amazon is blocking my scraper. – bholwerda Apr 30 '19 at 16:50
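For anyone who hits the same 503: in the question's script the headers dict is defined but never passed to requests.get, and res.raise_for_status is written without parentheses, so it is never called and an error status goes unnoticed. The sketch below shows one way to surface the status code and send a browser-like User-Agent. The function names and the shortened header value are illustrative assumptions, and there is no guarantee that Amazon will accept the request even with such a header.

```python
from urllib.parse import urlencode

import requests

# Assumed browser-like header, in the spirit of the one in the question.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}


def build_search_url(terms):
    """Build the search URL; urlencode turns spaces into '+'."""
    return "https://www.amazon.com/s?" + urlencode({"k": " ".join(terms)})


def fetch_search_page(terms):
    """Request the search page, report the status, and fail loudly on errors."""
    res = requests.get(build_search_url(terms), headers=HEADERS, timeout=10)
    print("HTTP status:", res.status_code)  # a 503 would show up here
    res.raise_for_status()  # note the parentheses: this actually runs the check
    return res.text
```

In a script this could be called as fetch_search_page(sys.argv[1:]) and the returned text passed to BeautifulSoup as before.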
