Issue: the “linkElems” list appears to be empty

Suspected cause: I think the tags I’m telling it to grab are wrong

Function of Program:

  • Search Amazon.com for the arguments given on the command line and download the results page into the variable “res”
  • Select the URLs for the links of the results of the search and store them into a list called “linkElems”
  • Open new browser tabs for the first 5 results

Context: I’ve finished Chapter 11 of Automate the Boring Stuff and am using the same code from the first project, except that I’ve tweaked it a little to search Amazon instead of Google.

What Tags I’ve tried:

  • 'a'
  • ‘h2. a’
  • 'a.a-link-normal a-text-normal'
  • '.h2 a'
#! python3
# Shop on Amazon - searches Amazon and opens the top 5 results

import sys,requests,bs4,webbrowser,logging

print ('Searching')

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'
}

res = requests.get('https://www.amazon.com/s?k=' + ''.join(sys.argv[1:]))
res.raise_for_status

soup = bs4.BeautifulSoup(res.text,features = 'html.parser')

linkElems = soup.select('a.a-link-normal a-text-normal')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('https://amazon.com' + linkElems[i].get('href'))

HTML example of link I'm trying to grab using the tags:

[Image: sample HTML that I'm searching]

[Image: example of me running the program and its output]

bholwerda
  • Check out the answer on this post ! It may help. The problem is similar. https://stackoverflow.com/questions/11465555/can-we-use-xpath-with-beautifulsoup – francovici Apr 28 '19 at 23:35
  • @francovici Thanks. I checked out that link but it doesn't quite solve my problem. In the previous project of the book it shows how the links from a Google search are siblings of a "div r" class, so I tried to do something similar with the Amazon search, but it doesn't appear to be grabbing them, so I figure I'm not understanding how to tell which tag to use. – bholwerda Apr 29 '19 at 01:03

1 Answer

Your problem is your CSS selector 'a.a-link-normal a-text-normal'. The space acts as a descendant combinator, so this looks for an a-text-normal tag inside an a tag with class a-link-normal.

a-link-normal and a-text-normal are both classes of the relevant a tag. You can express this in a CSS selector by chaining them with no space in between, like this: 'a.a-link-normal.a-text-normal'. This denotes that you are looking for an a tag that has both classes, a-link-normal and a-text-normal.
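
To see the difference between the two selectors without touching Amazon at all, here is a minimal sketch run against a made-up HTML snippet. The markup and hrefs below are invented for illustration and are not copied from Amazon's page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link carries both classes, the other only one.
html = """
<div>
  <h2><a class="a-link-normal a-text-normal" href="/item-one">Item one</a></h2>
  <h2><a class="a-link-normal" href="/item-two">Item two</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# With a space: look for an <a-text-normal> element nested inside
# <a class="a-link-normal">. No such element exists, so nothing matches.
descendant_matches = soup.select("a.a-link-normal a-text-normal")

# Without a space: look for an <a> element that has both classes.
chained_matches = soup.select("a.a-link-normal.a-text-normal")
chained_hrefs = [tag["href"] for tag in chained_matches]

print(len(descendant_matches))  # 0
print(chained_hrefs)            # ['/item-one']
```

If the chained selector also comes back empty on the real page, the selector is probably not the only problem and it is worth checking what HTML was actually downloaded.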

This script, for example, will search Amazon for your command-line input, collect all the links (links = soup.select('a.a-link-normal.a-text-normal')), and then print out the href attribute of each link it finds. At this point, all I can say is that it works on my machine.

from bs4 import BeautifulSoup
import requests
from sys import argv


r = requests.get("https://www.amazon.com/s?k=" + '+'.join(argv[1:]))
r.raise_for_status()

soup = BeautifulSoup(r.content, 'html.parser')
links = soup.select('a.a-link-normal.a-text-normal')

for tag in links:
    print(tag.attrs['href'])
isaactfa
  • Now that you mention it, the selector does look like it wasn't gonna work the way it was expressed. Looks like the space should be a period, just like you mentioned. ( a.a-link-normal a-text-normal -> a.a-link-normal.a-text-normal ) – francovici Apr 29 '19 at 03:15
  • @isaactfa I see what you're saying. I took some time to try it and experiment but i still can't get it. I'll edit my post to include one of the pieces of HTML i'm attempting to grab using the tag. – bholwerda Apr 29 '19 at 21:09
  • I have amended an example script to my answer. If this doesn't do it for you, there may be something wrong with the way you're getting your HTML. – isaactfa Apr 30 '19 at 02:21
  • interesting. I copy and pasted your code into a test program and it still doesn't do anything. I will continue to experiment and research. i'll leave this open in case i find the answer so that the information can be out there. – bholwerda Apr 30 '19 at 03:04
  • Are you getting an error? Are you getting nothing? What input are you giving? – isaactfa Apr 30 '19 at 03:05
  • I don't get an error, just no output when I ran your version of the code. In command line I type in "SoA cup" (SoA is the name of my program and cup is what I'm searching for) I have it print "Searching" in there so that I know it ran. I'm adding a screenshot so you can see it. You'll see I ran your version that i named "test". – bholwerda Apr 30 '19 at 04:04
  • This is confusing to me. Why aren't you running something like `> python test.py cup`? – isaactfa Apr 30 '19 at 04:09
  • This is how the book "Automate the Boring Stuff" tells me how to run it. It uses a .bat file to run it from command line. You can see that it runs. I tried it your way and its the same result. – bholwerda Apr 30 '19 at 04:16
  • And what's inside the .bat file? My Python code? Because that wouldn't work. – isaactfa Apr 30 '19 at 04:17
  • Last suggestion: paste my code into a file called `test.py`. From the command line navigate to where `test.py` is located on your system. Once you're in the same directory as `test.py` run `python test.py cup`. – isaactfa Apr 30 '19 at 04:20
  • 'at'py.exe C:\MyPythonScripts\test.py 'at'pause (wouldn't let me use the actual 'at' sign in the comments here) This is just so that i can use "test XXXX" to run the program from a regular command line. – bholwerda Apr 30 '19 at 04:21
  • Those two first 'at' lines are on separate lines but the comment box messed up the formatting. – bholwerda Apr 30 '19 at 04:22
  • Okay, just go to `C:\MyPythonScripts` from the command line, and run `python test.py cup`. – isaactfa Apr 30 '19 at 04:23
  • I did and it's the same result as my version. – bholwerda Apr 30 '19 at 04:24
  • But it works. I'm literally running it right now and it works exactly as expected. You gotta be doing something wrong. I can't help you. – isaactfa Apr 30 '19 at 04:28
  • Dang it. This is so frustrating. Thank you for trying to help though. – bholwerda Apr 30 '19 at 04:31
  • I think I figured out the problem. I printed the raise for status result and I'm getting a 503 error so I think Amazon is blocking my scraper. – bholwerda Apr 30 '19 at 16:50
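
For anyone who lands here with the same symptom, below is a minimal sketch of checking the HTTP status before parsing. Two things in the question's code are worth noting: the headers dict is defined but never passed to requests.get, and res.raise_for_status is written without parentheses, so it is never actually called. The function names build_search_request and fetch_search_page are made up for this example, the User-Agent string is simply the one from the question, and sending it is no guarantee that Amazon will stop returning 503.

```python
import requests

# Assumption: a browser-like User-Agent (copied from the question).
# Amazon may still reject automated requests even with this header.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36"
    )
}


def build_search_request(terms):
    """Prepare (but do not send) a GET request for an Amazon search."""
    request = requests.Request(
        "GET",
        "https://www.amazon.com/s",
        params={"k": " ".join(terms)},  # requests URL-encodes the query string
        headers=HEADERS,
    )
    return request.prepare()


def fetch_search_page(terms):
    """Send the request and report the status code before any parsing."""
    prepared = build_search_request(terms)
    with requests.Session() as session:
        response = session.send(prepared, timeout=10)
    print("HTTP status:", response.status_code)
    # Note the parentheses: this raises requests.HTTPError on a 4xx/5xx reply.
    response.raise_for_status()
    return response.text
```

Calling fetch_search_page(sys.argv[1:]) would then either return the page text or raise requests.HTTPError, which makes a blocked request visible instead of silently producing an empty list of links.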