Python web scraping code won't open links

Question

This is from the book "Automate the Boring Stuff with Python". At first I made a .bat file and ran it with arguments from cmd, but it didn't open any pages in Chrome. I looked it up on here and changed the code, but it still executes without errors and prints the print line without opening any tabs as it should. What am I doing wrong? Thanks in advance.

#! python3
# lucky.py opens several google search matches

import requests,sys,webbrowser,bs4
searchTerm1 = 'python'
print('Googling...')
res = requests.get('https://www.google.com/search?={0}'.format(searchTerm1))
res.raise_for_status()

#retrieve top search result links
soup = bs4.BeautifulSoup(res.text,"html.parser")

#open a browser tab for each result.
linkElems = soup.select('.r a')
numOpen = min(5,len(linkElems))
for i in range(numOpen):
    webbrowser.open('http://google.com' + linkElems[i].get('href'))

Try printing `res.text`. Google is likely blocking your requests as it detects that you're not visiting from a browser. — xrisk, Jul 19 '18 at 18:47
It responds with a 302, not that it was blocked entirely. You can retrieve google search results programmatically and without using a browser, though you must do so in the appropriate way. — jonroethke, Jul 19 '18 at 18:55
The query is wrong - try it in your browser. You just need https://www.google.com/search?q={0}. Notice the 'q' in there - that's what you are missing. Your program works fine once you add the 'q'. See my answer below. — Mark, Jul 21 '18 at 18:32

score 0 · Accepted Answer · answered Jul 19 '18 at 19:12

The short answer is that your URL is not returning results. Here's a URL that provides results: https://www.google.com/search?q=python.

I changed the one line in your code to use this template: "https://www.google.com/search?q={0}" and I saw that linkElems was no longer empty.
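As a minimal sketch of the same fix, the query string can be built with the standard library so the `q` key cannot be left out and the search term is URL-encoded. The helper name `build_search_url` is hypothetical and is not part of the original script.

```python
from urllib.parse import urlencode


def build_search_url(term):
    """Return a Google search URL with the term in the `q` parameter.

    Hypothetical helper for illustration; the original script builds
    the URL inline with str.format.
    """
    return 'https://www.google.com/search?' + urlencode({'q': term})


print(build_search_url('python'))
# https://www.google.com/search?q=python
print(build_search_url('automate the boring stuff'))
# https://www.google.com/search?q=automate+the+boring+stuff
```

With requests, passing `params={'q': searchTerm1}` to `requests.get('https://www.google.com/search', ...)` should have the same effect, since requests encodes the dictionary into the query string for you.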

jonroethke · Answer 2 · 2018-07-19T19:26:45.293

-1

In short, webbrowser is not opening any pages because numOpen is 0, so the for loop iterates over zero items and the code inside it (webbrowser.open) never executes.

The more detailed explanation is that numOpen is 0 because of a redirect that occurs on the initial GET request for your custom Google query. See this answer for ways to work around this; there are several, and the easiest is probably to use the Google search API.

As a result of the redirect, your BeautifulSoup selector returns no matches, so linkElems is an empty list and numOpen is set to 0. With no elements to iterate over, the for loop body never runs.
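The silent failure can be made visible with a small guard. This is only a sketch: `open_top_links` and its `opener` parameter are hypothetical names, and `opener` stands in for `webbrowser.open` so the behaviour can be checked without launching a browser.

```python
def open_top_links(hrefs, opener, limit=5):
    """Call opener on at most `limit` hrefs and return how many were opened.

    Hypothetical helper: in the original script, opener would be
    webbrowser.open and hrefs would come from linkElems.
    """
    num_open = min(limit, len(hrefs))
    if num_open == 0:
        # range(0) is empty, so without this message the loop just does nothing.
        print('No matching links found; nothing to open.')
    for href in hrefs[:num_open]:
        opener(href)
    return num_open


opened = []
open_top_links([], opened.append)                 # prints the warning, opens nothing
open_top_links(['/url?q=a', '/url?q=b'], opened.append)
print(opened)
```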

A quick and dirty, though imperfect, way to debug problems like this yourself is to add print statements throughout the script: you can see which ones fail to execute and inspect the variables and their values.
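For this script, the values worth printing are probably the response status, the final URL, any redirects, and the number of matched links. The sketch below assumes `res` is the `requests` response and `link_elems` is the list returned by `soup.select`; `describe_fetch` is a hypothetical helper, and the stand-in object and its URL are invented purely so the example runs without a network connection.

```python
from types import SimpleNamespace


def describe_fetch(res, link_elems):
    """Return debug lines for a response-like object and the matched links."""
    return [
        'status code: {}'.format(res.status_code),
        'final URL:   {}'.format(res.url),
        'redirects:   {}'.format([r.status_code for r in res.history]),
        'body length: {}'.format(len(res.text)),
        'links found: {}'.format(len(link_elems)),
    ]


# Invented stand-in for a real response, so no network access is needed.
fake_res = SimpleNamespace(
    status_code=200,
    url='https://example.invalid/redirected',
    history=[SimpleNamespace(status_code=302)],
    text='<html></html>',
)
for line in describe_fetch(fake_res, []):
    print(line)
```

In the real script you would call `describe_fetch(res, linkElems)` right after the `soup.select` line.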

As an aside, the shebang is more portably written as #!/usr/bin/env python3 rather than simply #! python3. Reference here.

Hope this helps

edited Jul 19 '18 at 19:26

answered Jul 19 '18 at 18:47

jonroethke



And _why_ is `numOpen` 0? – xrisk Jul 19 '18 at 18:48
Thanks for the quick response! I tried printing linkElems and it's empty, so something goes wrong there, I'm just not sure what. When I printed res.text it printed a whole lot of info, and I'm sure it should've grabbed something from there. – BorkoP Jul 19 '18 at 19:04
It all stems from the redirect that occurs with the initial GET request. When he tries to retrieve the elements, there are no elements found that match specifically what he is looking for due to the initial redirect, therefore the `numOpen` variable is set to 0. Therefore the for loop does not execute. Perhaps I could have added this more detailed explanation in the original answer. I was simply trying to answer the question at a high level as to why this wasn't working out of the box. I can edit my original post. – jonroethke Jul 19 '18 at 19:06
