My goal is to write a web-scraping program in Python that parses a Google search results page using Beautiful Soup and opens several result links at once. The program looks like this:
#! python3
# searchGoogle.py - Opens several google results.
import requests, sys, webbrowser, bs4
print('Searching...') # display text while downloading the result page
res = requests.get('https://www.google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
# Retrieve top search result links.
soup = bs4.BeautifulSoup(res.text, 'html.parser')
# Open a browser tab for each result.
linkElems = soup.select('div.yuRUbf > a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    urlToOpen = linkElems[i].get('href')
    print('Opening', urlToOpen)
    webbrowser.open(urlToOpen)
Since my HTML skills are limited, I don't know exactly how to select the HTML elements that contain the links.
Here is the web page I want to parse: https://www.google.com/search?q=boring+stuff
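For reference, here is how the script's command-line arguments map onto the query string in that URL. This is a minimal stdlib sketch that assumes the script is run as `searchGoogle.py boring stuff`. The `args` list is a hypothetical stand-in for `sys.argv[1:]`, and `urlencode` is used here only to show the encoded form.

```python
# Sketch: turn the search terms into the query string of the results URL.
from urllib.parse import urlencode

args = ['boring', 'stuff']  # hypothetical stand-in for sys.argv[1:]
# urlencode() encodes the space between the terms as '+'.
url = 'https://www.google.com/search?' + urlencode({'q': ' '.join(args)})
print(url)  # https://www.google.com/search?q=boring+stuff
```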
My browser's developer console shows the following HTML code:
All links are inside elements with class="yuRUbf" (I have marked one example in the attached picture).
My question: what is the correct argument to pass to the soup.select() method? Because every 'a' element sits directly inside a 'div' element whose class attribute is 'yuRUbf', I thought 'div.yuRUbf > a' was correct... but the program does not work: no web pages are opened in the browser.
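One way to test whether the selector itself is at fault is to run it against a small, static piece of HTML with the structure described above. The snippet below is a hypothetical, trimmed-down stand-in for that markup (with example.com URLs), not Google's actual page:

```python
# Check the CSS selector against a hypothetical stand-in for the markup
# seen in the developer console (not Google's real page).
import bs4

SAMPLE_HTML = """
<div class="g">
  <div class="yuRUbf">
    <a href="https://example.com/first"><h3>First result</h3></a>
  </div>
  <div class="yuRUbf">
    <a href="https://example.com/second"><h3>Second result</h3></a>
  </div>
  <div class="other">
    <a href="https://example.com/not-a-result">Not matched</a>
  </div>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
# 'div.yuRUbf > a' selects every <a> that is a direct child of a <div>
# whose class list contains "yuRUbf"; the third <a> is skipped.
hrefs = [a.get('href') for a in soup.select('div.yuRUbf > a')]
print(hrefs)  # ['https://example.com/first', 'https://example.com/second']
```

If the selector finds both links here but nothing in the downloaded page, the selector syntax is probably fine, and the difference more likely lies in the HTML that requests receives.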
Can an experienced HTML developer help me with this problem? Is the argument I pass to soup.select() incorrect? If so, what should it be? Or is the problem somewhere else?
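One possibility worth checking, offered here as an assumption and not a confirmed cause: requests identifies itself as `python-requests` by default, and Google may serve different markup to clients that don't look like a browser. Class names such as `yuRUbf` also appear to be generated and may change over time. The diagnostic sketch below sends a browser-like User-Agent, saves the HTML Python actually received, and reports how many links the selector matches. The User-Agent string, the `results.html` file name, and the helpers `result_links()` and `fetch_results_page()` are hypothetical choices made for illustration.

```python
#! python3
# Hypothetical diagnostic: fetch the results page with a browser-like
# User-Agent, save the HTML Python received, and count selector matches.
import sys

import bs4

# Assumption: an illustrative desktop-browser string; any current one will do.
HEADERS = {
    'User-Agent': ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/90.0.4430.93 Safari/537.36')
}
SELECTOR = 'div.yuRUbf > a'  # the selector from the question


def result_links(html, selector=SELECTOR, limit=5):
    """Return up to `limit` href values of the elements matched by `selector`."""
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [a.get('href') for a in soup.select(selector)[:limit]]


def fetch_results_page(query, headers=HEADERS):
    """Download the results page for `query`, sending the given headers."""
    import requests  # imported lazily so result_links() works without requests

    # params= lets requests URL-encode the query string.
    res = requests.get('https://www.google.com/search',
                       params={'q': query}, headers=headers)
    res.raise_for_status()
    return res.text


if __name__ == '__main__' and len(sys.argv) > 1:
    html = fetch_results_page(' '.join(sys.argv[1:]))
    # Keep a copy to compare with the markup in the developer console.
    with open('results.html', 'w', encoding='utf-8') as f:
        f.write(html)
    links = result_links(html)
    print('Selector matched', len(links), 'link(s)')
    for url in links:
        print(url)
```

Run it with search terms, e.g. `python3 diagnose.py boring stuff` (the file name is hypothetical). If it reports 0 matches, opening results.html and searching for one of the result URLs shows which elements actually wrap the links in the page Python received.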
I am using macOS Catalina and Python 3.8.