I'm working on a project in Python (3.7) where I need to scrape the titles and URLs of the first few Google results. I tried doing it with BeautifulSoup, but it doesn't work.
Here's what I have tried:
import requests
from my_fake_useragent import UserAgent
from bs4 import BeautifulSoup

ua = UserAgent()
google_url = "https://www.google.com/search?q=python" + "&num=" + str(5)
response = requests.get(google_url, {"User-Agent": ua.random})
soup = BeautifulSoup(response.text, "html.parser")
result_div = soup.find_all('div', attrs={'class': 'g'})

links = []
titles = []
descriptions = []
for r in result_div:
    # Checks if each element is present, else, raise exception
    try:
        link = r.find('a', href=True)
        title = r.find('h3', attrs={'class': 'r'}).get_text()
        description = r.find('span', attrs={'class': 'st'}).get_text()
        # Check to make sure everything is present before appending
        if link != '' and title != '' and description != '':
            links.append(link['href'])
            titles.append(title)
            descriptions.append(description)
    # Next loop if one element is not present
    except:
        continue

print(titles)
But it doesn't return anything.
When I try to fetch the HTML like this:
url = 'https://google.com/search?q=python'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())
Here's what it returns (a sample of the returned HTML):
<div id="main">
 <div class="ZINbbc xpd O9g5cc uUPGi">
  <div>
   <div class="jfp3ef">
    <a href="/url?q=https://www.python.org/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQFjAAegQIBxAB&usg=AOvVaw0nCy-teBd7nOrThY5YGQ4o">
     <div class="BNeawe vvjwJb AP7Wnd">
      Python.org
     </div>
     <div class="BNeawe UPmit AP7Wnd">
      https://www.python.org
     </div>
    </a>
   </div>
   <div class="NJM3tb">
   </div>
   <div class="jfp3ef">
    <div>
     <div class="BNeawe s3v9rd AP7Wnd">
      <div>
       <div>
        <div class="Ap5OSd">
         <div class="BNeawe s3v9rd AP7Wnd">
          The official home of the Python Programming Language.
         </div>
        </div>
        <div class="v9i61e">
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://www.python.org/downloads/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwAXoECAcQAw&usg=AOvVaw0TKe6ApGOQcWuHcXIkvAT0">
            <span class="XLloXe AP7Wnd">
             Download Python
            </span>
           </a>
          </span>
         </div>
        </div>
        <div class="v9i61e">
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://www.python.org/about/gettingstarted/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwAnoECAcQBQ&usg=AOvVaw03o9Qt-KFSbwECm8-wmUZS">
            <span class="XLloXe AP7Wnd">
             Python For Beginners
            </span>
           </a>
          </span>
         </div>
        </div>
        <div class="v9i61e">
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://www.python.org/doc/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwA3oECAcQBw&usg=AOvVaw3Yz3mO8HXGJoaf35qhyb3V">
            <span class="XLloXe AP7Wnd">
             Documentation
            </span>
           </a>
          </span>
         </div>
        </div>
        <div class="v9i61e">
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://docs.python.org/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwBHoECAcQCQ&usg=AOvVaw0nY6NyZm0wErJJ1RIgTiPm">
            <span class="XLloXe AP7Wnd">
             Python Docs
            </span>
           </a>
          </span>
         </div>
        </div>
        <div class="v9i61e">
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://www.python.org/psf/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwBXoECAcQCw&usg=AOvVaw3HoEDHmdRBcufXuwakPCAz">
            <span class="XLloXe AP7Wnd">
             Python Software Foundation
            </span>
           </a>
          </span>
         </div>
        </div>
        <div>
         <div class="BNeawe s3v9rd AP7Wnd">
          <span class="BNeawe">
           <a href="/url?q=https://www.python.org/downloads/release/python-373/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwBnoECAcQDQ&usg=AOvVaw3HsJpvpsCvYikd_mP7ndN3">
            <span class="XLloXe AP7Wnd">
             Python 3.7.3
            </span>
           </a>
          </span>
         </div>
        </div>
       </div>
      </div>
     </div>
    </div>
   </div>
  </div>
 </div>
</div>
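
Looking at this markup, the result links are not direct URLs but redirects of the form /url?q=<target>&sa=..., and none of the class names my first script searches for ('g', 'r', 'st') appear in it. Below is a rough sketch of how titles and URLs might be pulled out of HTML shaped like this sample, using only the standard library. It assumes the /url?q= link format seen above stays the same, which Google does not guarantee. The names extract_target_url and ResultParser are made up for illustration, and the embedded HTML is a shortened copy of the sample.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse

# Shortened copy of the sample above (first result and one sub-link).
SAMPLE_HTML = """
<div id="main">
 <div class="jfp3ef">
  <a href="/url?q=https://www.python.org/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQFjAAegQIBxAB&usg=AOvVaw0nCy-teBd7nOrThY5YGQ4o">
   <div class="BNeawe vvjwJb AP7Wnd">Python.org</div>
   <div class="BNeawe UPmit AP7Wnd">https://www.python.org</div>
  </a>
 </div>
 <span class="BNeawe">
  <a href="/url?q=https://www.python.org/downloads/&sa=U&ved=2ahUKEwiCrK7AvsXiAhWxq1kKHTknCuoQjBAwAXoECAcQAw&usg=AOvVaw0TKe6ApGOQcWuHcXIkvAT0">
   <span class="XLloXe AP7Wnd">Download Python</span>
  </a>
 </span>
</div>
"""


def extract_target_url(href):
    """Return the value of the q parameter of a /url?q=... link, else None."""
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


class ResultParser(HTMLParser):
    """Collects (title, url) pairs from anchors that point at /url?q=...

    Assumption: the first non-empty text inside such an anchor is the title.
    """

    def __init__(self):
        super().__init__()
        self.results = []   # finished (title, url) pairs
        self._url = None    # decoded target of the anchor currently open
        self._title = None  # first text chunk seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._url = extract_target_url(href)
            self._title = None

    def handle_data(self, data):
        text = data.strip()
        if self._url and self._title is None and text:
            self._title = text

    def handle_endtag(self, tag):
        if tag == "a" and self._url:
            if self._title:
                self.results.append((self._title, self._url))
            self._url = None


parser = ResultParser()
parser.feed(SAMPLE_HTML)
for title, url in parser.results:
    print(title, url)
```

This sketch does not tell main results apart from the sub-links under them (such as "Download Python"); separating the two would need some extra rule based on the surrounding markup.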