BeautifulSoup returns empty brackets

Question

I'm trying to get the number of results for a Google search with the bs4 library in Python, but it returns empty brackets.

Here is my code:

import requests
from bs4 import BeautifulSoup


url_page = 'https://www.google.com/search?q=covid&oq=covid&aqs=chrome.0.0i433l2j0i131i433j0i433j0i131i433l2j0j0i131i433j0i433j0i131i433.691j0j7&sourceid=chrome&ie=UTF-8'

page = requests.get(url_page).text
soup = BeautifulSoup(page, "lxml")

elTexto = soup.find_all(attrs ={'class': 'LHJvCe'})
print(elTexto)

I have a browser extension that checks whether the HTML class is correct, and it gives me what I'm looking for, so I guess that is not the problem. Maybe it is something related to the format of the 'text' I'm trying to get. Thanks!

Google is randomizing class names to prevent just exactly what you're doing. — baduker, Apr 28 '21 at 12:04

score 0 · Answer 1 · answered Apr 28 '21 at 12:12

It is better to use the gsearch package to accomplish your task rather than scraping the web page manually.

Stefan Tanuwijaya

score 0 · Answer 2 · answered Aug 26 '21 at 09:16

Google is not randomizing classes as baduker mentioned. They could change some class names over time but they're not randomizing them.

One reason you get an empty result is that you haven't specified an HTTP User-Agent header, so Google might block your request; sending browser-like headers may help avoid that. You can check what your user-agent is here. Headers look like this:

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get('YOUR URL', headers=headers)
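
To check what actually gets sent, one option is to build the request without sending it. This is only a sketch: the query values in `params` are illustrative assumptions, and `requests.Request(...).prepare()` just assembles the URL and headers locally.

```python
import requests

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Illustrative query parameters (assumed values, not taken from the question's URL).
params = {"q": "covid", "hl": "en", "gl": "us"}

# Build the request locally; nothing is sent over the network here.
prepared = requests.Request(
    "GET", "https://www.google.com/search", headers=headers, params=params
).prepare()

print(prepared.url)                    # final URL with the encoded query string
print(prepared.headers["User-Agent"])  # header that would be sent
```

Passing `params` this way also avoids pasting the long browser URL with its `aqs`/`sourceid` values.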

Also, you don't need find_all()/findAll() or select(), since you want only one occurrence, not all of them. Use one of these instead:

find('ELEMENT NAME', class_='CLASS NAME')
select_one('.CSS_SELECTORs')

select()/select_one() are usually faster.
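
The difference in return values is also where the "empty brackets" come from. A minimal sketch on a hand-written HTML string (the markup is an assumption modeled on the `#result-stats` selector used in this answer, not a copy of Google's page; `html.parser` is used so the snippet runs without lxml):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, written by hand for illustration.
html = '<div id="result-stats">About 104,000 results<nobr> (0.52 seconds) </nobr></div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet; with no match it is empty,
# which prints as [] (the "empty brackets" from the question).
all_matches = soup.find_all(attrs={"class": "LHJvCe"})
print(all_matches)

# find() and select_one() return a single Tag, or None when nothing matches.
first_by_find = soup.find("div", id="result-stats")
first_by_css = soup.select_one("#result-stats")
missing = soup.select_one(".LHJvCe")

print(first_by_find.get_text())
print(missing)
```

So an empty list from find_all() and a None from find()/select_one() both mean the same thing: the element is not in the HTML that was actually downloaded.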

Code and example in the online IDE (note: the number of results varies between requests; that is just how it works):

import requests  # lxml only has to be installed; BeautifulSoup loads it by name
from bs4 import BeautifulSoup

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "fus ro dah defenition",
  "gl": "us",
  "hl": "en"
}

response = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

number_of_results = soup.select_one('#result-stats nobr').previous_sibling
print(number_of_results)

# About 104,000 results
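
If the element is missing (for example when the request was blocked), `select_one()` returns None and the `.previous_sibling` access above raises an AttributeError. A more defensive sketch follows; `extract_result_count` is a hypothetical helper name, the sample markup is hand-written, and the number parsing assumes English-style comma separators (as with `hl=en`):

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def extract_result_count(html: str) -> Optional[int]:
    """Return the result count as an int, or None if it cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    stats = soup.select_one("#result-stats")
    if stats is None:
        # Element absent: blocked request, changed markup, etc.
        return None
    # Take the first run of digits and commas, e.g. "104,000".
    match = re.search(r"\d[\d,]*", stats.get_text())
    if match is None:
        return None
    return int(match.group().replace(",", ""))


# Hand-written sample markup, assumed for illustration only.
sample = '<div id="result-stats">About 104,000 results<nobr> (0.52 seconds) </nobr></div>'
print(extract_result_count(sample))
print(extract_result_count("<html><body>blocked</body></html>"))
```

With a live page you would pass `response.text` instead of `sample`.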

Alternatively, you can achieve the same thing using the Google Organic Results API from SerpApi. The difference is that you don't need to figure out why certain things don't work; instead you iterate over structured JSON and get the data you want.

Code:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "fus ro dah defenition",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

result = results["search_information"]['total_results']
print(result)

# 104000

Disclaimer: I work for SerpApi.
