I'm trying to scrape headlines from a Google search results page. The problem is that when I loop over the parsed page with a for loop, I get "TypeError: find() takes no keyword arguments".

The solution I found suggested simply removing ".text" from the line that fetches the source. But when I do that, I get a different error: "TypeError: object of type 'Response' has no len()". Is there a workaround for this? The code below still includes ".text".

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.google.com/search?q=online+education").text

for soup in BeautifulSoup(source, 'lxml'):
    headline = soup.find("div", class_="BNeawe vvjwJb AP7Wnd")
    print(headline)

I expect it to print all ten headlines from the Google search results page.

2 Answers

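Iterating over a BeautifulSoup object yields its top-level child nodes, some of which are plain strings (such as the doctype), so soup.find() inside your loop ends up calling str.find(), which takes no keyword arguments. Removing ".text" does not help either, because BeautifulSoup expects markup text, not a Response object. Instead, parse the page once, find all elements with the class "BNeawe vvjwJb AP7Wnd", and then iterate over the results.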

import requests
from bs4 import BeautifulSoup

source = requests.get("https://www.google.com/search?q=online+education").text

soup = BeautifulSoup(source, 'lxml')

headlines = soup.find_all("div", class_="BNeawe vvjwJb AP7Wnd")

for headline in headlines:
    print(headline.text, end=', ')

Output:

What is online education | Definition of Online education is ..., Online Education | Encyclopedia.com, 5 Advantages Of Online Learning: Education Without Leaving Home ..., Online Education & Teaching Courses | Harvard University, What is online education? - Lynda.com, What is Online Education? - Online-Education.net, 50 Top Online Learning Sites - Best College Reviews, Benefits of Online Education | Community College of Aurora in ..., 10 Advantages of Taking Online Classes | OEDB.org, Online learning in higher education - Wikipedia, 
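
To see why the original loop fails without depending on Google's live markup, here is a minimal sketch on a small inline document. The HTML, the "headline" class name, and the helper function are made up for illustration, and "html.parser" is used only so the snippet needs no extra parser installed.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical inline markup standing in for a downloaded page.
SAMPLE_HTML = """<!DOCTYPE html>
<html><body>
<div class="headline">First headline</div>
<div class="headline">Second headline</div>
</body></html>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Iterating over the soup yields its top-level nodes, not search results.
top_level = list(soup)

# Some of those nodes (the doctype, whitespace) are string subclasses.
string_nodes = [node for node in top_level if isinstance(node, NavigableString)]


def find_with_keywords_fails(node):
    """Return True if calling find() with a keyword argument raises TypeError."""
    try:
        node.find("div", class_="headline")
    except TypeError:
        return True
    return False


# On string nodes, find() is str.find(), which rejects keyword arguments.
print([find_with_keywords_fails(node) for node in string_nodes])

# Parsing once and calling find_all() on the soup itself works as intended.
headlines = [div.get_text() for div in soup.find_all("div", class_="headline")]
print(headlines)
```

The same reasoning applies to the real page: the loop variable in the question is sometimes a string node rather than a tag.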
Mrugesh Kadia

Try using CSS selectors with the bs4 select()/select_one() methods instead. They are often more concise, easier to read, and more flexible.
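
As a rough illustration of that flexibility, the sketch below compares a CSS selector with a class_ lookup on made-up markup (the "result", "title" and "main" class names are hypothetical, not Google's). A selector such as "h3.title.main" matches the classes in any order, while class_="title main" matches that exact attribute string only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class names are invented for this example.
SAMPLE_HTML = """
<div class="result"><h3 class="title main">Alpha</h3></div>
<div class="result"><h3 class="main title">Beta</h3></div>
<div class="result"><p>No title here</p></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

titles = []
for result in soup.select("div.result"):
    # select_one() returns None when nothing matches, so check before using it.
    heading = result.select_one("h3.title.main")
    if heading is not None:
        titles.append(heading.get_text())

# class_ with several classes matches the exact attribute string only.
exact = [h.get_text() for h in soup.find_all("h3", class_="title main")]

print(titles)
print(exact)
```

The None check matters on real pages too, since a result block without the expected child would otherwise raise AttributeError on ".text".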

Make sure you send a user-agent header with the request; otherwise, Google or another website may eventually block it. By default, requests identifies itself as python-requests, which a site can easily recognize as a script rather than a browser.

Pass user-agent in request headers:

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

requests.get(YOUR_URL, headers=headers)
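
If you want to check what will actually be sent before hitting the site, one option is to build a prepared request and inspect it. This is only a sketch: it never sends anything, and the query value is just the one from the question.

```python
import requests

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    )
}
params = {"q": "online education"}

# What requests would send as User-Agent if no header were passed.
default_agent = requests.utils.default_user_agent()
print(default_agent)

# Build the request without sending it, so nothing touches the network.
prepared = requests.Request(
    "GET", "https://www.google.com/search", headers=headers, params=params
).prepare()

print(prepared.url)                    # final URL with the encoded query string
print(prepared.headers["User-Agent"])  # the browser-like header set above
```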

Full example:

from bs4 import BeautifulSoup
import requests

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  "q": "how to create minecraft server" # query
}

html = requests.get('https://www.google.com/search', headers=headers, params=params)
soup = BeautifulSoup(html.text, 'lxml')

for result in soup.select('.tF2Cxc'):
  title = result.select_one('.DKV0Md').text
  print(title)

------
'''
How to Setup a Minecraft: Java Edition Server – Home
Download the Minecraft: Java Edition server
Setting Up Your Own Minecraft Server - iD Tech
Tutorials/Setting up a server - Minecraft Wiki
How to make a Minecraft server on Windows, Mac, or Linux
How To Make a Minecraft Server - The Ultimate 2021 Guide
How To Make a Minecraft Server - The Complete Guide - Apex ...
How to Setup a Minecraft Server on Windows 10 - ServerMania
How to Create Your Own Minecraft Gaming Server | OVHcloud
'''

Alternatively, you can achieve the same result by using the Google Organic Results API from SerpApi. It's a paid API with a free plan.

The core difference is that you don't have to solve the problems that may come up along the way: all you need to do is iterate over structured JSON and pick out the data you want, rather than building everything from scratch and maintaining the parser over time.

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "how to create minecraft server",
    "hl": "en",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results["organic_results"]:
  title = result['title']
  print(title)

-------
'''
How to Setup a Minecraft: Java Edition Server – Home
Download the Minecraft: Java Edition server
Setting Up Your Own Minecraft Server - iD Tech
Tutorials/Setting up a server - Minecraft Wiki
How to make a Minecraft server on Windows, Mac, or Linux
How To Make a Minecraft Server - The Ultimate 2021 Guide
How To Make a Minecraft Server - The Complete Guide - Apex ...
How to Setup a Minecraft Server on Windows 10 - ServerMania
How to Create Your Own Minecraft Gaming Server | OVHcloud
'''
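
The loop above indexes the dictionary directly. If you prefer to tolerate a missing "organic_results" list or an entry without a title, a slightly more defensive version could look like the sketch below. The sample dictionary and the extract_titles helper are hypothetical and only mimic the shape used above; real responses contain many more fields.

```python
# Hypothetical data shaped like the dictionary iterated over above.
sample_results = {
    "organic_results": [
        {"position": 1, "title": "First result"},
        {"position": 2, "link": "https://example.com"},  # no "title" key
        {"position": 3, "title": "Third result"},
    ]
}


def extract_titles(results):
    """Collect titles, skipping entries without one and tolerating a missing list."""
    return [
        item["title"]
        for item in results.get("organic_results", [])
        if "title" in item
    ]


titles = extract_titles(sample_results)
print(titles)

# A payload without "organic_results" yields an empty list instead of a KeyError.
empty = extract_titles({"error": "hypothetical error payload"})
print(empty)
```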

P.S. I have a dedicated blog about web scraping.

Disclaimer: I work for SerpApi.

Dmitriy Zub