
I am trying to scrape the Google result for the search "What is 2+2", but the following code fails with AttributeError: 'NoneType' object has no attribute 'text'. How can I get the answer that Google displays?

import requests
from bs4 import BeautifulSoup

text="What is 2+2"
search=text.replace(" ","+")
link="https://www.google.com/search?q="+search
headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
source=requests.get(link,headers=headers).text
soup=BeautifulSoup(source,"html.parser")
answer=soup.find('span',id="cwos")

self.respond(answer.text)

The only problem I can see is the id passed to soup.find, but I picked that id carefully from the page, so I don't think it is wrong. I also tried answer=soup.find('span',class_="cwcot gsrt"), and that did not work either.

bunbun
Muhammad Naufil
  • The `<span>` with the id `cwos` was not found, so `find()` returned `None`. You should handle this case. – Klaus D. Dec 30 '18 at 18:10
  • I can't see that id when I open your link in a browser. What is the expected text to be returned? – QHarr Dec 30 '18 at 18:45

3 Answers


A big gotcha when parsing websites is that the source can look very different in your browser from what requests sees. The difference is JavaScript, which can heavily modify the DOM in a JavaScript-capable browser.

I'd suggest 3 options:

  1. Use requests to get the page, then examine the response closely: does that tag exist when the page is retrieved by an agent that does not run JavaScript?
  2. Use Selenium (https://www.seleniumhq.org/) as your agent. It drives a fully featured browser that you can control programmatically, including from Python.
  3. Use Google's search API instead of trying to scrape the HTML.
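
Option 1 can be checked without a browser once you have the response text. A minimal sketch, assuming a hypothetical helper `tag_exists` and a made-up HTML string standing in for `requests.get(...).text` (the markup Google actually serves may differ and changes over time):

```python
from bs4 import BeautifulSoup

def tag_exists(html, tag_name, **attrs):
    """Return True if html contains a tag_name element with the given attributes."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find(tag_name, attrs=attrs) is not None

# Made-up stand-in for the HTML that requests received (no JavaScript was run).
sample_html = "<html><body><div id='result'>no calculator here</div></body></html>"

print(tag_exists(sample_html, "span", id="cwos"))   # is the span from the question present?
print(tag_exists(sample_html, "div", id="result"))  # a tag that does exist in the sample
```

If the first check is False for the real response, the id only exists in the browser-rendered page or for a different query, and `find()` will keep returning `None`.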
Danielle M.
  • `Use Google's search API`: What API are you referring to? AFAIK, Google hasn’t had a public search API for [well over 8 years](https://stackoverflow.com/questions/4082966/what-are-the-alternatives-now-that-the-google-web-search-api-has-been-deprecated). – grooveplex Dec 30 '18 at 18:20
  • Boy do I feel old, I had no idea o.O – Danielle M. Dec 30 '18 at 18:21

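The `+` in `2+2` has to be percent-encoded as `%2B`, because a literal `+` in a query string is decoded as a space. Next time, use the query string exactly as the browser sends it.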

import requests
from bs4 import BeautifulSoup

# %2B is the percent-encoded form of "+"
search="2%2B2"
link="https://www.google.com/search?q="+search
headers={'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
source=requests.get(link,headers=headers).text
soup=BeautifulSoup(source,"html.parser")
# find() returns None if the page does not contain this span
answer=soup.find('span',id="cwos")
print(answer.text)

Output:

 4  

Compare these URLs. They do not return the same results, because an unencoded + is read as a space:

https://www.google.com/search?q=What+is+2+2

https://www.google.com/search?q=2%2B2

https://www.google.com/search?q=2+2
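
Rather than encoding the query by hand, the standard library can do it. A small sketch, assuming a hypothetical helper `build_search_url`; `quote_plus` and `unquote_plus` are from `urllib.parse`:

```python
from urllib.parse import quote_plus, unquote_plus

def build_search_url(query):
    """Build a search URL with the query percent-encoded for use after ?q=."""
    return "https://www.google.com/search?q=" + quote_plus(query)

# Spaces become "+", and the literal "+" becomes "%2B".
print(build_search_url("What is 2+2"))
# https://www.google.com/search?q=What+is+2%2B2

# Decoding the hand-built query from the question shows why it fails:
print(unquote_plus("What+is+2+2"))
# What is 2 2
```

Passing the query through the `params` argument of `requests.get` (for example `params={"q": "What is 2+2"}`) should have the same effect, since requests encodes parameter values itself.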

Bitto

When you run the code, you might encounter an AttributeError:

AttributeError: 'NoneType' object has no attribute 'text'

If that’s the case, then take a step back and inspect your previous results. Were there any items with a value of None? You might have noticed that the structure of the page is not entirely uniform. There could be an advertisement in there that displays in a different way than the normal job postings, which may return different results.

Reference: https://realpython.com/beautiful-soup-web-scraper-python/#extract-text-from-html-elements
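
One way to avoid the AttributeError is to check the result of `find()` before reading `.text`. A minimal sketch, assuming a hypothetical helper `extract_answer` and made-up HTML snippets in place of a real response:

```python
from bs4 import BeautifulSoup

def extract_answer(html, default=None):
    """Return the text of <span id="cwos">, or default if the span is missing."""
    soup = BeautifulSoup(html, "html.parser")
    answer = soup.find("span", id="cwos")
    if answer is None:
        # find() found nothing, so there is no .text to read
        return default
    return answer.get_text(strip=True)

# Made-up snippets: one with the span, one without it.
print(extract_answer('<span id="cwos"> 4 </span>'))
print(extract_answer("<p>no calculator here</p>", default="no answer found"))
```

Returning a default keeps the caller in control of what to do when the element is absent, instead of crashing inside the scraper.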

Dharman