I am new to web crawling, so thanks in advance for helping out. I need to obtain the full HTTP response returned by a Google search. When I search Google for a keyword in a browser, the returned page contains a section:

Searches related to XXXX (where XXXX is the searched words)

I need to extract this section of the web page. From my research, most of the current packages for crawling Google are not able to extract this section. I tried urllib2 with the following code:

import urllib2
url = "https://www.google.com.sg/search?q=test&ie=&oe=#q=international+business+machine&spf=187"
req = urllib2.Request(url, headers={'User-Agent' : 'Mozilla/5.0'})
con = urllib2.urlopen( req )
strs = con.read()
print strs

I get a large chunk of text that looks like a legitimate HTTP response, but it contains nothing related to my search key "international business machine". I suspect Google detects that the request does not come from an actual browser and therefore hides this information. Is there any way to get around this and obtain the "related searches" section of the Google results? Thanks.

user1750197

1 Answer

As pointed out by @anonyXmous, the useful post to refer to is here:

Google Search Web Scraping with Python

With the following code:

from requests import get
keyword = "internation business machine"
url = "https://google.com/search?q="+keyword
raw = get(url).text
print raw

I am able to get the needed text in "raw".
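
One likely reason the urllib2 attempt in the question returned nothing about the keyword is the URL itself: everything after "#" is a fragment, which HTTP clients normally do not send to the server, so the request probably searched for "test" rather than the intended phrase. The sketch below, written for Python 3, splits the question's URL to show this and then builds a query URL with the keyword properly encoded. The helper name build_search_url and its base parameter are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlencode, urlsplit

# URL from the question (with the stray space after "?" removed).
question_url = ("https://www.google.com.sg/search?q=test&ie=&oe="
                "#q=international+business+machine&spf=187")

parts = urlsplit(question_url)
sent_query = parts.query          # the query string, which is part of the request
dropped_fragment = parts.fragment  # the fragment, normally kept on the client side


def build_search_url(keyword, base="https://www.google.com/search"):
    """Hypothetical helper: put the keyword in the query string, percent-encoded."""
    return base + "?" + urlencode({"q": keyword})


search_url = build_search_url("international business machine")

print(sent_query)
print(dropped_fragment)
print(search_url)
```

The resulting search_url can be passed to get(...) exactly as in the snippet above. Whether the returned HTML includes the "Searches related to" section still depends on what Google chooses to serve to a non-browser client.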

user1750197
  • The weird thing is that I don't get the content of the real page (the one I get when I paste "internation business machine" into Google search and hit Enter)... – the_economist Apr 14 '21 at 13:50