How to obtain the full HTML content of a Google search result page

Question

I am new to web crawling; thanks for helping out. My task is to obtain the full HTTP response returned by a Google search. When I search Google for a keyword in a browser, the returned page contains a section:

Searches related to XXXX (where XXXX is the searched words)

I need to extract this section of the page. From my research, most of the current Google crawling packages cannot extract this section. I tried urllib2 with the following code:

import urllib2
url = "https://www.google.com.sg/search?q=test&ie=&oe=#q=international+business+machine&spf=187"
req = urllib2.Request(url, headers={'User-Agent' : 'Mozilla/5.0'})
con = urllib2.urlopen( req )
strs = con.read()
print strs

I am getting a large chunk of text that looks like a legitimate HTTP response, but it contains nothing related to my search keywords "international business machine". I suspect Google detects that the request does not come from an actual browser and therefore hides this information. Is there any way to work around this and obtain the "related searches" section of the Google results? Thanks.
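A likely cause, independent of any bot detection: in the URL above, everything after `#` is a URL fragment, which an HTTP client never sends to the server, so the request actually searches Google for `test`. Python's standard `urllib.parse` makes this visible. This is a minimal check that only parses the question's URL (with the stray space after `?` removed) and sends nothing:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.google.com.sg/search?q=test&ie=&oe=#q=international+business+machine&spf=187"
parts = urlsplit(url)

# Only the query string (the part before '#') goes to the server.
print(parts.query)            # q=test&ie=&oe=
# The fragment stays on the client side and never reaches Google.
print(parts.fragment)         # q=international+business+machine&spf=187
# parse_qs drops the empty ie= and oe= values by default.
print(parse_qs(parts.query))  # {'q': ['test']}
```

Moving `q=international+business+machine` before the `#` (or dropping the fragment entirely) is what makes the keywords part of the actual request.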

http://stackoverflow.com/questions/38619478/google-search-web-scraping-with-python; try this solution — jose_bacoy, Apr 18 '17 at 03:52
@anonyXmous Thanks a lot. Simple, and it works like a charm. The trick is to use: from requests import get — user1750197, Apr 18 '17 at 06:34

score 0 · Answer 1 · edited May 23 '17 at 12:02


As pointed out by @anonyXmous, the useful post to refer to is:

Google Search Web Scraping with Python

With this code:

from requests import get

keyword = "international business machine"
url = "https://google.com/search"
# params= lets requests URL-encode the query string (spaces become '+')
raw = get(url, params={"q": keyword}).text
print(raw)

I get the needed text in "raw".
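Getting the raw HTML is only half of the task; the "Searches related to" section still has to be pulled out of it. Here is a minimal sketch with Python's built-in `html.parser`, under a loudly stated assumption: that the related searches are `<a>` links appearing after a text node containing "Searches related to". Google's real markup is undocumented and changes over time, so the sample HTML below is made up, and `RelatedSearchParser` and `extract_related` are hypothetical names.

```python
from html.parser import HTMLParser


class RelatedSearchParser(HTMLParser):
    """Collect link texts that appear after a 'Searches related to' heading.

    ASSUMPTION: related searches are <a> elements following a text node
    containing that phrase. Every link after the heading is collected,
    so real pages may need a tighter stopping condition.
    """

    def __init__(self):
        super().__init__()
        self.seen_heading = False
        self.in_link = False
        self.current = []
        self.related = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.seen_heading:
            self.in_link = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            # Join text split by nested tags such as <b>ibm</b> stock.
            text = "".join(self.current).strip()
            if text:
                self.related.append(text)
            self.in_link = False

    def handle_data(self, data):
        if "Searches related to" in data:
            self.seen_heading = True
        elif self.in_link:
            self.current.append(data)


def extract_related(html):
    parser = RelatedSearchParser()
    parser.feed(html)
    parser.close()
    return parser.related


# Made-up sample markup, NOT Google's actual HTML.
sample = """
<div><h3>Searches related to international business machine</h3>
<a href="/search?q=ibm+stock"><b>ibm</b> stock</a>
<a href="/search?q=ibm+careers">ibm careers</a></div>
"""
print(extract_related(sample))  # ['ibm stock', 'ibm careers']
```

In practice you would call `extract_related(raw)` on the downloaded page and adjust the heading check to whatever the page you actually receive contains.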


answered Apr 18 '17 at 06:38

user1750197


The weird thing is that I don't get the content of the real page (the one I get when I paste "international business machine" into Google search and hit Enter)... – the_economist Apr 14 '21 at 13:50
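One possible factor, not confirmed anywhere in this thread: Google may serve different HTML depending on the client, and `requests` sends its own default User-Agent (`python-requests/...`) rather than a browser's. Below is a minimal stdlib sketch that keeps the query in the query string and sets a browser-like User-Agent, as the question's code did. It only builds the request without sending it, and `build_search_request` is a hypothetical helper name. Even with these headers, the returned markup is not guaranteed to match what a browser shows.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(keyword, user_agent="Mozilla/5.0"):
    """Build (but do not send) a GET request for a Google search.

    Hypothetical helper: the keyword goes into the query string, never
    after '#', and a browser-like User-Agent header is attached.
    """
    query = urlencode({"q": keyword})  # spaces are encoded as '+'
    url = "https://www.google.com/search?" + query
    return Request(url, headers={"User-Agent": user_agent})


req = build_search_request("international business machine")
print(req.full_url)  # https://www.google.com/search?q=international+business+machine

# Fetching needs network access, e.g.:
#   import urllib.request
#   html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")
```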
