I want to extract data using the Google API Python client.

Steps I have taken:

  1. Created a Google Custom Search Engine and obtained a search engine ID.
  2. Created a search API key under a project in my Google Cloud account.

I am using this code snippet:

from googleapiclient.discovery import build

my_api_key = "my_API_KEY"
my_cse_id = "my_CSE ID"

def google_search(search_term, api_key, cse_id, **kwargs):
    service = build("customsearch", "v1", developerKey=api_key, cache_discovery=False)
    res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
    return res['items']

results = google_search('yoga', my_api_key, my_cse_id, num=10)
print(results)

I also referred to this thread:

Programmatically searching google in Python using custom search

The log shows the URL being requested:

2020-11-07 15:25:54,336 | INFO | discovery.py:272 | _retrieve_discovery_doc | URL being requested: GET https://www.googleapis.com/discovery/v1/apis/customsearch/v1/rest?key=

Problem

gaierror                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/httplib2/__init__.py in _conn_request(self, conn, request_uri, method, body, headers)
   1500                 if conn.sock is None:
-> 1501                     conn.connect()
   1502                 conn.request(method, request_uri, body, headers)

/opt/conda/lib/python3.6/site-packages/httplib2/__init__.py in connect(self)
   1269 
-> 1270         address_info = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
   1271         for family, socktype, proto, canonname, sockaddr in address_info:

/opt/conda/lib/python3.6/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    744     addrlist = []
--> 745     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    746         af, socktype, proto, canonname, sa = res

gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

ServerNotFoundError                       Traceback (most recent call last)
<ipython-input-3-b59229e204c0> in <module>()
      9     return res['items']
     10 
---> 11 results = google_search('yoga', my_api_key, my_cse_id, num=10)
     12 results

<ipython-input-3-b59229e204c0> in google_search(search_term, api_key, cse_id, **kwargs)
      5 
      6 def google_search(search_term, api_key, cse_id, **kwargs):
----> 7     service = build("customsearch", "v1", developerKey=api_key, cache_discovery=False)
      8     res = service.cse().list(q=search_term, cx=cse_id, **kwargs).execute()
      9     return res['items']

/opt/conda/lib/python3.6/site-packages/googleapiclient/_helpers.py in positional_wrapper(*args, **kwargs)
    128                 elif positional_parameters_enforcement == POSITIONAL_WARNING:
    129                     logger.warning(message)
--> 130             return wrapped(*args, **kwargs)
    131         return positional_wrapper
    132 

/opt/conda/lib/python3.6/site-packages/googleapiclient/discovery.py in build(serviceName, version, http, discoveryServiceUrl, developerKey, model, requestBuilder, credentials, cache_discovery, cache)
    222     try:
    223       content = _retrieve_discovery_doc(
--> 224         requested_url, discovery_http, cache_discovery, cache, developerKey)
    225       return build_from_document(content, base=discovery_url, http=http,
    226           developerKey=developerKey, model=model, requestBuilder=requestBuilder,

/opt/conda/lib/python3.6/site-packages/googleapiclient/discovery.py in _retrieve_discovery_doc(url, http, cache_discovery, cache, developerKey)
    272   logger.info('URL being requested: GET %s', actual_url)
    273 
--> 274   resp, content = http.request(actual_url)
    275 
    276   if resp.status >= 400:

/opt/conda/lib/python3.6/site-packages/httplib2/__init__.py in request(self, uri, method, body, headers, redirections, connection_type)
   1924                         headers,
   1925                         redirections,
-> 1926                         cachekey,
   1927                     )
   1928         except Exception as e:

/opt/conda/lib/python3.6/site-packages/httplib2/__init__.py in _request(self, conn, host, absolute_uri, request_uri, method, body, headers, redirections, cachekey)
   1593 
   1594         (response, content) = self._conn_request(
-> 1595             conn, request_uri, method, body, headers
   1596         )
   1597 

/opt/conda/lib/python3.6/site-packages/httplib2/__init__.py in _conn_request(self, conn, request_uri, method, body, headers)
   1506             except socket.gaierror:
   1507                 conn.close()
-> 1508                 raise ServerNotFoundError("Unable to find the server at %s" % conn.host)
   1509             except socket.error as e:
   1510                 errno_ = (

ServerNotFoundError: Unable to find the server at www.googleapis.com
  • `gaierror: [Errno -3] Temporary failure in name resolution` Check your internet connection, DNS settings, firewall and routing settings. – Thomas Nov 07 '20 at 15:40
  • I am doing this in a Kaggle notebook. I don't think there is a problem with DNS or the firewall, and the internet is working fine. – shivang sharma Nov 08 '20 at 11:20
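
One way to test the suggestion in the comments is to check name resolution directly from the notebook, using the same `socket.getaddrinfo` call that fails in the traceback. This is only a minimal sketch: `can_resolve` is a hypothetical helper name, and the default port of 443 is an assumption.

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` resolves to an address, False on a DNS failure.

    `can_resolve` is a hypothetical helper, not part of any library.
    """
    try:
        # Same call httplib2 makes in connect() in the traceback above.
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # Raised for errors such as "[Errno -3] Temporary failure in name resolution".
        return False
```

Calling `can_resolve("www.googleapis.com")` in the same environment should return `False` if the notebook cannot resolve the host, which would point to the environment's network settings rather than to the search code.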

1 Answer


The usual approach is to fetch one page of results first and then read the items from that page.

In your code these steps are not clearly separated. You pass num=10, but you never set the start index, so only the first page is requested.

# for example, get the first page
res = service.cse().list(q=keyword, cx=cse_id).execute()

# get the next page, starting at the index reported by the first response
next_start = res['queries']['nextPage'][0]['startIndex']
next_res = service.cse().list(q=keyword, cx=cse_id, num=10, start=next_start).execute()
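
The two calls above could be folded into a loop along these lines. This is a sketch rather than a definitive implementation: `paged_search` and `max_results` are hypothetical names, and it assumes `service` is the object returned by `build("customsearch", "v1", ...)` in the question.

```python
def paged_search(service, query, cse_id, max_results=30):
    """Collect up to `max_results` items by following nextPage.startIndex.

    `paged_search` is a hypothetical helper; `service` is assumed to be the
    Custom Search service object built in the question.
    """
    items = []
    start = 1  # result indices are 1-based
    while len(items) < max_results:
        num = min(10, max_results - len(items))  # at most 10 results per request
        res = service.cse().list(q=query, cx=cse_id, num=num, start=start).execute()
        page_items = res.get("items", [])  # "items" is missing when a page is empty
        items.extend(page_items)
        next_page = res.get("queries", {}).get("nextPage")
        if not page_items or not next_page:
            break  # no further pages to fetch
        start = next_page[0]["startIndex"]
    return items
```

Using `res.get("items", [])` instead of `res['items']` avoids a `KeyError` when a query returns no results. Note that this only helps once the request itself succeeds; it does not address the name resolution error in the traceback.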

Hope this solves your problem.

Newt