-3

Let's say I have a list of cities. For example:

zip = ['newyork','delhi']

How can I search Google for the New York pincode and the Delhi pincode and extract the data?

Each search returns multiple pincodes, but I only need to capture the first one.

The output I need:

{Newyork: 10001, Delhi: 110001}

I tried this:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import re
import pandas as pd
import os
import html5lib
import json
import time
url = "https://www.google.com/"
chromedriver = r"C:\Users\me\chromedriver"
driver = webdriver.Chrome(chromedriver)
driver.implicitly_wait(30)
driver.get(url)
search = driver.find_element_by_name('q')
pincodencodee=['newyork','delhi']
for i in pincodencodee:    
    search.send_keys(i)
search.send_keys(Keys.RETURN) 
time.sleep(5) 
driver.quit()

3 Answers

2

Each zip code in the results has a data-idx attribute, a running index that starts at 0, so the first zip code has data-idx="0". You also need to re-locate the search box on every iteration, because the page reloads after each search and the old reference would raise StaleElementReferenceException.

driver.maximize_window()
driver.get(url)

pincodencodee = {'new york': -1, 'delhi': -1}  # -1 marks "not looked up yet"
for key in pincodencodee:
    # Re-locate the search box: the previous reference is stale after the page reloads
    search = driver.find_element_by_name('q')
    search.clear()
    search.send_keys(key + ' pincode')
    search.send_keys(Keys.RETURN)

    # The first result carries data-idx="0"
    code = driver.find_element_by_css_selector('.rl_item[data-idx="0"] .title')
    pincodencodee[key] = code.text

driver.quit()

print(pincodencodee) # {'new york': '10001', 'delhi': '110001'}
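
To see what that selector picks out without opening a browser, here is a minimal offline sketch using only the standard library. The SAMPLE_HTML markup and the names FirstTitleParser and first_pincode are made up for illustration; the markup only mirrors the class and attribute names from the selector above, and the real results page may well differ.

```python
from html.parser import HTMLParser

# Hypothetical markup: it only mirrors the class and attribute names used in
# the CSS selector above; a real results page will look different.
SAMPLE_HTML = """
<a class="rl_item rl_item_base" data-idx="0"><div class="title">10001</div></a>
<a class="rl_item rl_item_base" data-idx="1"><div class="title">10002</div></a>
"""


class FirstTitleParser(HTMLParser):
    """Collect the text of the .title element inside the data-idx="0" item."""

    def __init__(self):
        super().__init__()
        self.in_first_item = False
        self.in_title = False
        self.result = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "rl_item" in classes:
            # Only the item with data-idx="0" is of interest.
            self.in_first_item = attrs.get("data-idx") == "0"
        elif self.in_first_item and "title" in classes:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_first_item = False
        # Simplification: any closing tag ends the title text.
        self.in_title = False

    def handle_data(self, data):
        if self.in_title and self.result is None:
            self.result = data.strip()


def first_pincode(html):
    """Return the first item's title text, or None if nothing matches."""
    parser = FirstTitleParser()
    parser.feed(html)
    return parser.result


print(first_pincode(SAMPLE_HTML))
```

The point is only that data-idx="0" pins the lookup to the first entry regardless of how many codes follow it.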
Guy
0

As your desired output is:

{Newyork: 10001, Delhi: 110001}

Presumably this is a Python dictionary, which you need to construct with keys from the supplied list and values from the Google search results. To achieve that, you can use the following locator strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.keys import Keys
    
    cities = ['newyork','delhi']
    search_texts = [city + ' pincode' for city in cities]
    print(search_texts)
    # Filled per city, so a timeout on one search cannot misalign keys and values
    pincodes = {}
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument("start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=chrome_options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.google.com/")
    for city, my_text in zip(cities, search_texts):
        try:
            # Wait for the search box again on every iteration; the page reloads after each search
            search = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q")))
            search.clear()
            search.send_keys(my_text)
            search.send_keys(Keys.RETURN)
            # The first matching result item holds the first pincode
            element_text = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='rl_item rl_item_base']//div[@class='title']"))).text
            pincodes[city] = element_text
        except TimeoutException as e:
            print(e)
    print(pincodes)
    driver.quit()
    
  • Console Output:

    ['newyork pincode', 'delhi pincode']
    {'newyork': '10001', 'delhi': '110001'}
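
The values above are strings and the keys are lowercase, while the question asks for {Newyork: 10001, Delhi: 110001}. A minimal sketch of a post-processing step follows; normalize is a hypothetical helper name, not part of Selenium.

```python
def normalize(results):
    """Capitalize city names and convert pincode strings to int."""
    normalized = {}
    for city, code in results.items():
        code = code.strip()
        # Keep the raw string if it is not purely digits, rather than guessing.
        normalized[city.capitalize()] = int(code) if code.isdigit() else code
    return normalized


print(normalize({'newyork': '10001', 'delhi': '110001'}))
# {'Newyork': 10001, 'Delhi': 110001}
```

Note that converting to int drops leading zeros, so codes such as '02134' are probably better left as strings.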
    
undetected Selenium
-1

Okay, so this is a problem that requires either a constant DOM structure or heavy regex.

I'm not going to spend time on the regex, but I can help you with extracting the code itself.
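
For what it's worth, if the result text is already available as a plain string, the regex route may not need to be heavy. A minimal sketch, assuming a pincode is a standalone run of 5 or 6 digits (that assumption, and the names PINCODE_RE and first_pincode_in, are mine and not from the question):

```python
import re

# Assumption: a pincode is a standalone run of 5 or 6 digits.
PINCODE_RE = re.compile(r"\b\d{5,6}\b")


def first_pincode_in(text):
    """Return the first 5- or 6-digit run in text, or None if there is none."""
    match = PINCODE_RE.search(text)
    return match.group(0) if match else None


print(first_pincode_in("New York, NY 10001, 10002, 10003"))
```

This would also match any other 5- or 6-digit number on the page, which is why a stable DOM selector is the safer choice when one exists.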

The way I would approach this is to execute JavaScript in the browser, like so:

driver.execute_script('script')

On its own that call discards the result, so let's take it a step further. Assigning to arguments[0] inside the script does not change a Python variable, because arguments are passed to the browser by value. Instead, have the script return the text and store the return value in a variable, say area_code, which we can then push to our list. Let's grab the code using JavaScript:

area_code = driver.execute_script("return document.getElementsByClassName('title')[0].innerText")

Here execute_script hands back whatever the script returns, so area_code holds the text of the first element with the class title.

Then you can store the data either as key-value pairs or as a 2D array.
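
As a rough illustration of those two storage options (the codes list holds placeholder values standing in for the scraped text):

```python
cities = ["new york", "delhi"]
codes = ["10001", "110001"]  # placeholder values standing in for scraped text

# Key-value form: look up a code by city name.
by_city = dict(zip(cities, codes))

# 2D form: one [city, code] row per result, convenient for writing to CSV.
rows = [[city, code] for city, code in zip(cities, codes)]

print(by_city)
print(rows)
```

The dictionary is the closer match to the output requested in the question.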