How to _scrape_ all the data from this website link using selenium and save the extracted city, location and contact number as csv dataframe object?

Question

The website URL to scrape data from: http://jawedhabib.co.in/hairandbeautysalons-sl/

Code:

lst = driver.find_element_by_css_selector(".post-17954.page.type-page.status-publish.hentry").text 
for i in lst: 
    driver.implicitly_wait(2) 
    city = driver.find_element_by_css_selector("tr").text     
    salon_address = driver.find_element_by_css_selector("tr").text 
    Contact_number = driver.find_element_by_css_selector("tr").text    
print(lst)

lst = driver.find_element_by_css_selector(".post-17954.page.type-page.status-publish.hentry").text for i in lst: driver.implicitly_wait(2) city = driver.find_element_by_css_selector("tr").text salon_address = driver.find_element_by_css_selector("tr").text Contact_number = driver.find_element_by_css_selector("tr").text print(lst) — Nilay EM Sinha, Feb 23 '21 at 15:25
Your `lst` is a *string*! You are simply iterating over the characters of that string. What is the point? Did you mean `lst = driver.find_elements_by_css_selector(".post-17954.page.type-page.status-publish.hentry")`? – JaSON, Feb 23 '21 at 15:29
I can't figure out which elements I should iterate over to extract all the data under the tags. – Nilay EM Sinha, Feb 23 '21 at 15:48
It is returning some long code snippet after executing lst = driver.find_elements_by_css_selector("div.wpb_wrapper") for i in lst: driver.implicitly_wait(2) city = i.find_element_by_css_selector("tr").text salon_address = i.find_element_by_css_selector("tr").text contact_number = i.find_element_by_css_selector("tr").text print(lst) — Nilay EM Sinha, Feb 23 '21 at 16:14
The bot has to go to each of the pages for the different categories, such as hair expresso and hair yoga, select the text data using selectors, and save it in the same format with the respective column names in a pandas DataFrame. The output will be a single DataFrame with the data from all the pages, which is then converted to a CSV file using the df.to_csv function. – Nilay EM Sinha, Feb 23 '21 at 16:17
Instead of `print(lst)`, put `print(city, salon_address, contact_number)` inside the `for` loop. – JaSON, Feb 23 '21 at 16:21
lst = driver.find_elements_by_css_selector("div.post-17954.page.type-page.status-publish.hentry") for i in lst: driver.implicitly_wait(2) table_row = i.find_element_by_css_selector(".vc_row.wpb_row.vc_row-fluid").text for j in table_row: city = j.find_element_by_css_selector("tr").text salon_address = j.find_element_by_css_selector("tr").text contact_number = j.find_element_by_css_selector("tr").text print(city, salon_address, contact_number) — Nilay EM Sinha, Feb 24 '21 at 01:39
AttributeError Traceback (most recent call last) in 4 table_row = i.find_element_by_css_selector(".vc_row.wpb_row.vc_row-fluid").text 5 for j in table_row: ----> 6 city = j.find_element_by_css_selector("tr").text 7 salon_address = j.find_element_by_css_selector("tr").text 8 contact_number = j.find_element_by_css_selector("tr").text AttributeError: 'str' object has no attribute 'find_element_by_css_selector' — Nilay EM Sinha, Feb 24 '21 at 01:40
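
The AttributeError in the last comment comes from `.text` returning a plain Python string: looping over a string yields one-character strings, which have no Selenium methods. A minimal sketch that needs no browser, where `FakeElement` is a hypothetical stand-in for a WebElement (it is not part of Selenium) and the texts are placeholders:

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement; only `.text` is modelled."""

    def __init__(self, text):
        self.text = text


row = FakeElement("City Salon Address Contact No")

# Iterating over `.text` walks the string one character at a time.
pieces = [piece for piece in row.text]
first_piece = pieces[0]                   # a one-character str
is_string = isinstance(first_piece, str)  # True: it is not an element
has_finder = hasattr(first_piece, "find_element_by_css_selector")  # False

# Iterating over a list of elements (what the find_elements_* calls return)
# keeps every item an element, so `.text` is read once per item instead.
rows = [FakeElement("row one"), FakeElement("row two")]
texts = [r.text for r in rows]
```

So the loop should run over the elements themselves, and `.text` should only be read inside the loop body.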

Arundeep Chohan · Accepted Answer · 2021-02-24T05:11:34.653

Here's a solution to the first part of your problem. First, wait for all the elements to load on the page. Then grab all the table's `tr` rows beyond the first 2, which are reserved for the location (the zone name and the column headers). From each `tr`, use a relative XPath starting with `./` to reach its child cells, and read the text of `td[1]`, `td[2]` and `td[3]` with `get_attribute('textContent')`; these hold the city, address and contact number respectively.

# Assumes `driver` is an already created WebDriver instance.
wait = WebDriverWait(driver, 60)
driver.get("http://jawedhabib.co.in/hairandbeautysalons-sl/")
#driver.maximize_window()
# Wait until the rows are present; skip the first 2 rows of each table body.
tableValues = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//tbody//tr[position()>2]")))
city = []
address = []
contactno = []
for tr in tableValues:
    #print(tr.get_attribute('textContent'))
    # "./" makes each XPath relative to the current row.
    city.append(tr.find_element(By.XPATH, "./td[1]").get_attribute('textContent'))
    address.append(tr.find_element(By.XPATH, "./td[2]").get_attribute('textContent'))
    contactno.append(tr.find_element(By.XPATH, "./td[3]").get_attribute('textContent'))

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
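
For the remaining part of the question (saving the result), here is a minimal sketch, assuming pandas is installed and that the `city`, `address` and `contactno` lists from the loop above have equal length. The helper name `to_dataframe` and the column names are illustrative choices, not part of the original answer.

```python
import pandas as pd


def to_dataframe(city, address, contactno):
    """Combine the three scraped lists into one DataFrame, one row per salon."""
    df = pd.DataFrame(
        {"City": city, "Salon Address": address, "Contact No": contactno}
    )
    # textContent keeps surrounding whitespace and newlines, so trim every cell.
    return df.apply(lambda col: col.str.strip())
```

With the lists filled by the loop, `to_dataframe(city, address, contactno).to_csv("salons.csv", index=False)` would write the file; `salons.csv` is a hypothetical filename.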

edited Feb 24 '21 at 05:11

answered Feb 24 '21 at 04:55

Arundeep Chohan

"Grab all tables trs that are beyond the first 2 trs which are reserved for the location." What does "reserved" mean for a table here? – Nilay EM Sinha Feb 24 '21 at 08:39
The rows "CENTRAL ZONE" and "City, Salon Address, Contact No". Basically, it ignores those rows. – Arundeep Chohan Feb 24 '21 at 08:56
I want to add a user agent and I am doing it as: from selenium.webdriver.chrome.options import Options opts = Options() opts.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36") driver = webdriver.Chrome(chrome_options=opts) but there is an error – Nilay EM Sinha Feb 24 '21 at 09:24
I am getting a file-not-found error, and a WebDriverException is raised: Message: 'chromedriver' executable needs to be in PATH. – Nilay EM Sinha Feb 24 '21 at 09:29
https://stackoverflow.com/questions/29858752/error-message-chromedriver-executable-needs-to-be-available-in-the-path You need to set up your chromedriver. – Arundeep Chohan Feb 24 '21 at 09:41
Also, `chrome_options` is deprecated; use `options`. – Arundeep Chohan Feb 24 '21 at 09:43
