I am writing a web scraper that goes through a list of links from a CSV file and scrapes details from each of them. However, I'm having trouble pointing to the element that holds the email address I am trying to scrape. If you look at [https://reality.idnes.cz/rk/detail/m-m-reality-holding-a-s/5a85b582a26e3a321d4f2700/], you can see that there is a company name, an address, a phone number and an email. The email is the element I'm having a problem with. If you look at the page source, you will quickly notice that both the phone number and the email share the same class, "item-icon". As you can see in my code, I tried to target the email with an nth-of-type selector, but for some reason that didn't work either. The email is neither printed nor written to the CSV file, so the element is evidently not being found. This is the code I'm having trouble with:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.chrome.options import Options
import time
import csv

with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)

with open('ScrapedContent.csv', 'w+', newline='') as write:
    writer = csv.writer(write)
    options = Options()
    options.add_argument('--no-sandbox')
    path = "/home/kali/Desktop/SRealityContentScraper/chromedriver"
    driver = webdriver.Chrome(path)
    wait = WebDriverWait(driver, 10)

    for link in link_list:
        driver.get(', '.join(link))
        time.sleep(2)
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "h1.b-annot__title.mb-5")))
        title = driver.find_element_by_css_selector("h1.b-annot__title.mb-5")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.btn__text")))
        offers = driver.find_element_by_css_selector("span.btn__text")
        information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "p.font-sm")))
        addresses = driver.find_element_by_css_selector("p.font-sm")
        try:
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "a.item-icon.measuring-data-layer")))
            phone_number = driver.find_element_by_css_selector("a.item-icon.measuring-data-layer")
        except Exception:
            pass
        try:
            information_list = wait.until(ec.presence_of_element_located((By.CSS_SELECTOR, "span.items:nth-of-type(2) span.items__item a.item-icon")))
            email = driver.find_element_by_css_selector("span.items:nth-of-type(2) span.items__item a.item-icon")
        except Exception:
            pass
        try:
            phone_number = phone_number.text
        except Exception:
            phone_number = " "
            pass
        try:
            email = email.text
        except Exception:
            email = " "
            pass
        print(title.text, " ", offers.text, " ", addresses.text, " ", phone_number, " ", email)
        writer.writerow([title.text, offers.text, addresses.text, phone_number, email])
    driver.quit()
The reason for the try/except blocks in the code is that some pages in the link list are missing either the email or the phone number, so when that happens the field is filled with a blank " " placeholder. However, the email doesn't get printed even when it's present on the page, which leads me to believe that the element isn't being found. I removed the try/except blocks to test this, and Selenium indeed reports that the element could not be found. Without the nth-of-type part, the scraper scrapes two phone numbers instead of a phone number and an email. To my understanding, this is because Selenium's find_element always returns the first element matching the CSS selector, which will always be the phone number.
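To make the "first match" behaviour concrete without needing a browser, here is a minimal standard-library sketch on made-up markup. The HTML snippet, the phone number, the address info@example.com and the `ItemIconCollector` class are all assumptions for illustration only; this is not the real page's HTML and not Selenium's API, just a way to show what happens when two links share one class.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT copied from the real page: two links that share
# the "item-icon" class, with the phone link first and the email link second.
SAMPLE_HTML = """
<span class="items">
  <span class="items__item"><a class="item-icon" href="tel:+420123456789">+420 123 456 789</a></span>
  <span class="items__item"><a class="item-icon" href="mailto:info@example.com">info@example.com</a></span>
</span>
"""


class ItemIconCollector(HTMLParser):
    """Collect [href, text] for every <a> whose class list contains 'item-icon'."""

    def __init__(self):
        super().__init__()
        self.matches = []     # [href, text] pairs in document order
        self._inside = False  # True while we are inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "item-icon" in classes:
            self._inside = True
            self.matches.append([attr_map.get("href", ""), ""])

    def handle_data(self, data):
        if self._inside:
            self.matches[-1][1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False


collector = ItemIconCollector()
collector.feed(SAMPLE_HTML)

first_match = collector.matches[0]  # analogous to a "find first element" lookup
all_matches = collector.matches     # analogous to a "find all elements" lookup

print("first match:", first_match[1])
print("all matches:", [text for _, text in all_matches])
```

On this sample, the first match is the phone link and the email only appears as the second entry, which is consistent with my theory about why the shared class alone is not enough. Whether the real page nests its links the same way is exactly what I am unsure about.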
My question is: how can I target the email element so that the emails get scraped correctly? Thanks for any help with this! I am starting to feel hopeless...