
I am scraping this page. I am trying to get the mail icon next to the author names, but I have tried many things and cannot seem to find or click it. Can anyone help?

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(executable_path='C:/chromedriver.exe')
search_term = input("Enter your search term :")
url = f'https://www.sciencedirect.com/search?qs={search_term}&show=100'
driver.get(url)
driver.maximize_window()

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH,'/html/body/div[3]/div/div/div/button/span'))).click()
divs = driver.find_elements_by_class_name('result-item-content')
links = []
for div in divs:
    link = div.find_element_by_tag_name('a')
    links.append(link)
links[0].click()
div = driver.find_element_by_id('author-group')
print(div.text[0:])
name_links = div.find_elements_by_tag_name('a')
spans =[]
for name in name_links:
    span = name.find_element_by_tag_name('span')
    spans.append(span)

for span in spans:
    mail = span.find_element_by_class_name('icon icon-envelope')
    mail.click()
    break
Abhishek Rai

1 Answer


It seems that not every author has that icon, but, even taking that into account, you have a couple of mistakes in the current approach:

  • you are looking inside each span element of the author group - you don't have to do that
  • find_element_by_class_name would work with a single class value, not multiple (class is a multi-valued attribute with space being a delimiter between values)
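To illustrate the second point: a compound class attribute such as `icon icon-envelope` can be matched with a CSS selector that chains one `.class` part per value. The following is only a minimal sketch; `classes_to_css` is a hypothetical helper name (not part of Selenium), and it assumes plain class names that need no CSS escaping.

```python
def classes_to_css(class_attr, tag=""):
    """Build a CSS selector that requires every class in `class_attr`.

    `class_attr` is the raw value of an HTML class attribute, e.g.
    "icon icon-envelope". `tag` optionally restricts the element name.
    Assumption: class names contain no characters that need CSS escaping.
    """
    # str.split() with no argument splits on any run of whitespace,
    # which is how the class attribute separates its values.
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute contains no class names")
    return tag + "".join("." + name for name in classes)


print(classes_to_css("icon icon-envelope"))   # .icon.icon-envelope
print(classes_to_css("author", tag="a"))      # a.author
```

The resulting string could be passed to a CSS-selector lookup instead of the failing `find_element_by_class_name('icon icon-envelope')` call, although matching on a single distinctive class such as `.icon-envelope` is often enough.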

Here is how I would go about this:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


# Container that holds all author links on the article page.
author_group = driver.find_element(By.ID, 'author-group')

for author in author_group.find_elements(By.CSS_SELECTOR, "a.author"):
    try:
        given_name = author.find_element(By.CSS_SELECTOR, ".given-name").text
        surname = author.find_element(By.CSS_SELECTOR, ".surname").text
    except NoSuchElementException:
        # Not every author entry has both name parts.
        print("Could not extract first or last name")
        continue

    # EAFP: attempt the lookup and treat a failed lookup as "icon absent".
    try:
        mail_icon = author.find_element(By.CSS_SELECTOR, ".icon-envelope")
        mail_icon_present = True
    except NoSuchElementException:
        mail_icon_present = False

    print(f"Author {given_name} {surname}. Mail icon present: {mail_icon_present}")

Notes:

  • note how we iterate over the authors, container by container, and then look for specific properties inside each one
  • note how we check for the presence of the mail icon in a forgiving EAFP ("easier to ask forgiveness than permission") manner
  • the . before a class value in a CSS selector is the syntax for matching an element by a single class value
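As an alternative to the try/except check, Selenium's plural `find_elements` returns an empty list when nothing matches, so presence can be tested without catching an exception. This is only a sketch, assuming `element` is a Selenium WebElement; `has_descendant` is a hypothetical helper name, and the literal `"css selector"` is the string value behind `By.CSS_SELECTOR`.

```python
# Same string that selenium.webdriver.common.by.By.CSS_SELECTOR holds;
# spelled out here so the sketch has no import-time dependency.
CSS_SELECTOR = "css selector"


def has_descendant(element, css_selector):
    """Return True if `element` has at least one descendant matching the selector.

    Assumption: `element` exposes find_elements(by, value) the way a
    Selenium WebElement does, returning a (possibly empty) list.
    """
    return len(element.find_elements(CSS_SELECTOR, css_selector)) > 0


# Hypothetical usage inside the author loop:
#     mail_icon_present = has_descendant(author, ".icon-envelope")
```

Whether this reads better than the EAFP version is a matter of taste; the try/except form is still the one to use when the located element itself is needed, for example to click it.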
alecxe
  • This gives the following error: `selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"css selector","selector":".given-name"} (Session info: chrome=87.0.4280.88)` – Abhishek Rai Dec 18 '20 at 17:20
  • @AbhishekRai Okay, probably not all authors have given names or surnames there, so I added a try/except. The general ideas in the answer still apply. Also, which URL are you testing it against? – alecxe Dec 18 '20 at 17:23
  • Oh, I'm so sorry, I added the years: `https://www.sciencedirect.com/search?qs={search_term}&years=2021%2C2020%2C2019&lastSelectedFacet=years` – Abhishek Rai Dec 18 '20 at 17:24