I am trying to extract each speaker's href from the page https://www.dx3canada.com/agenda/speakers.

I tried:

elems = driver.find_elements_by_css_selector('.display-flex card vancouver')
href_output = []
for ele in elems:
    href_output.append(ele.get_attribute("href"))
print(href_output)

But the output list is empty.

The hrefs I expect are the ones on the speaker links (the original post showed them in a screenshot), and I would like the output to be a list of those hrefs.

I really appreciate the help!

Bangbangbang

3 Answers

5

To extract the speakers' href attributes from https://www.dx3canada.com/agenda/speakers, note that the desired elements are inside an <iframe>, so you have to:

  • Use WebDriverWait to wait until the frame is available, then switch to it.
  • Use WebDriverWait to wait until all the located elements are visible.
  • You can use the following locator strategies:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.dx3canada.com/agenda/speakers')
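    # The speaker cards live inside an iframe, so wait for it and switch into it first.
    WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#whovaIframeSpeaker")))
    # Wait until the speaker links are visible, then collect their href attributes.
    speaker_links = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "a.display-flex.card.vancouver")))
    print([link.get_attribute("href") for link in speaker_links])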
    
  • Console Output:

    ['https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907682/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907688/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907676/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907696/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907690/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907670/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907693/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942779/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9908087/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907671/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907681/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907673/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907678/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907689/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907674/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907684/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907685/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907686/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9942780/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907695/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907687/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907683/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907692/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907672/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907697/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907680/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907679/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907675/', 
    'https://whova.com/embedded/speaker_detail/dcrma_202003/9907677/', 'https://whova.com/embedded/speaker_detail/dcrma_202003/9907694/']
    

Here you can find a relevant discussion on Ways to deal with #document under iframe
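
As a rough illustration of why the iframe matters, here is a minimal sketch that uses only Python's standard library rather than Selenium. The markup in `OUTER_PAGE` and `FRAME_PAGE` is a made-up stand-in for the real page (only the iframe id, the class names, and the sample href come from this thread), and `LinkCollector` and `collect_hrefs` are hypothetical helper names. It shows that a search of the outer document finds nothing, because the speaker links belong to the separate document loaded inside the frame.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry all the required classes."""

    def __init__(self, required_classes):
        super().__init__()
        self.required = set(required_classes)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # The class attribute is a space-separated list of class names.
        classes = set((attr_map.get("class") or "").split())
        if self.required <= classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def collect_hrefs(html, required_classes=("display-flex", "card", "vancouver")):
    """Return the hrefs of matching links found in one HTML document."""
    parser = LinkCollector(required_classes)
    parser.feed(html)
    return parser.hrefs


# Assumption: simplified stand-in for the outer page, which only holds the <iframe> tag.
OUTER_PAGE = '<html><body><iframe id="whovaIframeSpeaker"></iframe></body></html>'

# Assumption: simplified stand-in for the document that the frame loads.
FRAME_PAGE = (
    '<div class="template-section-body">'
    '<a class="display-flex card vancouver" '
    'href="https://whova.com/embedded/speaker_detail/dcrma_202003/9942778/">Speaker</a>'
    '</div>'
)

outer_hrefs = collect_hrefs(OUTER_PAGE)  # searching the top-level document
frame_hrefs = collect_hrefs(FRAME_PAGE)  # searching the framed document
print(outer_hrefs)
print(frame_hrefs)
```

In Selenium terms, the first call corresponds to searching before switching frames, and the second to searching after `frame_to_be_available_and_switch_to_it` has succeeded.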

undetected Selenium
  • 1
    Great answer -- Could you explain the use of `add_experimental_option` for `enable-automation` and `useAutomationExtension`? I have never seen these options before and I am curious to know what they are used for! – CEH Dec 02 '19 at 20:53
  • 1
    @Christine In short, I use the two `add_experimental_option()` settings **enable-automation** and **useAutomationExtension** for a few different things, for example to get rid of the infobars and to avoid being detected as a bot. – undetected Selenium Dec 02 '19 at 20:58
3

Your images are in an iframe, so you will need to switch to it, using frame_to_be_available_and_switch_to_it, before you can scrape the href attributes.

Then, to get the list of all href attributes, you may need to run some JavaScript to scroll each image into view, which handles the case where the href is lazy-loaded:

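# assumes `driver` is an existing WebDriver that has already loaded the page
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# first, switch to the iframe that contains the speaker cards
WebDriverWait(driver, 30).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH, "//iframe[@id='whovaIframeSpeaker']")))

# find_elements(By.XPATH, ...) also works on Selenium 4, where find_elements_by_xpath was removed
elements_list = driver.find_elements(By.XPATH, "//div[contains(@class, 'template-section-body')]/a[contains(@class, 'display-flex card vancouver')]")

for element in elements_list:
    # scroll each card into view in case its content is lazy-loaded
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    print(element.get_attribute("href"))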

This code prints the href of each speaker card.
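
One caveat about the XPath above: `contains(@class, 'display-flex card vancouver')` is a substring test on the raw class attribute, so it depends on the classes appearing in exactly that order. The sketch below mimics both kinds of matching in plain Python to show the difference. The function names and the reordered class string are hypothetical, chosen only for illustration.

```python
def xpath_contains(class_attr, needle):
    """Mimic XPath contains(@class, needle): a plain substring test."""
    return needle in class_attr


def has_all_classes(class_attr, *wanted):
    """Mimic a CSS compound class selector: every class present, in any order."""
    return set(wanted) <= set(class_attr.split())


same_order = "display-flex card vancouver"
reordered = "card display-flex vancouver"  # assumption: a hypothetical variation of the markup

# The substring test only succeeds when the order matches exactly.
print(xpath_contains(same_order, "display-flex card vancouver"))
print(xpath_contains(reordered, "display-flex card vancouver"))

# Token-based matching does not care about order.
print(has_all_classes(reordered, "display-flex", "card", "vancouver"))

# A substring test can also match part of a longer class name.
print(xpath_contains("discard", "card"))
print(has_all_classes("discard", "card"))
```

If the class order on the page is stable, the XPath works as written; a CSS selector such as `a.display-flex.card.vancouver` is simply less sensitive to that detail.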

CEH
  • Thank you, Christine. I tried the code above but it doesn't work. https://www.dx3canada.com/agenda/speakers is the link I am scraping – Bangbangbang Dec 02 '19 at 20:18
  • @Bangbangbang After looking at your web page, I see now that the problem is an `iframe`. I have updated my answer and tested it. The `href` values all print successfully now. – CEH Dec 02 '19 at 20:42
0

For your CSS selector, use .display-flex.card.vancouver instead.

elems = driver.find_elements_by_css_selector('.display-flex.card.vancouver')

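Each word is a class, so you need to place a dot in front of each one. Without the dots, card and vancouver are read as tag names of nested elements, so the selector matches nothing.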
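
As a small sketch of that rule, the hypothetical helper below (`class_selector` is not part of Selenium) turns the space-separated value of a class attribute, as copied from the browser's inspector, into a compound CSS selector with a dot before each class. An optional tag name can be put in front, for example `a`, since the href is on the link element.

```python
def class_selector(class_attr, tag=""):
    """Build a compound CSS selector from a space-separated class attribute."""
    # Every class name gets its own leading dot, with no spaces in between.
    return tag + "".join("." + name for name in class_attr.split())


classes_only = class_selector("display-flex card vancouver")
with_tag = class_selector("display-flex card vancouver", "a")
print(classes_only)
print(with_tag)
```

The resulting string can be passed to the CSS selector lookup in place of the original `'.display-flex card vancouver'`.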

RKelley