Unable to get text data where a dynamic number of span tags are inside a p tag - Selenium Python

Question

I am trying to get the text data from the website below:

https://www.lemoyne.edu/Give/Information-for-Donors/Honor-Roll/1954

Any suggestion/help would be appreciated. Thanks in advance!!


import time

from selenium import webdriver

driver = webdriver.Chrome("chromedriver.exe")
driver.maximize_window()
driver.get("https://www.lemoyne.edu/Give/Information-for-Donors/Honor-Roll/1954")
time.sleep(10)

# Look for the donor paragraphs in the top-level document
donors = driver.find_elements("xpath", '//div[@class = "container"]/div[@class="donorcolumn"]/p')
donors

## Result: empty list []

for donor in donors:
    print(donor.get_attribute("innerHTML"))

## Result: nothing is printed, because donors is empty

for donor in donors:
    print(donor.text)

## Result: nothing is printed, because donors is empty

Expected output:

The Hon. Salvatore J. Arrigo Jr. '54 and Mrs. Elizabeth J. Arrigo (35) President's Club Annual Fund Previous President's Club Member
Margaret A. Dwyer '54, L.C.H.D. '94 (35) President's Club Previous President's Club Member
Frances Morrison Scott Estate (1) Previous President's Club Member
Rosemary T. Fatcheric '54 (12)
Jo-An Feyerabend '54 (35)
James H. Greiner '54 (11) Annual Fund
Charles R. Nojaim '54 and Patricia Nojaim (22)
Marie Dinehart Rathbun '54 (22)
Audrey Zillioux Rich '54 (30) Annual Fund
David G. Schoeneck '54 and Therese Sharpe Schoeneck '54 (25) Annual Fund
John H. Senecal '54 (14)
John B. Vita '54 and Mary M. Vita (1)
Eugene P. Vukelic '54 (8) President's Club Annual Fund Previous President's Club Member

score 1 · Answer 1 · answered Jun 06 '23 at 13:22

That data is in an iframe. If you are keen on using Selenium, you first need to switch to that iframe, and then get the data from it. Here is an alternative (lighter and simpler) way to get that data, by scraping the iframe source directly:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://s3.amazonaws.com/lemoynehonorroll/1954.html'

r = requests.get(url, headers=headers)

soup = bs(r.text, 'html.parser')
donors_list = [x.get_text(strip=True, separator=' ') for x in soup.select('div[class="donorcolumn"] p')]
print(donors_list)

Result in terminal:

["The Hon. Salvatore J. Arrigo Jr. '54 and Mrs. Elizabeth J. Arrigo (35)",
 "Margaret A. Dwyer '54, L.C.H.D. '94 (35)",
 'Frances Morrison Scott Estate (1)',
 "Rosemary T. Fatcheric '54 (12)",
 "Jo-An Feyerabend '54 (35)",
 "James H. Greiner '54 (11)",
 "Charles R. Nojaim '54 and Patricia Nojaim (22)",
 "Marie Dinehart Rathbun '54 (22)",
 "Audrey Zillioux Rich '54 (30)",
 "David G. Schoeneck '54 and Therese Sharpe Schoeneck '54 (25)",
 "John H. Senecal '54 (14)",
 "John B. Vita '54 and Mary M. Vita (1)",
 "Eugene P. Vukelic '54 (8)"]

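Since the question concerns `<p>` tags that hold a varying number of `<span>` children, here is a minimal sketch of how the text of all nested spans can be joined per paragraph, roughly what `get_text(strip=True, separator=' ')` does above. It uses only the standard library's `html.parser` as a stand-in for BeautifulSoup. The markup in `sample` is a hypothetical approximation (the real page's structure may differ), and `DonorTextParser` is an illustrative name.

```python
from html.parser import HTMLParser


class DonorTextParser(HTMLParser):
    """Collect the text of every <p> inside <div class="donorcolumn">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside a donorcolumn div
        self.in_p = False
        self.parts = []      # text fragments of the current <p>
        self.donors = []     # one joined string per <p>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or "donorcolumn" in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_p, self.parts = True, []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_p:
            self.donors.append(" ".join(self.parts))
            self.in_p = False

    def handle_data(self, data):
        # Text inside any number of nested <span> tags arrives here.
        if self.in_p and data.strip():
            self.parts.append(data.strip())


# Hypothetical markup: each <p> holds a different number of <span> tags.
sample = """
<div class="container">
  <div class="donorcolumn">
    <p><span>Jo-An Feyerabend '54</span> <span>(35)</span></p>
    <p><span>James H. Greiner '54</span> <span>(11)</span> <span>Annual Fund</span></p>
    <p>John H. Senecal '54 (14)</p>
  </div>
</div>
"""

parser = DonorTextParser()
parser.feed(sample)
print(parser.donors)
```

The number of spans does not matter here, because the text is collected per `<p>` rather than per `<span>`.
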
score 1 · Accepted Answer · answered Jun 06 '23 at 13:23

If you inspect the HTML, you will see that the desired elements are wrapped within an iframe. You need to switch into that frame before performing any other actions. Use the code below to switch to the iframe:

wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "dnn_ctr11646_IFrame_htmIFrame")))

Full code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.lemoyne.edu/Give/Information-for-Donors/Honor-Roll/1954")
wait = WebDriverWait(driver, 10)
wait.until(EC.frame_to_be_available_and_switch_to_it((By.ID, "dnn_ctr11646_IFrame_htmIFrame")))
donors = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class = 'container']/div[@class='donorcolumn']/p")))

for donor in donors:
    print(donor.get_attribute("innerHTML"))

for donor in donors:
    print(donor.text)
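
If you want to find the frame ID without the browser's developer tools, one option is to scan the outer page's HTML for `<iframe>` tags. The following is a minimal sketch using only the standard library. It assumes the iframe tag is present in the served HTML rather than injected by JavaScript. The `sample_page` markup is a hypothetical stand-in for the real page, and `IframeFinder` is an illustrative name; only the ID and the `src` URL are taken from the answers here.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect (id, src) pairs for every <iframe> in a page."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            attr_map = dict(attrs)
            self.frames.append((attr_map.get("id"), attr_map.get("src")))


# Hypothetical stand-in for the outer page; the real markup may differ.
sample_page = """
<html><body>
  <div id="content">
    <iframe id="dnn_ctr11646_IFrame_htmIFrame"
            src="https://s3.amazonaws.com/lemoynehonorroll/1954.html"></iframe>
  </div>
</body></html>
"""

finder = IframeFinder()
finder.feed(sample_page)

# Match on the end of the id, in case the numeric part changes between pages.
iframe_urls = [src for frame_id, src in finder.frames
               if frame_id and frame_id.endswith("IFrame_htmIFrame")]
print(iframe_urls)
```

The `id` found this way is what you would pass to `By.ID`, and the `src` is the URL you could fetch directly instead of driving a browser.
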

score 1 · Answer 3 · answered Jun 06 '23 at 23:30

The desired elements are within an <iframe>, so you have to:

Use WebDriverWait to wait for the desired frame to be available and switch to it.

To extract the text, you can use a list comprehension with either of the following locator strategies:

Using CSS_SELECTOR:

driver.get("https://www.lemoyne.edu/Give/Information-for-Donors/Honor-Roll/1954")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[id$='IFrame_htmIFrame']")))
print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.donorcolumn p")])

Using XPATH:

driver.get("https://www.lemoyne.edu/Give/Information-for-Donors/Honor-Roll/1954")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[id$='IFrame_htmIFrame']")))
print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//div[@class='donorcolumn']//p")])

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Console Output:

["The Hon. Salvatore J. Arrigo Jr. '54 and Mrs. Elizabeth J. Arrigo (35)", "Margaret A. Dwyer '54, L.C.H.D. '94 (35)", 'Frances Morrison Scott Estate (1)', "Rosemary T. Fatcheric '54 (12)", "Jo-An Feyerabend '54 (35)", "James H. Greiner '54 (11)", "Charles R. Nojaim '54 and Patricia Nojaim (22)", "Marie Dinehart Rathbun '54 (22)", "Audrey Zillioux Rich '54 (30)", "David G. Schoeneck '54 and Therese Sharpe Schoeneck '54 (25)", "John H. Senecal '54 (14)", "John B. Vita '54 and Mary M. Vita (1)", "Eugene P. Vukelic '54 (8)"]
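
After switching into the frame, the driver stays there until you switch back with `driver.switch_to.default_content()`. The following is a minimal sketch of a helper that switches in and always switches back out, assuming you later need elements from the outer page. `inside_frame` is a hypothetical helper, not part of Selenium, and it calls `driver.switch_to.frame(...)` directly rather than waiting with `EC.frame_to_be_available_and_switch_to_it`. `FakeDriver` is only a stand-in so the sketch runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame, then always return to the top-level document."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


class _FakeSwitchTo:
    """Records calls instead of talking to a browser (demo only)."""

    def __init__(self, log):
        self.log = log

    def frame(self, ref):
        self.log.append(("frame", ref))

    def default_content(self):
        self.log.append(("default_content",))


class FakeDriver:
    """Stand-in for a real WebDriver; exposes only switch_to."""

    def __init__(self):
        self.calls = []
        self.switch_to = _FakeSwitchTo(self.calls)


driver = FakeDriver()
with inside_frame(driver, "dnn_ctr11646_IFrame_htmIFrame"):
    pass  # with a real driver: driver.find_elements(...) goes here
print(driver.calls)
```

With a real `webdriver.Chrome()` instance, the same `with` block would wrap the `find_elements` calls shown above.
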


Thank You both Shawn and Undetected Selenium Its really helpful & I appreciate your efforts :) :) — Umesh Kumar, Jun 07 '23 at 12:41
