
I am trying to scrape a record table on familysearch.org. I am using the Chrome WebDriver with Python, together with BeautifulSoup and Selenium.

Upon inspecting the page I am interested in, I wanted to scrape the following bit of HTML. Note this is only one element of a familysearch.org table that has 100 names.

<span role="cell" class="td " name="name" aria-label="Name"> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> <span><sr-cell-name name="Jame Junior " url="ZS" relationship="Principal" collection-name="Index"></sr-cell-name></span> <dom-if style="display: none;"><template is="dom-if"></template></dom-if> </span>

Alternatively, the name also appears in this bit of HTML:

<a class="name" href="/ark:ZS">Jame Junior </a>

From all of this, I only want to get the name "Jame Junior". I have tried using driver.find_elements_by_class_name("name"), but it prints nothing.

This is the code I used:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import pandas as pd
from getpass import getpass


username = input("Enter Username: ")
password = input("Enter Password: ")
chrome_path = r"C:\Users...chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.familysearch.org/search/record/results?q.birthLikeDate.from=1996&q.birthLikeDate.to=1996&f.collectionId=...")

usernamet = driver.find_element_by_id("userName")
usernamet.send_keys(username)
passwordt = driver.find_element_by_id("password")
passwordt.send_keys(password)
login = driver.find_element_by_id("login")
login.submit()
driver.get("https://www.familysearch.org/search/record/results?q.birthLikeDate.from=1996&q.birthLikeDate.to=1996&f.collectionId=.....")
WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "name")))
#for tag in driver.find_elements_by_class_name("name"):
 #   print(tag.get_attribute('innerHTML'))

for tag in soup.find_all("sr-cell-name"):
    print(tag["name"])
rlearner
  • Do you want all the names that follow this format? or specifically only the name "Jame Junior" out of the entire page? – MendelG Jul 15 '21 at 20:58
  • Yes, I would like all of the names – rlearner Jul 15 '21 at 21:11
  • My answer includes a solution for _all_ the names, did it work? – MendelG Jul 15 '21 at 21:15
  • It did not work unfortunately, for your first suggestion using Selenium, it only says process finished with exit code 0. For the second solution it tells me that soup is undefined. I tried defining soup=(), but that did not work either. – rlearner Jul 15 '21 at 21:18
  • See my edited answer. That should solve it – MendelG Jul 15 '21 at 21:21

2 Answers


Try accessing the sr-cell-name tag instead.

Selenium:

for tag in driver.find_elements_by_tag_name("sr-cell-name"):
    print(tag.get_attribute("name"))

BeautifulSoup:

for tag in soup.find_all("sr-cell-name"):
    print(tag["name"])
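For the BeautifulSoup version, soup has to be defined first from the rendered page source, e.g. soup = BeautifulSoup(driver.page_source, "html.parser"). As a quick check that the loop itself does what you want, here is the same extraction run against a static copy of the HTML fragment from the question (no browser needed):

```python
from bs4 import BeautifulSoup

# Static copy of the fragment from the question, used here instead of
# driver.page_source so the snippet runs without a browser.
html = '''
<span role="cell" class="td " name="name" aria-label="Name">
  <span><sr-cell-name name="Jame Junior " url="ZS" relationship="Principal"
        collection-name="Index"></sr-cell-name></span>
</span>
'''

soup = BeautifulSoup(html, "html.parser")

# The "name" attribute carries a trailing space in the source, so strip it.
names = [tag["name"].strip() for tag in soup.find_all("sr-cell-name")]
print(names)  # ['Jame Junior']
```

Against the live page, replace the html string with driver.page_source after the WebDriverWait has completed.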

EDIT: You might need to wait for the element to fully appear on the page before parsing it. You can do this with WebDriverWait and the presence_of_element_located expected condition:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get("...")

WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "name")))

for tag in driver.find_elements_by_class_name("name"):
    print(tag.get_attribute('innerHTML'))
MendelG
  • It did ran, but it gave my the following message : Traceback (most recent call last): "File "C:\Users....py", line 22, in WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "name"))) File "C:\Users...", line 80, in until raise TimeoutException(message, screen, stacktrace) selenium.common.exceptions.TimeoutException: Message:" I still was not able to print anything – rlearner Jul 15 '21 at 21:30
  • @rlearner Please [edit] your question to show us the full code you have tried – MendelG Jul 15 '21 at 21:38
  • Just updated the question to include the full code @MendelG – rlearner Jul 17 '21 at 03:45

I was looking to do something very similar and have semi-decent Python/Selenium scraping experience. Long story short, FamilySearch (and many other sites, I'm sure) uses a technology (I'm not a JS or web guy) called the shadow DOM. The tags inside a shadow root are essentially invisible to BeautifulSoup and to Selenium's normal locators.

Solution: pyshadow https://github.com/sukgu/pyshadow

You may also find this link helpful: How to handle elements inside Shadow DOM from Selenium

I have now been able to successfully find elements I couldn't before, but I am still not all the way to where I'm trying to get. Good luck!