Get list of titles from h3 class name with Selenium

Question

I am trying to get a list of titles after running a search, but I keep getting an empty list even though the XPath seems correct for finding the various h3 title classes. In the HTML, each title I am trying to copy sits inside an h3 element: the class name changes every time, but the title is always within an h3.

So I tried the following code to extract the list of titles:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options

options = Options()
# Disable Firefox push-notification prompts
options.set_preference("dom.push.enabled", False)
browser = webdriver.Firefox(options=options)

browser.get("https://medium.com/search")
# Type the query into the search box and press Enter
browser.find_element(By.XPATH, "//input[@type='search']").send_keys("Flying elephant", Keys.ENTER)
# Collect every h3 whose class attribute contains 'graf'
titles = browser.find_elements(By.XPATH, "//h3[contains(@class,'graf')]")

lista = []
for names in titles:
    print(names.text)
    lista.append(names.text)

browser.quit()

The code runs, but the list I get back is empty. Thanks for any tips on how to fix this.

Maybe try getting `"//h3"` and checking whether it has a `class` attribute and what its value is. Some servers use random class values and generate different values for different users. — furas, Apr 01 '21 at 12:36
Yes, I did locate it by inspecting the source, and there is one constant class value. The solution below addresses my issue. — Nicola, Apr 01 '21 at 13:02
I meant checking it with Selenium, not by manually inspecting the source. You could also check `browser.page_source` to see what Selenium actually gets. But I see you already got an answer. Waiting for the data is usually the second step when debugging this kind of problem, because JavaScript needs some time to add it to the HTML. — furas, Apr 01 '21 at 13:22
This is a good tip. The issue with `page_source` is that I get a dump of text instead of nicely structured HTML. Is there a command I can run to view it with its hierarchical structure? — Nicola, Apr 01 '21 at 13:26
You can pass `page_source` to `BeautifulSoup` or `lxml`, and some of them may have a function to reformat it. But the HTML may contain newlines `\n` in some places, and these functions may keep them even after reformatting, so the result may not look the way you expect. — furas, Apr 01 '21 at 13:36
[How to Pretty Print HTML to a file, with indentation](https://stackoverflow.com/questions/6150108/how-to-pretty-print-html-to-a-file-with-indentation) — furas, Apr 01 '21 at 13:38
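Both suggestions in the comments above, listing the `class` value of every h3 and making `page_source` readable, can be tried offline with the standard library's `html.parser` instead of BeautifulSoup or lxml (a swapped-in technique, not what the commenters used). The helper names `h3_classes` and `pretty_html` and the sample markup are hypothetical. The indenter is a rough sketch: it drops comments and the doctype, and it assumes every non-void tag is explicitly closed.

```python
from html import escape
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class H3ClassCollector(HTMLParser):
    """Records the class attribute (or None) of every <h3> start tag."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":  # html.parser lower-cases tag names
            self.classes.append(dict(attrs).get("class"))


class Indenter(HTMLParser):
    """Re-emits tags and text one per line, indented by nesting depth."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent, self.depth, self.lines = indent, 0, []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # raw start tag, attributes intact
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it without changing depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        # Collapse runs of whitespace (including stray "\n") inside text nodes.
        text = " ".join(data.split())
        if text:
            self._emit(escape(text, quote=False))


def h3_classes(source):
    parser = H3ClassCollector()
    parser.feed(source)
    parser.close()
    return parser.classes


def pretty_html(source, indent="  "):
    parser = Indenter(indent)
    parser.feed(source)
    parser.close()
    return "\n".join(parser.lines)


# Hypothetical markup; with a live driver, pass browser.page_source instead.
sample = ('<div><h3 class="graf graf--h3">Title one</h3><br>'
          '<h3 class="a1b2c3">Title\n  two</h3><h3>Title three</h3></div>')
print(h3_classes(sample))  # ['graf graf--h3', 'a1b2c3', None]
print(pretty_html(sample))
```

With a live driver, the Selenium-side class check is `[h.get_attribute("class") for h in browser.find_elements(By.XPATH, "//h3")]`, and with BeautifulSoup installed, `BeautifulSoup(browser.page_source, "html.parser").prettify()` gives a similar indented view.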

KunduK · Accepted Answer · 2021-04-01T13:03:13.690

You need to wait for the elements to become visible after your search. Use `WebDriverWait()` and wait for `visibility_of_all_elements_located()`:

browser.get("https://medium.com/search")
browser.find_element(By.XPATH, "//input[@type='search']").send_keys("Flying elephant", Keys.ENTER)
# Poll for up to 20 seconds until matching h3 elements exist and are all visible
titles = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[contains(@class,'graf')]")))

lista = []
for names in titles:
    print(names.text)
    lista.append(names.text)
print(lista)

You need to import the following:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

Output:

['A Flying Elephant, a Teacher’s Hugs: 12 Tales of Pandemic Resilience', 'Who Knew Disney Could Do Trippy Even Better Than Pink Floyd?', '#FlightFree2020: Travel Blogging And The Multiplier Effect', 'Bluesky and Dumbo ‘The Flying Elephant’', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.']
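Why the wait fixes the empty list: `WebDriverWait(...).until(...)` keeps calling the condition until it returns something truthy or the timeout expires, whereas the original `find_elements` call ran once, before the results had rendered. The following is a simplified, stdlib-only illustration of that polling idea, not Selenium's actual implementation; `wait_until` and the fake `visible_titles` page are hypothetical. Selenium's default poll interval is 0.5 s, and it raises `TimeoutException` rather than Python's `TimeoutError`.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns something truthy or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # an empty list is falsy, so polling continues
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Hypothetical page: the titles only "render" about 0.3 s after the search.
start = time.monotonic()


def visible_titles():
    if time.monotonic() - start < 0.3:
        return []  # what an immediate, un-waited lookup would see
    return ["Title A", "Title B"]


print(visible_titles())                                 # []
print(wait_until(visible_titles, timeout=5, poll=0.1))  # ['Title A', 'Title B']
```

The timeout is an upper bound: the call returns as soon as the condition succeeds, so a generous value such as the answer's 20 seconds costs nothing when the page is fast.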

Thanks. Why do we need a double assignment, `titles = allelements = ...`? It also runs with a single assignment, `titles = ...`. — Nicola, Apr 01 '21 at 13:01
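In Python, `a = b = expr` is a chained assignment: `expr` is evaluated once and both names are bound to the same object, so the extra name is harmless but redundant unless it is used later. A minimal illustration, where the hypothetical `find_titles()` stands in for the `WebDriverWait(...).until(...)` call:

```python
calls = 0


def find_titles():
    """Hypothetical stand-in for WebDriverWait(browser, 20).until(...)."""
    global calls
    calls += 1
    return ["Title A", "Title B"]


# Chained assignment: the right-hand side runs once and both names
# end up bound to the same list object.
titles = allelements = find_titles()
print(calls)                  # 1
print(titles is allelements)  # True
```

So `titles = ...` on its own is all the loop needs.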
