
I am trying to get a list of titles after running a search, but I keep getting an empty list even though the XPath correctly matches the various h3 title elements. Below is an example of where one of the titles I am trying to copy sits in the HTML. The class value changes every time, but the title is always inside an h3.

[Screenshot: HTML location of one title, inside an h3 element]

So I tried with the code below to extract the list of titles:

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  # needed for Keys.ENTER below
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options

options = Options()
options.set_preference("dom.push.enabled", False)
browser = webdriver.Firefox(options=options)

browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("Flying elephant",Keys.ENTER)
titles = browser.find_elements_by_xpath("//h3[contains(@class,'graf')]")

lista = []
for names in titles:
    print(names.text)
    lista.append(names.text)     

browser.quit()

The code runs, but the list I get back is empty. Thank you for any tips that could help me with this.

Nicola
  • Maybe try getting `"//h3"` and check whether each element has a `class` and what its value is. Some servers use random class values and generate different ones for different users. – furas Apr 01 '21 at 12:36
  • Yes, I did locate it by inspecting the source, and there is one constant class value. The solution below addresses my issue. – Nicola Apr 01 '21 at 13:02
  • I was asking you to check it with Selenium, not by manually inspecting the source. You could also check `browser.page_source` to see what Selenium really gets. But I see you already got an answer. Waiting for data is usually the second step in diagnosing this kind of problem (because JavaScript needs some time to add the elements to the HTML). – furas Apr 01 '21 at 13:22
  • This is a good tip. The issue with page_source is that I get a dump of text instead of nicely structured HTML. Do you know if there is a command I can run to view it with its hierarchical structure? – Nicola Apr 01 '21 at 13:26
  • You can use `page_source` with `BeautifulSoup` or `lxml`; some of them probably have a function to reformat it. But the HTML may contain newlines (`\n`) in some places, and these functions may keep them even after reformatting, so it may not look the way you expect. – furas Apr 01 '21 at 13:36
  • [How to Pretty Print HTML to a file, with indentation](https://stackoverflow.com/questions/6150108/how-to-pretty-print-html-to-a-file-with-indentation) – furas Apr 01 '21 at 13:38
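
The two checks suggested in the comments above (which `class` values the h3 elements really carry in what Selenium received, and how to view `page_source` as an indented tree) can be sketched with the standard library alone. This is only an illustration: the helper names `h3_class_counts` and `pretty_html` and the sample HTML are made up for this example, and the indenter drops comments and the doctype. If BeautifulSoup is installed, `BeautifulSoup(html, "html.parser").prettify()` does the pretty-printing in one call.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not increase the indent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _H3ClassCounter(HTMLParser):
    """Counts the class attribute values found on <h3> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            class_value = dict(attrs).get("class") or "(no class)"
            self.counts[class_value] += 1


def h3_class_counts(html):
    """Return a Counter mapping each h3 class string to its number of occurrences."""
    parser = _H3ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


class _Indenter(HTMLParser):
    """Collects one output line per tag or text node, indented by nesting depth."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.depth = 0

    def _emit(self, text):
        self.lines.append("  " * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> do not change the depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse stray newlines and spaces
        if text:
            self._emit(text)


def pretty_html(html):
    """Return the HTML as an indented, one-node-per-line string."""
    parser = _Indenter()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


# Made-up sample standing in for browser.page_source.
sample = '<div><h3 class="graf graf--title">First</h3><h3>Second</h3><br></div>'
print(h3_class_counts(sample))
print(pretty_html(sample))
```

With a live session, the same helpers would be called as `h3_class_counts(browser.page_source)` and `pretty_html(browser.page_source)`. If the counter comes back empty right after the search is submitted, the h3 elements have not been added to the page yet.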

1 Answer


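You need to wait for the elements to become visible after your search. Use WebDriverWait() with the visibility_of_all_elements_located() expected condition, which waits up to the given timeout (20 seconds here) for the matching elements to be visible: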

browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("Flying elephant",Keys.ENTER)
titles = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[contains(@class,'graf')]")))

lista = []
for names in titles:
    print(names.text)
    lista.append(names.text) 
print(lista)

You need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys  # for Keys.ENTER in send_keys

Output:

['A Flying Elephant, a Teacher’s Hugs: 12 Tales of Pandemic Resilience', 'Who Knew Disney Could Do Trippy Even Better Than Pink Floyd?', '#FlightFree2020: Travel Blogging And The Multiplier Effect', 'Bluesky and Dumbo ‘The Flying Elephant’', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.']
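
To see why waiting fixes the empty list, it may help to look at the polling idea behind an explicit wait: call a condition repeatedly until it returns something truthy or a timeout expires. The sketch below is not Selenium's implementation; `wait_until`, `WaitTimeout`, and the fake `find_titles` lookup (which simulates a page that only renders its titles on the third lookup) are hypothetical names invented for this illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll_interval=0.5, ignored=()):
    """Call condition() until it returns a truthy value or timeout seconds pass.

    ignored is a tuple of exception types to swallow while polling.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # treat an ignored exception like "not ready yet"
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page: the titles only exist from the third lookup onwards,
# standing in for JavaScript that fills in the results after the search.
_rendered = []
_lookups = {"count": 0}


def find_titles():
    _lookups["count"] += 1
    if _lookups["count"] == 3:
        _rendered.extend(["Title A", "Title B"])
    return list(_rendered)


immediate = find_titles()  # like find_elements right after send_keys
titles = wait_until(find_titles, timeout=2.0, poll_interval=0.01)
print(immediate)
print(titles)
```

The immediate lookup comes back empty, just like the list in the question, while the polled lookup returns the titles once they exist. `WebDriverWait(browser, 20).until(...)` plays the role of `wait_until` here and raises Selenium's `TimeoutException` if the condition is never met.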
KunduK