
A block of code on a website that I'd like to scrape with Selenium (in Python) looks like the following:

<div class="exp_date">
  <span class="uppr_sec">
    <i class="exp_clndr"></i>
    <label> 04 Jan 2021 09:30 AM - 04 Jan 2021 10:30 AM </label>
  </span>
  
  <br>
  
  <div class="clear"></div>
  
  <span class="lwr_sec">
    <i class></i>
    <label>Hosted By Some Random Person</label>
  </span>

</div>

I'd like to print the text enclosed in the <label> tags of both spans, i.e. "04 Jan 2021 09:30 AM - 04 Jan 2021 10:30 AM" and "Hosted By Some Random Person", to the Python console using Selenium. However, I am not sure how to do this, because each label is nested inside its own span, which is in turn nested inside a div.

Can someone please help me out with the code needed to do this (in Python)?

Edited by undetected Selenium; asked by Pranav N.

1 Answer


To extract and print the label text, e.g. 04 Jan 2021 09:30 AM - 04 Jan 2021 10:30 AM, using Selenium, you can use either of the following Locator Strategies (replace uppr_sec with lwr_sec in the locator to get Hosted By Some Random Person instead):

  • Using css_selector and get_attribute("innerHTML"):

    from selenium.webdriver.common.by import By
    
    print([my_elem.get_attribute("innerHTML") for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.exp_date > span.uppr_sec label")])
    
  • Using xpath and text attribute:

    from selenium.webdriver.common.by import By
    
    print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//div[@class='exp_date']/span[@class='uppr_sec']//label")])
    
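The locators above target only span.uppr_sec, while the question asks for the label in span.lwr_sec as well. Dropping the span's class from the CSS selector lets one locator match the label in both spans. The following is a minimal sketch of that idea, not part of the original answer: `get_exp_date_labels` is a hypothetical helper name, and it assumes a Selenium 4 style driver exposing `find_elements(by, value)`.

```python
# Matches the <label> in BOTH spans (uppr_sec and lwr_sec): the span's
# class is left out, so either direct-child span of div.exp_date qualifies.
EXP_DATE_LABELS = "div.exp_date > span label"


def get_exp_date_labels(driver):
    """Return the stripped text of every <label> under div.exp_date.

    Assumes `driver` is a Selenium 4 WebDriver (or any object with the same
    find_elements(by, value) method). "css selector" is the value of
    selenium.webdriver.common.by.By.CSS_SELECTOR, passed as a plain string
    so this helper can be defined without importing Selenium.
    """
    elements = driver.find_elements("css selector", EXP_DATE_LABELS)
    # strip() is a cheap safeguard against the spaces padding the label
    # text in the page source (they show up if you switch to
    # get_attribute("innerHTML") instead of .text).
    return [element.text.strip() for element in elements]
```

With a real browser, you would call it after loading the page, e.g. `for line in get_exp_date_labels(driver): print(line)`. The same selector can also be passed to the WebDriverWait variant below as `(By.CSS_SELECTOR, "div.exp_date > span label")`.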

Ideally, you should induce WebDriverWait for visibility_of_all_elements_located(), using either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.exp_date > span.uppr_sec label")))])
    
  • Using XPATH and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='exp_date']/span[@class='uppr_sec']//label")))])
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    


Answered by undetected Selenium.
  • This is just everything I needed, thanks a ton! I've got another query though: I also have to identify a link starting with "abc.com" in the HTML source of a given webpage. I'm planning to use `wd.page_source` (wd is the webdriver object) to obtain the page's HTML source, and then use a Python RegEx to search for a string starting with "abc.com". Is there a Selenium-specific way to do this without RegEx (something like a search mechanism)? Thanks in advance! – Pranav N Jan 04 '21 at 10:12
  • @PranavN Sounds like a completely different issue altogether. Can you raise a new question as per your new requirement? Stack Overflow contributors will be happy to help you out. – undetected Selenium Jan 04 '21 at 10:14