
I tried to get the dates from the Upcoming Events section on https://www.python.org/

I did this with the following code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Unused: ChromeDriverManager downloads and locates the driver itself
chrome_driver_path = r"C:\develoment\chromedriver.exe"
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get("https://www.python.org/")

# Collect the text of every <time> element on the page
silent = driver.find_elements(By.TAG_NAME, 'time')
silent2 = [x.text for x in silent]
print(silent2)

But when I printed the result, I got this:

['2023-03-08', '2023-02-15', '2023-02-10', '2023-02-08', '2023-02-08', '2023-03-13', '2023-03-15', '2023-03-22', '2023-04-01', '2023-04-07']

What am I doing wrong?

HTML snapshot: [image not included]

  • Please provide us with an example of what your expected output should be. The image doesn't really help. Check out [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) – Ian Thompson Mar 08 '23 at 20:53
  • The date you have in your screenshot is present in your list. The last 5 dates are from that section. What is the problem? – JNevill Mar 08 '23 at 20:54
  • On this page https://www.python.org/, in the Upcoming Events section, you'll see the dates and the title; the title is a – Joaquín Ruiz Mar 08 '23 at 21:48

2 Answers


The locator you have used:

driver.find_elements(By.TAG_NAME, 'time')

identifies the <time> elements in the Latest News section as well as in the Upcoming Events section. Hence you get the additional entries: only the last five dates in your list belong to Upcoming Events.
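To see why scoping the locator to the Upcoming Events container removes the extra dates, here is a browser-free sketch using only Python's standard html.parser module. It is an illustration, not Selenium code: SAMPLE_HTML is a hypothetical, heavily simplified stand-in for the python.org markup (only the medium-widget event-widget last classes come from the locators in this answer; blog-widget is made up), and TimeCollector and collect_times are hypothetical helper names.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified markup (not the real python.org HTML):
# both widgets contain <time> elements, as on the live page.
SAMPLE_HTML = """
<div class="medium-widget blog-widget">
  <ul><li><time datetime="2023-03-08T00:00:00+00:00">2023-03-08</time> <a href="#">News item</a></li></ul>
</div>
<div class="medium-widget event-widget last">
  <ul>
    <li><time datetime="2023-03-13T00:00:00+00:00">2023-03-13</time> <a href="#">Event A</a></li>
    <li><time datetime="2023-03-15T00:00:00+00:00">2023-03-15</time> <a href="#">Event B</a></li>
  </ul>
</div>
"""


class TimeCollector(HTMLParser):
    """Collect <time> texts, optionally only inside a <div> carrying all required classes."""

    def __init__(self, required_classes=None):
        super().__init__()
        self.required_classes = set(required_classes or [])
        self.div_stack = []  # one flag per open <div>: does it carry the required classes?
        self.in_time = False
        self.times = []

    def _in_scope(self):
        # With no required classes, search the whole document (like By.TAG_NAME, 'time')
        return not self.required_classes or any(self.div_stack)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = set((dict(attrs).get("class") or "").split())
            self.div_stack.append(self.required_classes <= classes)
        elif tag == "time" and self._in_scope():
            self.in_time = True
            self.times.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "time":
            self.in_time = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self.in_time:
            self.times[-1] += data


def collect_times(html, required_classes=None):
    parser = TimeCollector(required_classes)
    parser.feed(html)
    parser.close()
    return parser.times


print(collect_times(SAMPLE_HTML))                    # ['2023-03-08', '2023-03-13', '2023-03-15']
print(collect_times(SAMPLE_HTML, ["event-widget"]))  # ['2023-03-13', '2023-03-15']
```

A descendant CSS selector such as div.medium-widget.event-widget.last time expresses the same restriction: match <time> only when it sits inside the events container.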


Solution

To extract the values only from the Upcoming Events section you can use either of the following locator strategies:

  • Using css_selector and get_attribute("textContent"):

    driver.get("https://www.python.org/")
    print([my_elem.get_attribute("textContent") for my_elem in driver.find_elements(By.CSS_SELECTOR, "div.medium-widget.event-widget.last time")])
    
  • Using xpath and text attribute:

    driver.get("https://www.python.org/")
    print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//div[@class='medium-widget event-widget last']//time")])
    

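Ideally, to extract the texts you should induce WebDriverWait for visibility_of_all_elements_located() (this needs from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC), using either of the following locator strategies: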

  • Using CSS_SELECTOR and text attribute:

    driver.get('https://www.python.org/')
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.medium-widget.event-widget.last time")))])
    
  • Using XPATH and get_attribute("textContent"):

    driver.get('https://www.python.org/')
    print([my_elem.get_attribute("textContent") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='medium-widget event-widget last']//time")))])
    
  • Console Output:

    ['2023-03-13', '2023-03-15', '2023-03-22', '2023-04-01', '2023-04-07']
    

To print the date and event name of each upcoming event from the web page you mentioned, use the code below:

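from selenium.webdriver.common.by import By

# Find the list items in the Upcoming Events widget; each one holds a <time> and an <a>.
# A bare 'li' selector would also match list items without a <time> and raise NoSuchElementException.
events = driver.find_elements(By.CSS_SELECTOR, 'div.medium-widget.event-widget.last li')

# Loop through the events and extract the date and event name
for event in events:
    # Keep only the first 10 characters (the YYYY-MM-DD part) of the datetime attribute
    date = event.find_element(By.CSS_SELECTOR, 'time').get_attribute('datetime')[:10]
    name = event.find_element(By.CSS_SELECTOR, 'a').text
    print(date, name)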
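The loop above cuts the datetime attribute down with [:10]. As a possible alternative, datetime.fromisoformat (Python 3.7+) can parse it into a real date object and fail loudly if the format is unexpected. This sketch assumes the attribute holds an ISO 8601 timestamp such as 2023-03-13T00:00:00+00:00, which is not verified against the live page; SAMPLE_EVENTS and parse_events are hypothetical names, and the sample data is made up.

```python
from datetime import datetime

# Hypothetical (datetime attribute, link text) pairs shaped like what the loop
# reads from each <li>; they are not scraped from python.org.
SAMPLE_EVENTS = [
    ("2023-03-13T00:00:00+00:00", "Event A"),
    ("2023-03-15T00:00:00+00:00", "Event B "),
]


def parse_events(raw_events):
    """Turn (ISO 8601 datetime string, name) pairs into (date, name) pairs."""
    parsed = []
    for raw_datetime, name in raw_events:
        # fromisoformat raises ValueError on malformed input instead of
        # silently slicing whatever the first 10 characters happen to be
        event_date = datetime.fromisoformat(raw_datetime).date()
        parsed.append((event_date, name.strip()))
    return parsed


for event_date, name in parse_events(SAMPLE_EVENTS):
    print(event_date.isoformat(), name)
# 2023-03-13 Event A
# 2023-03-15 Event B
```

With Selenium, the pairs would come from get_attribute('datetime') and .text inside the loop rather than from a hard-coded list.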