I am trying to learn how to webscrape news headlines using Python by following along with this post I found: https://medium.com/analytics-vidhya/how-to-scrape-news-headlines-from-reuters-27c0274dc13c
It worked perfectly. However, when I tried to adapt it to other news pages, I keep getting a "no such element" error. I realize this is because I am choosing the wrong class in the HTML, but I don't understand which class I should be choosing instead.
The original script was used on this news page: https://www.reuters.com/news/archive/technologynews?view=page&page=6&pageSize=10
I attempted to use it on the following pages, specifically looking into a local state agency:
https://www.startribune.com/search/?page=1&q=%22Department%20of%20Human%20Services%22&refresh=true
https://www.twincities.com/?s=%22Department+of+Human+Services%22&orderby=date&order=desc
Here is the code. The only changes are replacing the Reuters URL with the first of the pages I am looking into and replacing the class name used to select the button:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import dateutil.parser
import time
import csv
from datetime import datetime
import io
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.startribune.com/search/?page=1&q=%22Department%20of%20Human%20Services%22&refresh=true')
count = 0
headlines = []
dates = []
for x in range(500):
    try:
        # loadMoreButton.click()
        # time.sleep(3)
        loadMoreButton = driver.find_element_by_class_name("pagination-shortcut-link")
        # driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(3)
        loadMoreButton.click()
        time.sleep(2)
        news_headlines = driver.find_elements_by_class_name("story-title")
        news_dates = driver.find_elements_by_class_name("timestamp")
        for headline in news_headlines:
            headlines.append(headline.text)
            print(headline.text)
        for date in news_dates:
            dates.append(date.text)
            print(date.text)
        count = count + 1
        print("CLICKED!!:")
    except Exception as e:
        print(e)
        break
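The search URL passed to driver.get already carries a page=1 query parameter, so one alternative to clicking the next button would be to request each results page directly. Below is a minimal sketch of that idea using only the standard library. It is a different technique from the tutorial's click loop, `page_url` is a hypothetical helper name, and whether the site actually serves results for higher `page` values is an assumption that has not been verified here.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse, urlunparse

# The search URL from the script above.
BASE_URL = (
    "https://www.startribune.com/search/"
    "?page=1&q=%22Department%20of%20Human%20Services%22&refresh=true"
)


def page_url(base_url, page):
    """Return base_url with its 'page' query parameter set to `page`."""
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    query["page"] = [str(page)]
    # quote_via=quote keeps spaces encoded as %20, as in the original URL.
    new_query = urlencode(query, doseq=True, quote_via=quote)
    return urlunparse(parts._replace(query=new_query))


# Build the first three page URLs; each could be passed to driver.get().
for n in range(1, 4):
    print(page_url(BASE_URL, n))
```

If this works for the site, the loop no longer depends on finding the pagination element at all, only on the headline and date elements on each page.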
To get the class name, I right-clicked the next button, selected Inspect Element, and copied what I saw. However, I continue to get the error, and I am not sure which other class I am meant to be using.
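One way to check whether a class copied from the inspector is really present in the HTML that Selenium received is to list the class names found on link and button tags in `driver.page_source`. The sketch below does this with the standard library only. `ClassCollector` and `clickable_classes` are hypothetical names, and `SAMPLE_HTML` is made-up markup standing in for the real page source, so it says nothing about what the Star Tribune page actually contains.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect the class names seen on <a> and <button> start tags."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "button"):
            for name, value in attrs:
                if name == "class" and value:
                    # A class attribute may hold several space-separated names.
                    self.classes.update(value.split())


def clickable_classes(html):
    """Return the set of class names used on links and buttons in `html`."""
    collector = ClassCollector()
    collector.feed(html)
    return collector.classes


# Made-up markup for illustration; in practice pass driver.page_source.
SAMPLE_HTML = (
    '<div><a class="pagination-shortcut-link next" href="?page=2">Next</a>'
    '<button class="load-more">More</button></div>'
)

found = clickable_classes(SAMPLE_HTML)
print(sorted(found))
print("pagination-shortcut-link" in found)
```

If the class shown in the inspector is missing from the result for the real page source, the element may be added later by JavaScript or may live inside an iframe, in which case a class-name lookup made right after the page loads would fail regardless of which class is chosen.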