Scraping news articles only beyond a certain date on Reuters

Question

I want to extract news articles from this Link. As you keep scrolling down, older articles keep appearing, but I only want the articles from the last year. How can I set that filter?

What have you tried so far? Please add it to the question. — pmadhu, Sep 19 '21 at 13:23
@pmadhu I cannot think of any way to approach this. — huy, Sep 19 '21 at 14:39

score 2 · Accepted Answer · answered Sep 19 '21 at 15:11

Try something like this.

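The code below keeps scrolling until it reaches an article timestamped "18 days ago". Change that condition to "a year ago" and the loop will break as soon as it reaches news from a year ago. Note that the comparison is an exact string match, so it must match the text the site actually displays.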

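from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4 style: the chromedriver path goes through a Service object
# (executable_path and the find_element(s)_by_* helpers were removed in Selenium 4)
driver = webdriver.Chrome(service=Service("path to chromedriver.exe"))
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://www.reuters.com/companies/AAPL.O")

i = 0
try:
    while True:
        # Re-query on every pass: scrolling loads more items into the list
        news = driver.find_elements(By.XPATH, "//div[@class='item']")
        if i >= len(news):
            break  # nothing more loaded; stop instead of hitting an IndexError
        driver.execute_script("arguments[0].scrollIntoView(true);", news[i])
        # Stop at the first article whose relative timestamp matches the cutoff
        if news[i].find_element(By.TAG_NAME, "time").get_attribute("innerText") == "18 days ago":
            break
        print(news[i].find_element(By.TAG_NAME, "a").get_attribute("innerText"))
        i += 1
        time.sleep(.5)  # give lazily loaded items time to appear
finally:
    # Always close the browser, even if an unexpected error is raised
    driver.quit()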
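The exact `==` test only stops on one specific string. Below is a sketch of a more tolerant cutoff, assuming the `time` elements show relative English text such as "18 days ago" or "a year ago". The helper names `parse_relative_age`, `is_older_than` and `titles_within` are hypothetical, and months and years are approximated as 30 and 365 days.

```python
import re
from datetime import timedelta

# ASSUMPTION: the site shows relative timestamps in English, e.g. "18 days ago".
# Months and years are approximated, which is fine for a "roughly one year" cutoff.
UNIT_DAYS = {
    "minute": 1 / 1440,
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

# Matches text such as "18 days ago", "an hour ago" or "a year ago"
RELATIVE = re.compile(r"^(\d+|an?)\s+(minute|hour|day|week|month|year)s?\s+ago$")


def parse_relative_age(text):
    """Convert a relative timestamp like '18 days ago' into a timedelta.

    Returns None when the text is not in the expected relative form
    (for example an absolute date), so the caller can decide what to do.
    """
    match = RELATIVE.match(text.strip().lower())
    if match is None:
        return None
    count, unit = match.groups()
    amount = 1 if count in ("a", "an") else int(count)
    return timedelta(days=amount * UNIT_DAYS[unit])


def is_older_than(text, limit=timedelta(days=365)):
    """True once an article's relative timestamp reaches the cutoff."""
    age = parse_relative_age(text)
    return age is not None and age >= limit


def titles_within(items, limit=timedelta(days=365)):
    """Collect (title, timestamp) pairs in page order, stopping at the cutoff.

    Mirrors the scrolling loop: the page lists newest first, so the first
    item past the cutoff ends the scan.
    """
    titles = []
    for title, stamp in items:
        if is_older_than(stamp, limit):
            break
        titles.append(title)
    return titles
```

In the scrolling loop, the `== "18 days ago"` test can then become `is_older_than(...)` applied to the same `innerText` value. If the page switches to absolute dates for older items, `parse_relative_age` returns `None` and those items are not treated as past the cutoff, so that case would need separate handling.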

I got this error: HTTPConnectionPool(host='127.0.0.1', port=40007): Max retries exceeded with url: /session/f6de1e58f557727793481d09cf44e2b5/window/maximize (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) — huy, Sep 19 '21 at 15:25
@huy Refer to this [Link](https://stackoverflow.com/q/63944480/16452840). I don't think it's caused by my code. — pmadhu, Sep 19 '21 at 16:04
