
I want to extract news articles from Link. As you keep scrolling down, older articles keep appearing, but I only want articles from the last year. How can I set that filter?

cruisepandey
huy

1 Answer


Try it like this.

The code below scrolls until it finds an article whose timestamp reads "18 days ago". Change the condition to "a year ago" and the loop will break when it reaches an article published a year ago.

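from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

# Selenium 4 style: the driver path is passed through a Service object.
driver = webdriver.Chrome(service=Service("path to chromedriver.exe"))
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://www.reuters.com/companies/AAPL.O")

i = 0
try:
    while True:
        # Re-query on every pass: scrolling loads more items into the page.
        news = driver.find_elements(By.XPATH, "//div[@class='item']")
        # Scroll the current item into view so that older articles get loaded.
        driver.execute_script("arguments[0].scrollIntoView(true);", news[i])
        # Stop at the first article whose relative timestamp matches the cutoff.
        if news[i].find_element(By.TAG_NAME, "time").get_attribute("innerText") == "18 days ago":
            break
        print(news[i].find_element(By.TAG_NAME, "a").get_attribute("innerText"))
        i += 1
        time.sleep(.5)
except Exception:
    # Running out of items or hitting an item without a timestamp ends the loop.
    pass

driver.quit()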
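
An exact string match only stops the loop if some article carries exactly that label. A more tolerant option is to convert the relative timestamp into an age and compare it with a cutoff. The following is only a sketch: `parse_age` and `is_older_than` are hypothetical helper names, and it assumes the timestamps always look like "18 days ago" or "a year ago" (other formats are treated as unknown). Months and years are approximated as 30 and 365 days.

```python
import re
from datetime import timedelta

# Approximate unit lengths in days (assumption: months = 30, years = 365).
_UNIT_DAYS = {
    "minute": 1 / 1440,
    "hour": 1 / 24,
    "day": 1,
    "week": 7,
    "month": 30,
    "year": 365,
}

# Matches e.g. "18 days ago", "a year ago", "an hour ago".
_PATTERN = re.compile(r"^(an?|\d+)\s+(minute|hour|day|week|month|year)s?\s+ago$")


def parse_age(text):
    """Return the article age as a timedelta, or None if the text is not recognised."""
    match = _PATTERN.match(text.strip().lower())
    if match is None:
        return None
    count, unit = match.groups()
    n = 1 if count in ("a", "an") else int(count)
    return timedelta(days=n * _UNIT_DAYS[unit])


def is_older_than(text, limit=timedelta(days=365)):
    """True if the relative timestamp is at least `limit` old; unknown formats give False."""
    age = parse_age(text)
    return age is not None and age >= limit
```

In the loop, the `== "18 days ago"` comparison could then be replaced by `is_older_than(...)` applied to the same `innerText` value, so the loop breaks at the first article that is a year old or older.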
pmadhu
  • got this error: HTTPConnectionPool(host='127.0.0.1', port=40007): Max retries exceeded with url: /session/f6de1e58f557727793481d09cf44e2b5/window/maximize (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) – huy Sep 19 '21 at 15:25
  • @huy - See this [Link](https://stackoverflow.com/q/63944480/16452840). I don't think it's because of my code. – pmadhu Sep 19 '21 at 16:04