I want to extract news articles from Link. As you keep scrolling down, older articles keep appearing, but I only want articles from the last year. How can I set that filter?
What have you tried so far? Please add that to the question. – pmadhu Sep 19 '21 at 13:23
@pmadhu I cannot think of any way to approach this – huy Sep 19 '21 at 14:39
1 Answer
Try it like this.
The code below scrolls until it finds an article timestamped "18 days ago". Change the condition to "a year ago" and the loop will break when it reaches the news from a year ago.
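from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time

# Selenium 3 call; Selenium 4 takes service=Service("path to chromedriver.exe") instead.
driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.maximize_window()
driver.implicitly_wait(10)
driver.get("https://www.reuters.com/companies/AAPL.O")
i = 0
try:
    while True:
        # Re-query on every pass: scrolling down loads more items.
        news = driver.find_elements(By.XPATH, "//div[@class='item']")
        driver.execute_script("arguments[0].scrollIntoView(true);", news[i])
        # Stop at the first article that carries the cutoff timestamp.
        if news[i].find_element(By.TAG_NAME, "time").get_attribute("innerText") == "18 days ago":
            break
        print(news[i].find_element(By.TAG_NAME, "a").get_attribute("innerText"))
        i += 1
        time.sleep(.5)
except (IndexError, NoSuchElementException):
    # No more items were loaded, or an item has no timestamp or link.
    pass
finally:
    driver.quit()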
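
An exact string match such as "a year ago" only stops the loop if some article carries exactly that text. A more tolerant check could parse the relative timestamp and compare it against a cutoff. The sketch below is only an illustration: the helper names `parse_relative_age` and `is_older_than` are hypothetical, the accepted wording ("18 days ago", "a year ago", "2 months ago") is an assumption about what the page shows, and months and years are rounded to 30 and 365 days.

```python
import re
from datetime import timedelta

# Approximate unit lengths in seconds; month and year are rounded (assumption).
_UNIT_SECONDS = {
    "minute": 60,
    "hour": 3600,
    "day": 86400,
    "week": 7 * 86400,
    "month": 30 * 86400,
    "year": 365 * 86400,
}

# Matches text such as "18 days ago", "a year ago" or "an hour ago".
_PATTERN = re.compile(r"^(a|an|\d+)\s+(minute|hour|day|week|month|year)s?\s+ago$")


def parse_relative_age(text):
    """Return the age described by text as a timedelta, or None if unrecognized."""
    match = _PATTERN.match(text.strip().lower())
    if match is None:
        return None
    count = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    return timedelta(seconds=count * _UNIT_SECONDS[match.group(2)])


def is_older_than(text, limit=timedelta(days=365)):
    """Return True when text describes an age of at least limit."""
    age = parse_relative_age(text)
    return age is not None and age >= limit
```

In the loop above, the comparison with "18 days ago" could then become `if is_older_than(time_text): break`, where `time_text` is the `innerText` of the `time` element. Text the pattern does not recognize returns `False`, so the loop keeps scrolling rather than stopping early.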

pmadhu
I got this error: HTTPConnectionPool(host='127.0.0.1', port=40007): Max retries exceeded with url: /session/f6de1e58f557727793481d09cf44e2b5/window/maximize (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) – huy Sep 19 '21 at 15:25
@huy - Refer to this [Link](https://stackoverflow.com/q/63944480/16452840). I don't think it's because of my code. – pmadhu Sep 19 '21 at 16:04