I am trying to go back as far as I can in the tweet history of a Twitter account (a technical blogger's account that I would like to read from its inception).
For that I have two options:
- buy access to the Search APIs from Twitter (NO!!)
- use Selenium to scroll down through the tweets of that account, collect the messages in a file, and read them later
I did read the question "StaleElementReference Exception in PageFactory".
Below is the code. My issue is that I get a StaleElementReferenceException, which I understand is caused by the page changing (refreshing) after the elements were located.
Since I am scrolling down, I am not sure how I can prevent that from happening. Any suggestions on how I can improve the code while still achieving what I want?
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

driver = webdriver.Chrome('c:/Utils/ChromeDriver/chromedriver.exe')
driver.get("https://twitter.com/realpython/with_replies")
driver.implicitly_wait(0)
time.sleep(10)  # wait for the Chrome window to show up

SCROLL_PAUSE_TIME = 1.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
tweets = []
tweets_file = open("tweets.txt", 'a', encoding="utf-8")
i = 0  # iteration counter

while True:
    if i == 0:
        SCROLL_PAUSE_TIME = 3  # give it more time in the first iteration
    else:
        SCROLL_PAUSE_TIME = 1

    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    elements = driver.find_elements(By.TAG_NAME, "article")
    for element in elements:
        # The StaleElementReferenceException is raised here, when reading .text
        tweets_file.write(element.text)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    i += 1

tweets_file.close()
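
One direction that might avoid the stale references, sketched here under assumptions rather than as a confirmed fix: read the text of all article elements inside the browser in a single execute_script call, so that no WebElement handle is held in Python while the page re-renders. The names ARTICLE_TEXT_JS, collect_new_texts and scroll_and_collect are hypothetical helpers, not part of Selenium. The max_idle_rounds cutoff is also an assumption, and innerText may differ slightly from what element.text returns.

```python
import time

# JavaScript that reads the text of every <article> in one browser-side call,
# so no WebElement handle is kept on the Python side (assumed approach).
ARTICLE_TEXT_JS = (
    "return Array.from(document.querySelectorAll('article'))"
    ".map(function (a) { return a.innerText; });"
)


def collect_new_texts(driver, seen):
    """Return article texts not seen before, in page order (hypothetical helper)."""
    new_texts = []
    for text in driver.execute_script(ARTICLE_TEXT_JS) or []:
        if text and text not in seen:
            seen.add(text)
            new_texts.append(text)
    return new_texts


def scroll_and_collect(driver, out_file, pause=1.5, max_idle_rounds=3):
    """Scroll to the bottom repeatedly, writing each new article text once.

    Stops after max_idle_rounds consecutive scrolls that leave the page
    height unchanged. Returns the number of distinct texts written.
    """
    seen = set()
    idle_rounds = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    while idle_rounds < max_idle_rounds:
        # Collect before scrolling, in case earlier articles get removed
        # from the page as new ones load.
        for text in collect_new_texts(driver, seen):
            out_file.write(text + "\n\n")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        idle_rounds = idle_rounds + 1 if new_height == last_height else 0
        last_height = new_height
    # Pick up anything that loaded during the final pause.
    for text in collect_new_texts(driver, seen):
        out_file.write(text + "\n\n")
    return len(seen)
```

With the driver and tweets_file from the code above, this would be called as scroll_and_collect(driver, tweets_file). The seen set is there because the same article is usually still on the page after the next scroll, so without it each tweet would be written several times.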