
I am scraping a Twitter page using Selenium and my scraped tweets are stored in a list variable tweets. I can iterate through them normally and extract the text from them using:

for tweet in tweets:
    print(tweet.text)

However, when I try to use a list comprehension instead:

[tweet.text for tweet in tweets]

I get a StaleElementReferenceException:

StaleElementReferenceException: Message: The element reference of [object String] "b22c079f-684f-4d46-942b-d5dd69203728" is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed

Why is this happening?

undetected Selenium
wrahool
  • `[tweet.text for tweet in tweets]`: what action are you performing in this loop? It seems your DOM is being reloaded. See the description: _it is not in the current frame context, or the document has been refreshed_ – NarendraR Sep 24 '20 at 05:28
  • no action at all. the for loop works, the list comprehension doesn't. – wrahool Sep 24 '20 at 05:34
  • @wrahool you probably updated/refreshed the browser content in the meantime – Zaraki Kenpachi Sep 24 '20 at 08:16

2 Answers


The state of the elements had changed by the time you ran the list comprehension. Fetch the tweet elements again immediately before the list comprehension, as below.

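from selenium.webdriver.common.by import By
# find_elements_by_xpath() is deprecated and removed in later Selenium 4 releases
tweets = driver.find_elements(By.XPATH, 'YOUR_XPATH_HERE')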
tweets_lists = [tweet.text for tweet in tweets]
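
Re-locating the elements right before reading them narrows the window, but the page can still re-render while the texts are being read. One way to cope is to re-locate and retry when a stale reference shows up. The following is only a sketch under assumptions: `texts_with_retry` is a hypothetical helper (not part of Selenium), and it takes the "stale" exception class as an argument so that it can be tried without a browser.

```python
import time


def texts_with_retry(locate, stale_errors, attempts=3, pause=0.5):
    """Return the .text of every element, re-locating them if one goes stale.

    locate       -- zero-argument callable returning a fresh list of elements
    stale_errors -- exception class (or tuple of classes) that signals staleness
    """
    for attempt in range(1, attempts + 1):
        try:
            # Locate and read in a single pass so the references are fresh.
            return [element.text for element in locate()]
        except stale_errors:
            if attempt == attempts:
                raise  # still stale on the last attempt: surface the error
            time.sleep(pause)  # let the page settle before trying again


# Assumed Selenium usage (driver and the XPath are placeholders):
#   from selenium.common.exceptions import StaleElementReferenceException
#   from selenium.webdriver.common.by import By
#   texts = texts_with_retry(
#       lambda: driver.find_elements(By.XPATH, "YOUR_XPATH_HERE"),
#       StaleElementReferenceException,
#   )
```

The key point is that the retry calls `locate()` again instead of reusing the old list, since a stale reference never becomes valid again.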
Maran Sowthri

A lot depends on how you construct the tweets list.

Ideally, to extract the texts from all of the tweets using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR and get_attribute("innerHTML"):

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "css_selector_of_tweets")))])
    
  • Using XPATH and text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "xpath_of_tweets")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
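
For intuition, `WebDriverWait(driver, 20).until(...)` behaves roughly like a polling loop: it keeps calling the condition until it returns something truthy or the timeout runs out. Below is a simplified, Selenium-free sketch of that idea. `wait_until` and its parameters are hypothetical names for illustration, and Selenium's real implementation differs in detail (for example, it raises its own TimeoutException).

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5, ignored=()):
    """Call condition() repeatedly until it returns something truthy.

    Raises TimeoutError if `timeout` seconds pass first. Exception classes
    listed in `ignored` are swallowed and treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # e.g. a non-empty list of visible elements
        except ignored:
            pass  # not ready yet; fall through and poll again
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

Because the list comprehension in the snippets above consumes the elements as soon as the wait returns them, there is little time for the page to re-render in between.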
    

Outro

Link to useful documentation:

undetected Selenium