I need all the replies/comments of a tweet. The answer to the related question requires downloading far too much data and then discarding most of it after cross-matching, which is not feasible for me because of the rate limits. Instead, I tried to scrape the page by loading the tweet URL with Python and scrolling it with the Selenium WebDriver. However, I still get only the replies from the first page; for some reason, scrolling does not work. I tried approaches 1, 2, 3, and 4, but none of them worked in this case.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Firefox()
driver.get("https://twitter.com/neiltyson/status/912299342559694848")

for _ in xrange(10):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    print('height:{}'.format(driver.execute_script("return document.body.scrollHeight")))
    time.sleep(3)

I noticed that the height does not change after the first iteration.

Rakib
  • Is it paginated, or infinite scrolling? Can you share a Minimal, Complete, and Verifiable example of your specific issue? – ivan7707 Sep 26 '17 at 18:59
  • @ivan7707, I tried to scroll infinitely, as suggested in another SO question, but it never completes! With different numbers of scroll repetitions, I get the same result. I guess scrolling is not working because the comments are loaded in another body of the page? – Rakib Sep 26 '17 at 19:05
  • Thanks for updating the question. See the answer below. – ivan7707 Sep 26 '17 at 19:25

1 Answer

I have Python 3 running right now, so I changed xrange to range to test it out.

Try this (works for me):

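from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Firefox()
driver.get("https://twitter.com/neiltyson/status/912299342559694848")

# Send key presses to the page body instead of calling window.scrollTo().
# find_element(By.TAG_NAME, ...) is used because newer Selenium 4 releases
# no longer provide find_element_by_tag_name.
page = driver.find_element(By.TAG_NAME, 'body')

for i in range(10):
    page.send_keys(Keys.PAGE_DOWN)
    time.sleep(3)  # give the next batch of replies time to load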
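
A possible variation, in case a fixed count of 10 is too few or too many: keep pressing PAGE_DOWN until the page height stops growing. This is only a sketch. The helper name `scroll_until_stable`, its parameters, and the `patience` heuristic are my own assumptions, not part of Selenium, and I have not tested it against Twitter's markup. The Selenium wiring lives in a separate function so the loop itself can be tried without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_once, max_rounds=10, pause=3.0, patience=3):
    """Scroll until the height stops growing for `patience` rounds in a row.

    get_height and scroll_once are caller-supplied callables (hypothetical
    names). Returns the list of heights observed, starting with the initial one.
    """
    heights = [get_height()]
    stalled = 0
    for _ in range(max_rounds):
        scroll_once()
        time.sleep(pause)  # wait for the next batch of replies
        height = get_height()
        if height > heights[-1]:
            stalled = 0
        else:
            stalled += 1
        heights.append(height)
        if stalled >= patience:
            break
    return heights


def load_replies_page(url):
    """Hypothetical wiring: open the tweet in Firefox and scroll with key presses."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        body = driver.find_element(By.TAG_NAME, "body")
        heights = scroll_until_stable(
            get_height=lambda: driver.execute_script("return document.body.scrollHeight"),
            scroll_once=lambda: body.send_keys(Keys.PAGE_DOWN),
        )
        return driver.page_source, heights
    finally:
        driver.quit()
```

Since one PAGE_DOWN moves only one screen, the height can stay flat for a few rounds even when more replies remain, so `patience` may need to be raised for long pages.
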
ivan7707
  • Thanks @ivan7707. It does scroll. But after a couple of times, Twitter stops serving further replies. If I save the page as an HTML file and open it in a browser, I see the error message "Loading seems to be taking a while. Twitter may be over capacity or experiencing a momentary hiccup." How can I detect it and, when it happens, try again? I tried increasing the sleep time, but it did not help. – Rakib Sep 26 '17 at 19:58
  • That is a different question altogether [and a Twitter issue, it seems](https://twittercommunity.com/t/does-anybody-keep-getting-the-error-message-loading-tweets-seems-to-be-taking-a-while/8452). The above code scrolls the number of times given in range (answering your initial question) for me and has no issues. – ivan7707 Sep 26 '17 at 20:03
  • The question is also downvoted. I guess it's fun for some people to downvote. – Rakib Sep 27 '17 at 13:09
  • I agree. Some people just downvote for nothing. I found the question and the suggestions very helpful, having to deal with this myself. Thanks to all who took the time to respond and offer suggestions. There will always be those who troll or those who are ungrateful. – R J Apr 14 '18 at 02:36
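
On the follow-up raised in the comments above (detecting the "Loading seems to be taking a while" message and trying again), one rough idea is to look for that text in the page source and retry with a growing delay. This is a sketch under assumptions: the names `has_loading_error` and `retry_until_loaded` are hypothetical, the message wording is taken from the comment as quoted, and the text may also sit in hidden markup even when no error is shown, in which case checking the visibility of the specific element would be needed instead.

```python
import time

# Wording copied from the comment above; the exact text on the page is an assumption.
LOADING_ERROR = "Loading seems to be taking a while"


def has_loading_error(page_source):
    """Return True if the error text appears anywhere in the page source."""
    return LOADING_ERROR in page_source


def retry_until_loaded(get_page_source, retry_action, attempts=5, base_delay=5.0):
    """Retry while the error text is present, doubling the wait each time.

    get_page_source and retry_action are caller-supplied callables
    (hypothetical names). Returns True once the error text is gone,
    False if it is still there after all attempts.
    """
    for attempt in range(attempts):
        if not has_loading_error(get_page_source()):
            return True
        time.sleep(base_delay * (2 ** attempt))  # back off before retrying
        retry_action()
    return not has_loading_error(get_page_source())
```

With the driver and body element from the answer, this might be called as `retry_until_loaded(lambda: driver.page_source, lambda: page.send_keys(Keys.PAGE_DOWN))`. Whether Twitter resumes serving replies after a wait is not something this can guarantee.
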