
I am scraping a web page (Twitter) using "webdriver.PhantomJS".

I'd like to scroll through the whole page and collect all the data (tweets), but right now I only know how to scroll a fixed number of times:

for _ in range(500):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.2)

I'm searching past data rather than real-time data (for example, from May 1 to May 2), so the number of tweets is fixed.

However, I can't tell in advance how many tweets there are, so I don't know how many times to scroll.

How do I write code that keeps scrolling an infinite-scrolling page until it reaches the end?

I've found many answers by searching, but I have a hard time applying them to my code, so I'm asking this question.
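A common approach is to compare `document.body.scrollHeight` before and after each scroll and stop once the page stops getting taller. Below is a minimal sketch, assuming the page appends new tweets at the bottom as you scroll; `scroll_to_bottom`, `pause`, `max_stalls` and `max_scrolls` are hypothetical names, and the default timings are guesses that would need tuning for how fast Twitter loads.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_stalls=3, max_scrolls=10_000):
    """Scroll an infinite-scrolling page until its height stops growing.

    `driver` is assumed to be a Selenium WebDriver (or anything with an
    `execute_script` method). Returns the number of scroll steps performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    stalls = 0  # consecutive scrolls that did not make the page taller
    scrolls = 0
    while scrolls < max_scrolls:  # hard cap so a misbehaving page can't loop forever
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        time.sleep(pause)  # give the page time to request and render more tweets
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height > last_height:
            last_height = new_height
            stalls = 0
        else:
            stalls += 1
            if stalls >= max_stalls:
                break  # several scrolls with no growth: assume we reached the end
    return scrolls
```

With the code in the question, `scroll_to_bottom(browser, pause=1.5)` would replace the `for _ in range(500)` loop, and `browser.find_elements_by_class_name('tweet-text')` would be called afterwards as before. The stall counter is there because a single slow response can make the height look unchanged for one round; requiring several unchanged rounds in a row makes an early stop less likely.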

# My entire code
# Python 3
import requests
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# raw string so the backslash in the Windows path is not read as an escape sequence
browser = webdriver.PhantomJS(r'C:\phantomjs-2.1.1-windows/bin/phantomjs')
url = u'https://twitter.com/search?f=tweets&vertical=default&q=%EB%B0%B0%EA%B3%A0%ED%8C%8C%20since%3A2017-07-19%20until%3A2017-07-20&l=ko&src=typd&lang=ko'


browser.get(url)
time.sleep(1)

body = browser.find_element_by_tag_name('body')
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

for _ in range(500):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.2)

tweets=browser.find_elements_by_class_name('tweet-text')

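# "with" closes the file even if an error occurs while writing
with open("ttttttt.txt", mode='w', encoding='utf8') as wfile:
    for i, tweet in enumerate(tweets, start=1):  # number tweets from 1
        data = {'text': tweet.text}  # a fresh dict for each tweet
        print(i, ":", data)
        wfile.write(str(data) + '\n')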
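If the height check stops too early, or if a page removes tweets that have scrolled far out of view, another option is to read the tweet texts after every scroll and stop when no new ones appear. This is a minimal sketch; `collect_while_scrolling`, `scroll` and `read_texts` are hypothetical names, and keying on the text alone is a simplifying assumption (two different tweets with identical text would be merged into one).

```python
import time


def collect_while_scrolling(scroll, read_texts, pause=1.0, max_stalls=3, max_scrolls=10_000):
    """Scroll and collect item texts until no new ones appear.

    scroll():      performs one scroll step (e.g. via execute_script).
    read_texts():  returns the texts currently present on the page.
    Returns the texts in first-seen order, without duplicates.
    """
    seen = {}  # dict keeps insertion order (Python 3.7+) and drops duplicates
    for text in read_texts():
        seen.setdefault(text, None)
    stalls = 0  # consecutive scrolls that produced no new texts
    for _ in range(max_scrolls):
        scroll()
        time.sleep(pause)  # let the next batch of tweets load
        before = len(seen)
        for text in read_texts():
            seen.setdefault(text, None)
        if len(seen) > before:
            stalls = 0
        else:
            stalls += 1
            if stalls >= max_stalls:
                break  # nothing new after several scrolls: assume we reached the end
    return list(seen)
```

With the code in the question, `scroll` could be `lambda: browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")` and `read_texts` could be `lambda: [t.text for t in browser.find_elements_by_class_name('tweet-text')]`; the returned list would then be written to the file in place of `tweets`.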
  • Have you tried sending the PAGE_DOWN key? – ExtractTable.com Jul 20 '17 at 17:58
  • @UdayS I used **for _ in range(50): body.send_keys(Keys.PAGE_DOWN)**, but it only got 20 tweets, so I switched to the code in the question. – yome Jul 20 '17 at 18:01
  • I presume that, by default, the webdriver does not have focus on the body of the page. I think calling **.click()** on the body element should help. Could you try calling click() just before the loop? – ExtractTable.com Jul 20 '17 at 18:08
  • @UdayS That is a little difficult for me as a beginner. Could you explain it in more detail, if you don't mind? – yome Jul 20 '17 at 18:16
  • The same question was asked here: https://stackoverflow.com/questions/28928068/scroll-down-to-bottom-of-infinite-page-with-phantomjs-in-python/28928684#28928684 – Jake Jul 20 '17 at 20:47
  • @Jake I saw that question a while ago, but I couldn't solve my problem with it, so I posted this one. I haven't been able to apply it to my code. – yome Jul 20 '17 at 20:54
