I have been trying to scrape tweets from Twitter using Selenium. I can get the HTML I want and print it, but I am having trouble getting it into a form that is suitable for a pandas DataFrame.
Here is my code:
import time
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
url = 'https://twitter.com/search?f=tweets&q=cuomosmta%20since%3A2016-08-22%20until%3A2018-08-22'
browser.get(url)
time.sleep(1)
tweet_dict = {}
tweets = browser.find_elements(By.CLASS_NAME, 'tweet-text')
for tweet in tweets:
    print(tweet.text)
    tweet_dict['tweet'] = tweet.text
If you run the code, you will see that it prints each individual tweet. I did this to ensure that the code was working.
But for some reason, when I check my dictionary, my output from:
tweet_dict['tweet']
is:
'Ugh, Cuomo and #CuomosMTA are terrible, just terrible.'
The output above is also the last tweet on the page that I am trying to scrape.
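The same behavior seems to show up without Selenium at all, which suggests the issue is in how the dictionary gets filled rather than in the scraping. Here is a minimal sketch of that; the strings in `sample_tweets` are made up and only stand in for the scraped tweet text:

```python
# Made-up strings standing in for the scraped tweet texts (not real data).
sample_tweets = ['first tweet', 'second tweet', 'third tweet']

tweet_dict = {}
for text in sample_tweets:
    print(text)                 # every item is printed, as in the Selenium loop
    tweet_dict['tweet'] = text  # same assignment pattern as in my code

print(tweet_dict)       # {'tweet': 'third tweet'}
print(len(tweet_dict))  # 1
```

All three strings are printed inside the loop, but afterwards the dictionary holds a single entry whose value is the last string.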
I have tried this method multiple ways and even tried BeautifulSoup, but for some reason I keep getting the same result.
I don't understand why I am able to print all of the tweets but not add all of them to the dictionary.
I am a beginner and am probably missing something very obvious so any help would be appreciated.
If possible, I would like to keep using only Selenium, since it makes grabbing the exact timestamp easier than BeautifulSoup does.
Thank you!