Grabbing data using selenium and adding it to a dictionary for use in a dataframe

Question

I have been trying to grab tweets off of twitter using selenium. I have been successful at getting the html that I want and printing it, but I am having trouble getting it into a form that is appropriate to use for a dataframe.

Here is my code:

import time
import pandas as pd
import numpy as np

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
url = 'https://twitter.com/search?f=tweets&q=cuomosmta%20since%3A2016-08-22%20until%3A2018-08-22'

browser.get(url)
time.sleep(1)

tweet_dict = {}

tweets = browser.find_elements_by_class_name('tweet-text')

for tweet in tweets:
    print(tweet.text)
    tweet_dict['tweet'] = tweet.text

If you run the code, you will see that it prints each individual tweet. I did this to ensure that the code was working.

But for some reason, when I check my dictionary, my output from:

tweet_dict['tweet']

is:

'Ugh, Cuomo and #CuomosMTA are terrible, just terrible.'

The output above is also the last tweet on the page that I am trying to scrape.

I have tried this method multiple ways and even tried BeautifulSoup, but for some reason I keep getting the same result.

I don't understand why I am able to print all of the tweets but not append them to the dictionary.

I am a beginner and am probably missing something very obvious so any help would be appreciated.

If possible, I would like to keep using only selenium, since it makes grabbing the exact timestamp easier than it is in beautifulsoup.

Thank you!

score 1 · Accepted Answer · answered Aug 30 '18 at 05:03

A dictionary can only hold unique keys, so instead of appending each tweet in the loop, you are overwriting the same key-value pair on every iteration. You can try the solution below:

for tweet in range(len(tweets)):
    print(tweets[tweet].text)
    tweet_dict['tweet_%s' % tweet] = tweets[tweet].text

The output should look like this:

{'tweet_0': 'first tweet content', 'tweet_1': 'second tweet content', ...}
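Since the end goal is a dataframe, another option is to collect one small dictionary per tweet in a list. The following is only a minimal sketch: `tweet_texts` is a hypothetical stand-in for the `tweet.text` values that selenium returns, and the names `overwritten` and `rows` are made up for illustration. It shows the overwrite problem from the question next to the list-based alternative:

```python
# Hypothetical stand-ins for the `tweet.text` values selenium would return.
tweet_texts = ['first tweet content', 'second tweet content', 'third tweet content']

# Reusing the same key replaces the previous value on every pass,
# so only the last tweet survives (the behaviour seen in the question).
overwritten = {}
for text in tweet_texts:
    overwritten['tweet'] = text

# Appending one dictionary per tweet keeps every tweet as its own row.
rows = []
for text in tweet_texts:
    rows.append({'tweet': text})

print(overwritten)
print(rows)
```

A list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, which should give one row per tweet under a single `tweet` column.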

Andersson

Thank you so, so much. My only question is what ['tweet_%s' % tweet] actually means. If you could walk me through that I would really appreciate it. – jmoore00 Aug 30 '18 at 16:30
This is one of the ways string formatting works in Python: `%s` is a string placeholder, meaning that it will be replaced with an actual string value, and `% tweet` supplies the value that goes in place of the placeholder. You can also pass several substrings as a tuple, as in `"here comes %s and %s" % ("first", "second")`, which evaluates to `"here comes first and second"` – Andersson Aug 30 '18 at 17:06
@agra94 you can [accept](https://stackoverflow.com/help/accepted-answer)/[upvote](https://stackoverflow.com/help/privileges/vote-up) the answer in case it solved your issue/was useful as well as one of answers to your [previous question](https://stackoverflow.com/questions/52084458/selenium-error-message-selenium-webdriver-has-no-attribute-execute-script) – Andersson Aug 30 '18 at 21:03
Of course, sorry. I wasn't able to upvote until today because I didn't have enough reputation. – jmoore00 Aug 31 '18 at 16:14
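
To make the `%s` explanation from the comments concrete, here is a small sketch. The variable names (`index`, `key`, `pair`, `same_key`) are assumptions chosen for illustration and are not part of the original answer:

```python
# `index` plays the role of the loop variable `tweet` in the answer's code.
index = 0

# %s is replaced by the value on the right-hand side of the % operator.
key = 'tweet_%s' % index

# Several placeholders are filled, in order, from a tuple.
pair = 'here comes %s and %s' % ('first', 'second')

# An f-string (Python 3.6+) builds the same key.
same_key = f'tweet_{index}'

print(key)
print(pair)
print(same_key)
```

Because the loop variable changes on every iteration, each tweet gets its own key (`tweet_0`, `tweet_1`, and so on), which is why nothing is overwritten.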
