
I made a loop to capture unique tweets using the code below.

# assumes Twitter and hashtags come from the Pattern library (pattern.web)
from pattern.web import Twitter, hashtags

engine = Twitter(language='en')
idindex = set()
tweets = []

prev = None

# search 5 times (up to 200 results each) for 'Winter snow storm', keeping only unique tweets
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        print(f'ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}')
        if len(tweet.text) > 0 and tweet.id not in idindex:
            tweets.append(tweet.text)
            idindex.add(tweet.id)
        prev = tweet.id
print(f'Found {len(tweets)} tweets.')
print('')
print('')

Output for the first few rows:

0
ID = 1411779493195182080, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = YathumOorey, date = Sun Jul 04 20:10:35 +0000 2021 
ID = 1411770366570106880, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = itzabiiz, date = Sun Jul 04 19:34:19 +0000 2021 
ID = 1411769785554063360, hashtag = ['#Siveen', '#AgniSiragugal'], text = RT @NaveenFilmmaker: I’m a proud father today. My daughter #Siveen completes dubbing for #AgniSiragugal. For a 5yr old kid she boldly performed in that harsh European winter and snow storm. Today she made me cry when she dubbed an emotional scene. சிவீன் தந்தை நவீன் என்று சொல்லிக்கொள்வதில் பெருமை!!, author = kalaimaniraj1, date = Sun Jul 04 19:32:00 +0000 2021
...

Is there a way to store the output from above in a pandas DataFrame? I want the DataFrame to contain one column for each of the printed fields: ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}.

  • It's hard to help when we don't have your data. As explained [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), the best way to get help on a pandas issue is to give us an easy way to copy/paste a sample of your data, or some simple code that re-creates the sample. And then also make sure that we can run the code you post. – joao Jul 04 '21 at 22:02
  • Hello, the data is pulled directly from Twitter (web mining of Twitter), and the code to pull the data is listed in the question. – smanrriquez Jul 04 '21 at 22:47

1 Answer


This should do the job:

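import pandas as pd
# assumes Twitter and hashtags come from the Pattern library (pattern.web)
from pattern.web import Twitter, hashtags

engine = Twitter(language='en')
idindex = set()
tweets = []
rows = []  # one dict per unique tweet; the DataFrame is built once after the loop

prev = None

# search 5 times (up to 200 results each) for 'Winter snow storm'
for i in range(5):
    print(i)
    for tweet in engine.search('Winter snow storm', start=prev, count=200, cached=False):
        print(f'ID = {tweet.id}, hashtag = {hashtags(tweet.text)}, text = {tweet.text}, author = {tweet.author}, date = {tweet.date}')
        # keep only non-empty tweets whose ID has not been seen yet
        if len(tweet.text) > 0 and tweet.id not in idindex:
            tweets.append(tweet.text)
            rows.append({'ID': tweet.id, 'hashtag': hashtags(tweet.text), 'text': tweet.text,
                         'author': tweet.author, 'date': tweet.date})
            idindex.add(tweet.id)
        prev = tweet.id

# df.append() returns a new DataFrame instead of modifying df in place (and was
# removed in pandas 2.0), so build the DataFrame once from the collected rows
df = pd.DataFrame(rows, columns=['ID', 'hashtag', 'text', 'author', 'date'])
print(f'Found {len(tweets)} tweets.')
print(df.head())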
paradocslover
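To try the DataFrame step without Twitter access, here is a minimal offline sketch of the collect-then-build approach: one dict per unique, non-empty tweet goes into a list, and `pd.DataFrame` is called once at the end. The `Tweet` namedtuple, the regex-based `find_hashtags` helper and the sample rows are hypothetical stand-ins (not Pattern's real objects or its `hashtags()` function); only the attribute names `id`, `text`, `author` and `date` come from the question. The optional `pd.to_datetime` call assumes the date strings look like the ones in the printed output.

```python
import re
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the result objects that engine.search() yields;
# only the attributes used in the question (id, text, author, date) are modelled.
Tweet = namedtuple('Tweet', ['id', 'text', 'author', 'date'])


def find_hashtags(text):
    # Hypothetical simplified replacement for hashtags(): '#' followed by word characters.
    return re.findall(r'#\w+', text)


def tweets_to_frame(results):
    """Collect unique, non-empty tweets into a list of dicts, then build one DataFrame."""
    seen = set()
    rows = []
    for tweet in results:
        if tweet.text and tweet.id not in seen:
            seen.add(tweet.id)
            rows.append({'ID': tweet.id, 'hashtag': find_hashtags(tweet.text),
                         'text': tweet.text, 'author': tweet.author, 'date': tweet.date})
    return pd.DataFrame(rows, columns=['ID', 'hashtag', 'text', 'author', 'date'])


# Made-up sample input, shaped like the question's output
sample = [
    Tweet(1, 'Snow again #winter #storm', 'alice', 'Sun Jul 04 20:10:35 +0000 2021'),
    Tweet(1, 'Snow again #winter #storm', 'alice', 'Sun Jul 04 20:10:35 +0000 2021'),  # duplicate ID, skipped
    Tweet(2, '', 'bob', 'Sun Jul 04 19:34:19 +0000 2021'),  # empty text, skipped
    Tweet(3, 'Roads closed. #snow.', 'carol', 'Sun Jul 04 19:32:00 +0000 2021'),
]

df = tweets_to_frame(sample)
# Optional: parse Twitter-style date strings into timezone-aware timestamps.
df['date'] = pd.to_datetime(df['date'], format='%a %b %d %H:%M:%S %z %Y')
print(df[['ID', 'hashtag', 'author']])
```

Building the frame once from a list is also much cheaper than growing a DataFrame row by row inside the loop.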