
I am using tweepy to capture Twitter data. How can I export the tweets to a JSON, txt, or CSV file? My code:

#coding = utf-8

import json
import tweepy
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = "my_consumer_key"
consumer_secret = "my_consumer_secret"
access_token = "my_acess_token"
access_token_secret = "my_acess_token_secret"

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

def saida_json(tweet):
    with open('tweet.json', 'a', encoding='utf-8') as f:
        json.dump(tweet, f)

def saida_txt(tweet):
    with open('tweet.txt', 'a', encoding='utf-8') as f:
        for linha in tweet:
            f.write(tweet + '\n')

name = "usersl"
tweetCount = 20
public_tweets = api.home_timeline()
user_tweets = api.user_timeline(id=name, count=tweetCount)

for tweet in user_tweets:
    print(tweet.user.screen_name, tweet.text)
    saida_txt(tweet.text)
    saida_json(tweet)

I have tried to do it through functions, but I keep running into errors. The txt file only ends up containing the first tweet, and for the JSON file I get an error saying the object is not serializable. Where is my mistake?

vic.py

1 Answer


When you write your tweet to a JSON file, json.dump attempts to convert it to the JSON format. This process is called serialization. With its default encoder, json.dump only supports a small set of types (dict, list, tuple, str, int, float, bool and None), which you can read about in the Python documentation. Since the class that tweepy uses to represent a tweet is not one of these types, the json module raises the exception you mentioned.
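To make the failure concrete, here is a minimal sketch that reproduces it without touching the Twitter API. `FakeTweet` is a hypothetical stand-in class invented for this illustration, not tweepy's real tweet class; it only shows that any object outside the supported types triggers the same `TypeError`.

```python
import json


class FakeTweet:
    """Hypothetical stand-in for a tweet object (not tweepy's real class)."""

    def __init__(self, text, screen_name):
        self.text = text
        self.screen_name = screen_name


tweet = FakeTweet("hello world", "some_user")

# Passing an arbitrary object fails: the default encoder does not know the type.
try:
    json.dumps(tweet)
    error = None
except TypeError as exc:
    error = str(exc)

# A plain dict of strings is one of the supported types, so this succeeds.
serialized = json.dumps({"text": tweet.text, "author_name": tweet.screen_name})

print(error)
print(serialized)
```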

As a solution, you could serialize a dictionary that contains the data you need from the tweet. Here's an example:

def tweet_to_json(tweet):
    tweet_dict = {
        "text": tweet.text,
        "author_name": tweet.user.screen_name
    }
    with open('tweet.json', 'w+') as f:
        json.dump(tweet_dict, f)

Note that using append mode with JSON files is usually not a good idea. You could use a JSON list instead. This reply to another question might help you with this.
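A small sketch of why append mode causes trouble, assuming two made-up records and a temporary file path chosen only for this illustration: every `json.dump` call in append mode adds another top-level value, and the resulting file can no longer be parsed as a single JSON document.

```python
import json
import os
import tempfile

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "tweet.json")

# Append mode: each call adds another top-level JSON value to the same file.
for record in ({"text": "first"}, {"text": "second"}):
    with open(path, "a", encoding="utf-8") as f:
        json.dump(record, f)

with open(path, encoding="utf-8") as f:
    raw = f.read()

# Two objects back to back are not one valid JSON document.
try:
    json.loads(raw)
    append_is_valid = True
except json.JSONDecodeError:
    append_is_valid = False

# Writing one list in a single dump keeps the file parseable.
with open(path, "w", encoding="utf-8") as f:
    json.dump([{"text": "first"}, {"text": "second"}], f)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

After the loop, `raw` holds the two objects glued together, which `json.loads` rejects; the list version loads back as a normal Python list.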

Edit: Here's an example for saving a JSON list:

result = []
for tweet in api.user_timeline(id=name, count=tweetCount):
    result.append({
        'text': tweet.text, 
        'author_name': tweet.user.screen_name
    })
with open('tweet.json', 'w+') as f:
    json.dump(result, f)
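The question also asks about txt and CSV output. Note that `saida_txt` in the question loops over the characters of the string and writes the whole tweet on every iteration, so the loop should go. The following is only a sketch using the standard library; the `rows` list is hypothetical sample data standing in for `(tweet.user.screen_name, tweet.text)` pairs, and the file locations are temporary paths picked for the example.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for (tweet.user.screen_name, tweet.text).
rows = [("some_user", "first tweet"), ("some_user", "second, with a comma")]

out_dir = tempfile.mkdtemp()
txt_path = os.path.join(out_dir, "tweet.txt")
csv_path = os.path.join(out_dir, "tweet.csv")

# Text file: open once and write one line per tweet.
with open(txt_path, "w", encoding="utf-8") as f:
    for _, text in rows:
        f.write(text + "\n")

# CSV file: newline="" lets the csv module control line endings itself,
# and the writer quotes fields that contain commas.
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["author_name", "text"])
    writer.writerows(rows)
```

Tweets can contain line breaks themselves, so the one-line-per-tweet text format is lossy; the CSV writer handles that case by quoting the field.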
Johannes Christ
  • Thank you Volcyy, the only problem was the "w+" mode: it still wrote just one line. I changed it to "a+" and got part of the way there. Volcyy, can you tell me how to get the output in UTF-8? Even when I pass encoding='utf-8' inside the function, the result is still the same. – vic.py Sep 13 '17 at 07:41
  • I think you are mistaking encoding for something different here. You try to use append mode (the `a` flag in `open`) for adding new data to the JSON file, but that will mess up the JSON file's formatting, so I don't think that the `encoding` flag is the problem here. You could use a JSON list like in the answer I linked. I added an example to my answer. – Johannes Christ Sep 13 '17 at 10:24
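The thread does not say exactly what the UTF-8 symptom looked like. One possible explanation, offered here as an assumption, is that the file contained `\uXXXX` escape sequences: by default `json.dump` and `json.dumps` escape every non-ASCII character regardless of the file's encoding, and `ensure_ascii=False` turns that off. The sample text below is made up for the illustration.

```python
import json

# Made-up sample text with non-ASCII characters.
record = {"text": "Olá, coração"}

# Default behaviour (ensure_ascii=True): non-ASCII characters are escaped.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same data, so the escaped form is still valid JSON; the difference only matters when a person reads the file. When writing to disk, combining `ensure_ascii=False` with `open(..., encoding='utf-8')` keeps the text readable.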