I am using Tweepy to stream tweets and would like to record them in CSV format so I can play around with them or load them into a database later. Please keep in mind that I am a beginner. I realize there are multiple ways of handling this, and suggestions are very welcome.
Long story short, I need to convert multiple Python dictionaries to rows and append them to a CSV file. I already did my research (How do I write a Python dictionary to a csv file?) and tried doing this with the csv module's DictWriter and writer.
However, there are a few more things that need to be accomplished:
1) Write the keys as a header row only once.
2) As each new tweet is streamed, append its values as a new row without overwriting previous rows.
3) If a value is missing, record NULL.
4) Skip or fix ASCII codec errors.
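The four requirements above can be sketched as follows. This is only a minimal illustration of the behavior I am after, not a definitive solution. It assumes Python 3 (where opening the file with encoding="utf-8" should avoid the ASCII codec errors) and a fixed list of columns chosen in advance. The names append_row and FIELDNAMES, and the three column names, are hypothetical and picked only for the example.

```python
import csv
import os

# Hypothetical column list: the tweet fields to keep, in output order.
FIELDNAMES = ["id_str", "created_at", "text"]


def append_row(path, row, fieldnames=FIELDNAMES):
    """Append one dict to a CSV file, writing the header only for a new file."""
    # Requirement 1: the header is needed only if the file is missing or empty.
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0

    # Requirement 3: keys that are absent or None are recorded as "NULL".
    # Keys that are not in fieldnames are dropped.
    clean = {k: "NULL" if row.get(k) is None else row[k] for k in fieldnames}

    # Requirement 2: mode "a" appends instead of truncating the file.
    # Requirement 4: an explicit UTF-8 encoding handles non-ASCII text.
    # newline="" is what the csv module expects for files it writes.
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        writer.writerow(clean)
```

With this sketch, calling append_row("tweets.csv", json_data) once per streamed tweet would add one row per call. Fixing the column list up front is a deliberate simplification: tweets do not all carry the same keys, so taking the header from whichever tweet arrives first would not give stable columns.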
Here is the format of what I would like to end up with (each value is in its individual cell):
Header1_Key_1 Header2_Key_2 Header3_Key_3...
Row1_Value_1 Row1_Value_2 Row1_Value_3...
Row2_Value_1 Row2_Value_2 Row2_Value_3...
Row3_Value_1 Row3_Value_2 Row3_Value_3...
Row4_Value_1 Row4_Value_2 Row4_Value_3...
Here is my code:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):

    def on_data(self, data):
        json_data = json.loads(data)
        data_header = json_data.keys()
        data_row = json_data.values()
        try:
            with open('csv_tweet3.csv', 'wb') as f:
                w = csv.DictWriter(f, data_header)
                w.writeheader(data_header)
                w.writerow(json_data)
        except BaseException, e:
            print 'Something is wrong', str(e)
        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['world cup'])
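A side note on the code above: as far as I can tell, opening the file in a write mode such as 'wb' truncates it on every call to on_data, which may be part of why earlier rows do not survive. The difference between write and append modes can be illustrated with a small standalone sketch. It assumes Python 3 text modes, and the temporary file name demo.csv is made up for the example.

```python
import os
import tempfile

# Throwaway file in a temporary directory (hypothetical name).
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Mode "w" truncates the file on every open, so only the last write survives.
for line in ("first\n", "second\n"):
    with open(path, "w") as f:
        f.write(line)
with open(path) as f:
    after_w = f.read()

# Mode "a" keeps the existing content and adds to the end.
for line in ("third\n", "fourth\n"):
    with open(path, "a") as f:
        f.write(line)
with open(path) as f:
    after_a = f.read()

print(repr(after_w))  # 'second\n'
print(repr(after_a))  # 'second\nthird\nfourth\n'
```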
Thank you in advance!