Python: Import Tweet unicode data to pandas data frame object

Question

I am attempting to import a file that has the structure below (a dump of tweets, with unicode strings). The goal is to convert it to a DataFrame using the pandas module. I assume the first step is to load it into a JSON object and then convert that to a DataFrame (per p. 166 of McKinney's Python for Data Analysis book), but I am unsure and could use some pointers on how to do this.

import sys, tailer
tweet_sample = tailer.head(open(r'<MyFilePath>\usTweets0.json'), 3)
tweet_sample # returns
['{u\'contributors\': None, u\'truncated\': False, u\'text\': u\'@KREAYSHAWN is...

I think there is a typo in your sample python output, it's not a proper python object atm. — Andy Hayden, Jun 06 '13 at 16:27
@AndyHayden Thanks for looking this over and yes, I'm still having a tough time converting this 'str' object into something usable. Attempted: `file1 = tweet_sample.encode('utf-8') file2 = json.dumps(file1,encoding='utf-8', separators=(',', ': ')) print file2` "{u'contributors': none, u'truncated': false, u'text':... object is still a string and json.load doesn't facilitate a workable solution. — Arthur Aguirre, Jun 07 '13 at 18:43
Hmmm, if it has u'' in the text file you might be better off with `ast.literal_eval`. Perhaps you could link to the actual json? (If it's a string use `json.loads` (!).) — Andy Hayden, Jun 07 '13 at 21:31
@AndyHayden Hugely helpful, been working at this and checked in again and that piece of code did it. Much appreciated. — Arthur Aguirre, Jun 07 '13 at 21:49
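For later readers, the `ast.literal_eval` suggestion from the comments can be sketched roughly as follows. This assumes each line of the dump is the repr of a Python dict (with `u''` prefixes and `None`/`False`) rather than real JSON. The sample lines and the helper name `parse_repr_lines` are made up for illustration and are not taken from the actual file.

```python
import ast

# Hypothetical sample lines standing in for the file contents: each line is
# the repr() of a Python dict (u'' prefixes, None/False), which is not JSON.
sample_lines = [
    "{u'contributors': None, u'truncated': False, u'text': u'foo'}",
    "{u'contributors': None, u'truncated': True, u'text': u'bar'}",
]


def parse_repr_lines(lines):
    """Turn each non-empty line into a dict using ast.literal_eval."""
    return [ast.literal_eval(line) for line in lines if line.strip()]


tweets = parse_repr_lines(sample_lines)
```

The resulting list of dicts can then be passed to `pd.DataFrame(tweets)`. `ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of file.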

score 2 · Accepted Answer · edited May 23 '17 at 12:02

Just use the DataFrame constructor...

In [6]: tweet_sample = [{'contributors': None, 'truncated': False, 'text': 'foo'}, {'contributors': None, 'truncated': True, 'text': 'bar'}]

In [7]: df = pd.DataFrame(tweet_sample)

In [8]: df
Out[8]:
  contributors text truncated
0         None  foo     False
1         None  bar      True

If the file contains valid JSON, you can load it using json.load:

import json
with open(r'<MyFilePath>\usTweets0.json', 'r') as f:  # raw string so '\u' is not read as an escape
    tweet_sample = json.load(f)
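Note that `json.load` expects the whole file to be a single JSON document. Tweet dumps are often stored as one JSON object per line instead. If that is the case here (an assumption, since the real file is not shown), a minimal sketch is to parse line by line with `json.loads`. The sample lines and the helper name `parse_json_lines` are hypothetical.

```python
import json

# Hypothetical sample: one JSON object per line (double-quoted keys,
# null/false/true instead of None/False/True).
sample_lines = [
    '{"contributors": null, "truncated": false, "text": "foo"}',
    '{"contributors": null, "truncated": true, "text": "bar"}',
]


def parse_json_lines(lines):
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in lines if line.strip()]


records = parse_json_lines(sample_lines)
```

With a real file you would iterate over the open file object instead of `sample_lines`, and the list of dicts goes straight into the DataFrame constructor as above.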

pandas 0.12 and later also provide pd.read_json, which reads JSON directly into a DataFrame.
