I am trying to import a file with the structure below (a dump of tweets, with Unicode strings) and convert it to a DataFrame using pandas. I assume the first step is to load it into a JSON object and then convert that to a DataFrame (per p. 166 of McKinney's Python for Data Analysis), but I am unsure and would appreciate some pointers.

import sys, tailer
tweet_sample = tailer.head(open(r'<MyFilePath>\usTweets0.json'), 3)
tweet_sample # returns
['{u\'contributors\': None, u\'truncated\': False, u\'text\': u\'@KREAYSHAWN is...
Andy Hayden
Arthur Aguirre
  • I think there is a typo in your sample python output, it's not a proper python object atm. – Andy Hayden Jun 06 '13 at 16:27
  • @AndyHayden Thanks for looking this over and yes, I'm still having a tough time converting this 'str' object into something usable. Attempted: `file1 = tweet_sample.encode('utf-8') file2 = json.dumps(file1,encoding='utf-8', separators=(',', ': ')) print file2` "{u'contributors': none, u'truncated': false, u'text':... object is still a string and json.load doesn't facilitate a workable solution. – Arthur Aguirre Jun 07 '13 at 18:43
  • Hmmm, if it has u'' in the text file you might be better off with `ast.literal_eval`. Perhaps you could link to the actual json? (If it's a string use `json.loads` (!).) – Andy Hayden Jun 07 '13 at 21:31
  • @AndyHayden Hugely helpful, been working at this and checked in again and that piece of code did it. Much appreciated. – Arthur Aguirre Jun 07 '13 at 21:49
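The `ast.literal_eval` suggestion from the comments can be sketched as follows. This is a minimal sketch, assuming each line of the dump is the `repr` of a Python dict (with `u''` prefixes, `None`, `True`/`False`) rather than valid JSON; the sample lines and the helper name `parse_repr_lines` are hypothetical and not taken from the original file.

```python
import ast

# Hypothetical sample lines mimicking the dump: Python dict reprs, not JSON.
lines = [
    "{u'contributors': None, u'truncated': False, u'text': u'foo'}",
    "{u'contributors': None, u'truncated': True, u'text': u'bar'}",
]

def parse_repr_lines(lines):
    """Parse each non-empty line as a Python literal and return a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            # literal_eval only evaluates literals, so it is safer than eval()
            records.append(ast.literal_eval(line))
    return records

records = parse_repr_lines(lines)
print(records[0]['text'])
```

The resulting list of dicts can then be passed to the `DataFrame` constructor, e.g. `pd.DataFrame(records)`.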

1 Answer

Just use the DataFrame constructor...

In [5]: import pandas as pd

In [6]: tweet_sample = [{'contributors': None, 'truncated': False, 'text': 'foo'}, {'contributors': None, 'truncated': True, 'text': 'bar'}]

In [7]: df = pd.DataFrame(tweet_sample)

In [8]: df
Out[8]:
  contributors text truncated
0         None  foo     False
1         None  bar      True

If the file is valid JSON, you can load it with json.load:

import json
# Raw string so that \u in the Windows path is not treated as an escape sequence
with open(r'<MyFilePath>\usTweets0.json', 'r') as f:
    tweet_sample = json.load(f)
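Note that `json.load` expects the whole file to be a single JSON document. If the dump instead holds one JSON object per line (an assumption, since the actual file is not shown), a line-by-line variant along these lines may be closer to what is needed. The `io.StringIO` object below is a hypothetical stand-in for the open file.

```python
import io
import json

# Hypothetical stand-in for an open file with one JSON object per line.
f = io.StringIO(
    '{"contributors": null, "truncated": false, "text": "foo"}\n'
    '{"contributors": null, "truncated": true, "text": "bar"}\n'
)

# Parse each non-empty line as its own JSON document.
tweets = [json.loads(line) for line in f if line.strip()]
print(len(tweets))
```

As with the constructor example above, `tweets` is a list of dicts that `pd.DataFrame` accepts directly.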

A JSON reader was planned for pandas at the time of writing; it is now available as pd.read_json.

Andy Hayden