
I am using the Twython library for the first time. It gives me an object 'data', which is a dictionary. One of its fields, ['user'], is itself a nested dictionary.

I build a list of these dictionaries:

tweets = []
# Given an object data
# ...some kind of loop...
tweets.append(data)

Once I have that, I convert it to a DataFrame:

output = pd.DataFrame(tweets)

which works fine for the first level of dictionaries, but for the 2nd level of dictionaries it converts them all to strings.

Ideally what I would like to be able to do is something like:

output['user']['screen_name'][1]

instead of

user_info = ast.literal_eval(output['user'][1])
print(user_info['screen_name'])

and natively access the data. Currently, I have to use something like ast to convert it to another dictionary first on a row-by-row basis. Is there a more efficient way of doing this?

Henry
  • Why are you trying to put this into a DataFrame? – AChampion Sep 15 '15 at 16:47
  • Because I wanted to practise my pandas skills a bit, and it seemed logical to put a data set with N rows of the same dictionary into a DataFrame? Open to suggestions; the DataFrame implementation isn't my ultimate goal, I'm just looking to do some manipulation of data acquired from Twitter. – Henry Sep 15 '15 at 16:50
  • I guess there is still a bit of a wider question of how I might be able to deal with nested dictionaries, or is that not suitable at all for a Data Frame? – Henry Sep 15 '15 at 16:54
  • It is not easy to answer without a sample of the data. However have you tried something like described in the answer of [this question](http://stackoverflow.com/questions/15455388/dict-of-dicts-of-dicts-to-dataframe) ? – Romain Sep 15 '15 at 21:01
  • The data is quite lengthy, I'll try to set something up as a demo. I believe I've tried some of the ways presented there and not had any success. Will update tomorrow when I can – Henry Sep 15 '15 at 21:04
  • An example paste of the data is available here: http://pastebin.com/y6v64Qtw That's just the first element of my list, but I believe it gives you a good idea of what it looks like. Each element of the list is a dictionary, and the nested values are also proper dictionaries. – Henry Sep 16 '15 at 10:06
  • I looked at the answer you suggested. For my object 'tweets' (the list of dictionaries), it fails as list does not have attribute items. For operating on an individual dictionary within the list, it fails as arrays must all be the same length. – Henry Sep 16 '15 at 10:15
  • I guess a more simplified version of my question is then this: with `dictone = {'name':'Henry','age':100}` and `dicttwo = {'location':'internet','profile':'blah'}`, setting `dicttwo['profile'] = dictone` gives me the behaviour I expect, i.e. it nests dictone inside dicttwo. But if I then convert this to a DataFrame, it doesn't behave as anticipated: after `x = pd.DataFrame(dicttwo)`, x has rows 'age' and 'name', with columns 'location' and 'profile'. – Henry Sep 16 '15 at 10:19

1 Answer


I have a possible solution. I've yet to try it over the full data sample of my problem, but I think this may work:

Assuming we have two DataFrame objects:

data_one
data_two

We can manipulate it like this:

data_one['index']=data_one.index
data_two['index']=data_two.index

This adds a new column named 'index' to each DataFrame. Please note that this assumes the rows of the two frames already line up the way you want.

data_three = pd.merge(data_one,data_two)
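
A minimal sketch of the steps above, using two made-up frames. The column names `text` and `screen_name` are illustrative stand-ins, not taken from the real Twitter data, and the join key is named explicitly here rather than left for pandas to infer:

```python
import pandas as pd

# Hypothetical toy frames standing in for data_one and data_two.
data_one = pd.DataFrame({'text': ['first tweet', 'second tweet']})
data_two = pd.DataFrame({'screen_name': ['alice', 'bob']})

# Copy each frame's row index into an ordinary column to use as the join key.
data_one['index'] = data_one.index
data_two['index'] = data_two.index

# Join the two frames row by row on the shared 'index' column.
data_three = pd.merge(data_one, data_two, on='index')
print(data_three)
```

When `on` is omitted, `pd.merge` joins on every column name the two frames share, so any other shared column also becomes part of the key.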

I haven't fully tested it yet for other reasons, but for my test case it seems to be giving the correct behaviour. I'm sure there's a smoother way of doing this & someone will post a one-liner, but if you're reading this in 2020 and this is the only answer, there's one way to do it!
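
One candidate for that smoother way, sketched here under assumptions: `pd.json_normalize` (available at the top level in pandas 1.0 and later) flattens nested dictionaries into dotted column names. The records below are made up to resemble the `tweets` list from the question, and `followers_count` is just an illustrative field:

```python
import pandas as pd

# Made-up records shaped like the tweets list in the question.
tweets = [
    {'text': 'first tweet', 'user': {'screen_name': 'alice', 'followers_count': 10}},
    {'text': 'second tweet', 'user': {'screen_name': 'bob', 'followers_count': 20}},
]

# Nested keys become dotted column names such as 'user.screen_name'.
output = pd.json_normalize(tweets)
print(output['user.screen_name'][1])
```

This avoids the separate merge step, at the cost of addressing the nested field as `output['user.screen_name']` rather than `output['user']['screen_name']`.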

Henry
  • I can confirm that this method is one way of making it work. For some reason I struggled to get the merge working properly, so I've had to do it explicitly via: `output = pd.merge(data_one,data_two,left_on='index',right_on='index')` No idea why, but it was returning an empty merge before and now it returns a merge of the expected proportions. – Henry Sep 16 '15 at 10:25