I have a pandas DataFrame in which one column holds tuples, each ending in a nested tuple. The nested tuple contains two existing ids. I want to expand every element of the outer tuple and the nested tuple into its own new column. Here's my df so far:

df
  id1  id2   tuple_of_tuple
0 a    e    ('cat',100,('a','f'))
1 b    f    ('dog',100,('b','g'))
2 c    g    ('cow',100,('d','h'))
3 d    h    ('tree',100,('c','e'))

I tried the code below on a small subset of the data, and it seemed to work: each extracted element ended up in its own new column.

df[['Link_1', 'Link_2','Link_3','Link_4']] = df['tuple_of_tuple'].apply(pd.Series)

But when I apply it to the entire dataset, I get the error "ValueError: Columns must be same length as key". (I should mention that there are a few NaNs scattered around: in some rows, the whole tuple_of_tuple entry is just NaN.) How can I fix this?

guru

1 Answer

Here's an elegant way to do it using Python's * unpacking operator (the [*i, *j] form needs Python 3.5 or later):

df2 = pd.DataFrame(
    data=[[*i, *j] for *i, j in df.pop('tuple_of_tuple')], 
    columns=['link_1', 'link_2', 'link_3', 'link_4']
)

You can then link df2 with df using pd.concat:

pd.concat([df, df2], axis=1)

  id1 id2 link_1  link_2 link_3 link_4
0   a   e    cat     100      a      f
1   b   f    dog     100      b      g
2   c   g    cow     100      d      h
3   d   h   tree     100      c      e
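
The comprehension above assumes every row holds a tuple whose last item is itself a tuple, so a NaN entry (a float) breaks the unpacking. A minimal sketch of one way to tolerate such rows, assuming missing entries should simply become four missing link values; the helper name `flatten_entry` and the three-row sample frame are hypothetical, made up for illustration:

```python
import numpy as np
import pandas as pd

# Small sample frame (assumed data) with one NaN in the tuple column
df = pd.DataFrame({
    "id1": ["a", "b", "c"],
    "id2": ["e", "f", "g"],
    "tuple_of_tuple": [("cat", 100, ("a", "f")), np.nan, ("cow", 100, ("d", "h"))],
})

def flatten_entry(entry):
    """Hypothetical helper: flatten (x, y, (p, q)) into [x, y, p, q]."""
    if not isinstance(entry, tuple):
        # NaN or any other non-tuple value: emit four missing values
        return [np.nan] * 4
    *head, tail = entry
    return [*head, *tail]

links = pd.DataFrame(
    [flatten_entry(e) for e in df["tuple_of_tuple"]],
    columns=["link_1", "link_2", "link_3", "link_4"],
    index=df.index,  # keep row alignment with the original frame
)
result = pd.concat([df.drop(columns="tuple_of_tuple"), links], axis=1)
```

Checking `isinstance(entry, tuple)` before unpacking avoids the "'float' object is not iterable" failure, because the `*head, tail` unpacking only runs on real tuples.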
cs95
  • I got the error "TypeError: 'float' object is not iterable" when running the first code – guru Apr 30 '18 at 20:00
  • @guru Are there some rows without any tuples? This code assumes the last item in the row is a tuple, based on your data provided – cs95 Apr 30 '18 at 20:00
  • @COLDSPEED Yes some rows contain just NaNs, as in no tuples at all in that specific row for the tuple_of_tuples column – guru Apr 30 '18 at 20:02
  • @guru Okay, one more try: `[[*i, *j] if not pd.isnull(j) else j for *i, j in df['tuple_of_tuple']]` – cs95 Apr 30 '18 at 20:07
  • @COLDSPEED sorry still getting the same error from before "TypeError: 'float' object is not iterable" – guru Apr 30 '18 at 20:09
  • @guru It worked for me :( Any chance you can drop these rows beforehand, using `df = df.dropna(subset=['tuple_of_tuple'])`, and then run this code? – cs95 Apr 30 '18 at 20:14
  • @guru Can you please provide me with some data that reproduces your problem? I can't give you a working solution otherwise. Please help me out here, thanks – cs95 Apr 30 '18 at 21:52
  • My problem was that I am reading a CSV file with tuples in the column, but the pandas CSV reader reads the tuples as strings rather than tuples. I tried using ast.literal_eval to recover the tuples, but ast is not recognized for some reason – guru Apr 30 '18 at 22:30
  • @guru Take a look at my answer here: https://stackoverflow.com/a/48008192/4909087 It might help. – cs95 Apr 30 '18 at 22:32
  • @guru Yes, but I am not currently at my desktop and will be delayed replying to messages. Can you wait a couple of hours? I will promptly get back to you in chat after that. – cs95 Apr 30 '18 at 22:46
  • @COLDSPEED I figured out my issue. I was implementing the dropna code incorrectly, which caused many of the problems. Implementing it correctly allowed the ast.literal_eval code and the tuple unpacking to work. Thanks a lot! – guru Apr 30 '18 at 22:54
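
The resolution described in the comments (drop the NaN rows, parse the strings with ast.literal_eval, then unpack) might look roughly like the sketch below. The CSV text is made-up sample data, and the exact steps the asker used are not shown in the thread, so treat this as an assumption about how the pieces fit together:

```python
import ast
import io

import pandas as pd

# Hypothetical CSV content: the tuple column arrives as quoted text, one row is empty
csv_text = '''id1,id2,tuple_of_tuple
a,e,"('cat', 100, ('a', 'f'))"
b,f,
c,g,"('cow', 100, ('d', 'h'))"
'''

df = pd.read_csv(io.StringIO(csv_text))

# Drop rows whose tuple column is missing, then renumber the remaining rows
df = df.dropna(subset=["tuple_of_tuple"]).reset_index(drop=True)

# Turn each string back into a real Python tuple
parsed = df.pop("tuple_of_tuple").apply(ast.literal_eval)

# Same unpacking as in the answer: flatten (x, y, (p, q)) into four columns
df2 = pd.DataFrame(
    data=[[*i, *j] for *i, j in parsed],
    columns=["link_1", "link_2", "link_3", "link_4"],
)
out = pd.concat([df, df2], axis=1)
```

Resetting the index after `dropna` matters here: `df2` gets a fresh 0..n-1 index, so `pd.concat(..., axis=1)` only lines the rows up correctly if `df` is renumbered the same way.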