2

How to unnest (explode) a column in a pandas DataFrame?

I believe this question is not a duplicate of the one listed above. I am trying to find the combination of the cells in a column and create two columns out of this. The above explanation shows how to un-nest a list, but not find the combinations of that list...

I have a dataframe where one of the columns contains a list. I am trying to expand this dataframe so that I can get every combination of the list and still keep the other info. Hard to explain, example dataframes below:

name    number    ID           code
1111      2        3    ['%AB','$12','&FD']

I am trying to figure out how to turn this dataframe into the following:

name    number    ID    to    from
1111      2        3    %AB    $12
1111      2        3    %AB    &FD
1111      2        3    $12    &FD

code I tried:

a = [y for x in df[['code']].stack() for y in combinations(x,2)]
df[['to','from']] = a
MaxB
  • 428
  • 1
  • 8
  • 24

1 Answers1

4

Idea is add indices to tuples for index in new DataFrame (DataFrame.pop is for extract column), so possible DataFrame.join original DataFrame:

#if not default indices, create them
#df = df.reset_index(drop=True)

print (df)
   name  number  ID                    code
0  1111       2   3     ['%AB','$12','&FD']
1  1000       2   3  ['%AB1','$121','&FD1']


a = [(i,) + y for i, x in df.pop('code').items() for y in combinations(x,2)]
df1 = pd.DataFrame(a, columns=['idx','to','to']).set_index('idx')
print (df1)
       to    to
idx            
0     %AB   $12
0     %AB   &FD
0     $12   &FD
1    %AB1  $121
1    %AB1  &FD1
1    $121  &FD1

df2 = df1.join(df).reset_index(drop=True)
print (df2)
     to    to  name  number  ID
0   %AB   $12  1111       2   3
1   %AB   &FD  1111       2   3
2   $12   &FD  1111       2   3
3  %AB1  $121  1000       2   3
4  %AB1  &FD1  1000       2   3
5  $121  &FD1  1000       2   3
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252