Pandas - transpose lists with unequal length in the value of dataframe

Question

This question is an extension of this question Pandas: split list in column into multiple rows, now this time I wan't to merge more DataFrames. And I couldn't get it to work with more than 2 dfs.

I have this DataFrame:

  Index     Job positions   Job types   Locations
      0          [5]         [6]        [3, 4, 5]
      1          [1]         [2, 6]     [3, NaN] 
      2          [1,3]       [9, 43]    [1]

I would like every single combination of numbers, so the final result would be:

index   Job position  Job type  Location
    0   5             6         3
    0   5             6         4
    0   5             6         5
    1   1             2         3
    1   1             2         NaN
    1   1             6         3
    1   1             6         NaN
    2   1             9         1
    2   1             43        1
    2   3             9         1
    2   3             43        1

So what I've done is to transform the columns into Series:

positions = df['Job positions'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
types = df['Job types'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')
locations = df['Locations'].apply(pd.Series).reset_index().melt(id_vars='index').dropna()[['index', 'value']].set_index('index')

dfs = [positions, types, locations]

And then trying to merge them like this:

df_final = reduce(lambda left,right: pd.merge(left,right,left_index=True, right_index=True, how="left"), dfs)

But it seems that is skips the fields with NaN - how do I prevent that?

Ken T · Accepted Answer · 2018-05-12T14:58:48.767

1

1 line:

import itertools

dfres = pd.DataFrame([(i[0],)+j for i in df.values for j in itertools.product(*i[1:])]
        ,columns=df.columns).set_index('index')


       Job positions  Job types  Locations
index                                     
0                  5          6        3
0                  5          6        4
0                  5          6        5
1                  1          2        3
1                  1          2        NaN
1                  1          6        3
1                  1          6        NaN
2                  1          9        1
2                  1         43        1
2                  3          9        1
2                  3         43        1

edited May 12 '18 at 14:58

answered May 12 '18 at 14:47

Ken T

2,255
1
23
30

Cool, I think it works @Ken T, the only thing is that the first column of my dataframe is a list, os it has the format [23] all the way down - do you know why? – Mathias Lund May 12 '18 at 18:23
you mean the column 'index' or 'job positions'? – Ken T May 12 '18 at 20:01
I mean job positions sorry – Mathias Lund May 13 '18 at 15:08
@MathiasLund My best guess is that that entry is not a list but a string i.e. '[23]' – Ken T May 14 '18 at 00:52

Pandas - transpose lists with unequal length in the value of dataframe

1 Answers1