pivoting dataframe with two columns of lists

Question

I have a Dataframe like:

matrix = [(222, ['A','B','C'], [1,2,3]),
         (333, ['A','B','D'], [1,3,5])]

df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])

timestamp     variable         value   

222           ['A','B','C']    [1,2,3]
333           ['A','B','D']    [1,3,5]

and would like to pivot it so that the timestamp value is kept, the unique values in the variable column become additional columns, and values from value are sorted in the respective columns.

The output should look as follows:

timestamp   A    B    C    D 

222         1    2    3    nan
333         1    3    nan  5

any help would be greatly appreciated! :)

jezrael · Answer 1 · 2019-05-16T14:32:33.193

Create dictionary with zip, pass to DataFrame constructor:

a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]

df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

If many another columns use DataFrame.pop for extract columns:

a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]

df = df.join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

boom - there it is! Thanks! – akleefel May 16 '19 at 14:48 — akleefel, May 16 '19 at 14:48

score 3 · Accepted Answer · answered May 16 '19 at 14:31

Using unnest first , then just pivot

unnesting(df,['variable','value']).pivot(*df.columns)
Out[79]: 
variable     A    B    C    D
timestamp                    
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')

score 2 · Answer 3 · answered May 16 '19 at 14:30

You can pass the values & column names to a the pd.Series constructor. This will automatically expand the values in your desired shape.

df.set_index('timestamp').apply(lambda row: pd.Series(row.value, index=row.variable), axis=1)

# outputs:
             A    B    C    D
timestamp
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

pivoting dataframe with two columns of lists

3 Answers3