4

I have a Dataframe like:

matrix = [(222, ['A','B','C'], [1,2,3]),
         (333, ['A','B','D'], [1,3,5])]

df = pd.DataFrame(matrix, columns=['timestamp', 'variable', 'value'])
timestamp     variable         value   

222           ['A','B','C']    [1,2,3]
333           ['A','B','D']    [1,3,5]

and would like to pivot it so that the timestamp value is kept, the unique values in the variable column become additional columns, and values from value are sorted in the respective columns.

The output should look as follows:

timestamp   A    B    C    D 

222         1    2    3    nan
333         1    3    nan  5 

any help would be greatly appreciated! :)

akleefel
  • 87
  • 5

3 Answers3

3

Create dictionary with zip, pass to DataFrame constructor:

a = [dict(zip(*x)) for x in zip(df['variable'], df['value'])]
print (a)
[{'A': 1, 'B': 2, 'C': 3}, {'A': 1, 'B': 3, 'D': 5}]

df = df[['timestamp']].join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0

If many another columns use DataFrame.pop for extract columns:

a = [dict(zip(*x)) for x in zip(df.pop('variable'), df.pop('value'))]

df = df.join(pd.DataFrame(a, index=df.index))
print (df)
   timestamp  A  B    C    D
0        222  1  2  3.0  NaN
1        333  1  3  NaN  5.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
3

Using unnest first , then just pivot

unnesting(df,['variable','value']).pivot(*df.columns)
Out[79]: 
variable     A    B    C    D
timestamp                    
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0

def unnesting(df, explode):
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    return df1.join(df.drop(explode, 1), how='left')
BENY
  • 317,841
  • 20
  • 164
  • 234
2

You can pass the values & column names to a the pd.Series constructor. This will automatically expand the values in your desired shape.

df.set_index('timestamp').apply(lambda row: pd.Series(row.value, index=row.variable), axis=1)

# outputs:
             A    B    C    D
timestamp
222        1.0  2.0  3.0  NaN
333        1.0  3.0  NaN  5.0
Haleemur Ali
  • 26,718
  • 5
  • 61
  • 85