0

I have data in one format (summing up events' occurrences by UUID, events in separate rows) but would need it in a different format (events' occurrences in separate columns for each event). How to, by using Python, transform it from:

UUid Event sum
B32E7AA5B11A40AE page XY opened 121
B32E7AA5B11A40AE shopcart filled 92
B32E7AA5B11A40AE purchase made 23
2AEFB3AEA64140BE page XY opened 221
2AEFB3AEA64140BE shopcart filled 21
A0DFAACDEB4D40C1 page XY opened 1345
A0DFAACDEB4D40C1 shopcart filled 1323
A0DFAACDEB4D40C1 purchase made 1321

To this:

UUid page XY opened shopcart filled purchase made
B32E7AA5B11A40AE 121 92 23
2AEFB3AEA64140BE 221 23 0
A0DFAACDEB4D40C1 1345 1323 1321

I don't know where to start.

  • 2
    use [pivot](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html) – Tranbi Feb 23 '23 at 14:08
  • Create a new DF with the columns you want. Iterate your original df and add the data for one UID to your new df. – SickerDude43 Feb 23 '23 at 14:10

2 Answers2

2

Try using pivot() and fillna()

df_pivot = df.pivot(index='UUid', columns='Event', values='sum').fillna(0)
df_pivot = df_pivot.reset_index()

df_pivot

Output:

UUid page XY opened purchase made shopcart filled
0 2AEFB3AEA64140BE 221 0 21
1 A0DFAACDEB4D40C1 1345 1321 1323
2 B32E7AA5B11A40AE 121 23 92
Pedro Rocha
  • 1,373
  • 1
  • 3
  • 14
1

df = pd.pivot_table(df, values='sum', index='UUid', columns='Event', fill_value=0)

df = df.reset_index()