I have a data frame which has rows for each user joining my site and making a purchase.
+---+-----+--------------------+---------+--------+-----+
| | uid | msg | _time | gender | age |
+---+-----+--------------------+---------+--------+-----+
| 0 | 1 | confirmed_settings | 1/29/15 | M | 37 |
| 1 | 1 | sale | 4/13/15 | M | 37 |
| 2 | 3 | confirmed_settings | 4/19/15 | M | 35 |
| 3 | 4 | confirmed_settings | 2/21/15 | M | 21 |
| 4 | 5 | confirmed_settings | 3/28/15 | M | 18 |
| 5 | 4 | sale | 3/15/15 | M | 21 |
+---+-----+--------------------+---------+--------+-----+
I would like to change the dataframe so that each row is unique for a uid and there is a columns called sale
and confirmed_settings
which have the timestamp of the action. Note that not every user has a sale
, but every user has a confirmed_settings
. Like below:
+---+-----+--------------------+---------+---------+--------+-----+
| | uid | confirmed_settings | sale | _time | gender | age |
+---+-----+--------------------+---------+---------+--------+-----+
| 0 | 1 | 1/29/15 | 4/13/15 | 1/29/15 | M | 37 |
| 1 | 3 | 4/19/15 | null | 4/19/15 | M | 35 |
| 2 | 4 | 2/21/15 | 3/15/15 | 2/21/15 | M | 21 |
| 3 | 5 | 3/28/15 | null | 3/28/15 | M | 18 |
+---+-----+--------------------+---------+---------+--------+-----+
To do this, I am trying:
df1 = df.pivot(index='uid', columns='msg', values='_time').reset_index()
df1 = df1.merge(df[['uid', 'gender', 'age']].drop_duplicates(), on='uid')
But I get this error: ValueError: Index contains duplicate entries, cannot reshape
How can I pivot a df with duplicate index values to transform my dataframe?
Edit:
df1 = df.pivot_table(index='uid', columns='msg', values='_time').reset_index()
gives this error DataError: No numeric types to aggregate
but im not even sure that is the right path to go on.