The data looks like this:
origin_id type serialn event year month day
1 A 101 X1 2017 6 10
1 A 101 X2 2017 6 10
1 B 101 X3 2017 6 10
2 A 151 X1 2016 7 15
2 B 151 X3 2016 7 15
2 C 151 X4 2016 7 15
And I need it to be this way:
origin_id serialn X1 X2 X3 X4 year month day
1 101 A A B null 2017 6 10
2 151 A null B C 2016 7 15
So basically what I need is to use the values of the column event
as headers and put the value of the column type
for each event, when there is no event for a certain origin_id
put a null. Some other columns from the dataframe, like serialn
and origin_id
should be in the resulting one. Also there should be only one row for each origin_id
This question: How to pivot a dataframe touches on some points, altough it's geared towards performing an aggregation at some point.
This is a possible solution.
I get a df with the origin_id as the index, the events as columns and the types as their value.
stat = df.pivot(values='type', index='origin_id', columns='event')
Now I need some information from the original dataframe, so I only keep one event for each origin_id and drop the columns I'm not gonna use
df1 = df.drop_duplicates(subset='origin_id').drop(['type','event'], axis=1)
Merging the both dataframes, df1 using the values of origin_id and stat using the index.
pd.merge(df1, stat, how='inner', left_on = 'origin_id', right_index = True)
Using the first dataframe with the code above I get this result:
origin_id serialn X1 X2 X3 X4 year month day
1 101 A A B null 2017 6 10
2 151 A null B C 2016 7 15
Is there another way to do this?
Thanks!