2

The data looks like this:

origin_id   type   serialn     event    year    month    day
1              A       101        X1    2017        6     10
1              A       101        X2    2017        6     10
1              B       101        X3    2017        6     10
2              A       151        X1    2016        7     15
2              B       151        X3    2016        7     15
2              C       151        X4    2016        7     15

And I need it to be this way:

origin_id    serialn   X1    X2    X3   X4    year    month    day
        1        101    A     A     B null    2017        6     10
        2        151    A  null     B    C    2016        7     15

So basically what I need is to use the values of the column event as headers and put the value of the column type for each event, when there is no event for a certain origin_id put a null. Some other columns from the dataframe, like serialn and origin_id should be in the resulting one. Also there should be only one row for each origin_id

This question: How to pivot a dataframe touches on some points, altough it's geared towards performing an aggregation at some point.

This is a possible solution.

I get a df with the origin_id as the index, the events as columns and the types as their value.

stat = df.pivot(values='type', index='origin_id', columns='event')

Now I need some information from the original dataframe, so I only keep one event for each origin_id and drop the columns I'm not gonna use

df1 = df.drop_duplicates(subset='origin_id').drop(['type','event'], axis=1)

Merging the both dataframes, df1 using the values of origin_id and stat using the index.

pd.merge(df1, stat, how='inner', left_on = 'origin_id', right_index = True)

Using the first dataframe with the code above I get this result:

origin_id    serialn   X1    X2    X3   X4    year    month    day
        1        101    A     A     B null    2017        6     10
        2        151    A  null     B    C    2016        7     15

Is there another way to do this?

Thanks!

Bluuu
  • 75
  • 1
  • 6

1 Answers1

2

You can do it this way:

In [85]: df.pivot_table(index=df.columns.drop(['event','type']).tolist(),
                        columns='event',
                        values='type',
                        aggfunc='first') \
            .reset_index()
            .rename_axis(None,1)
Out[85]:
   origin_id  serialn  year  month  day X1    X2 X3    X4
0          1      101  2017      6   10  A     A  B  None
1          2      151  2016      7   15  A  None  B     C
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419