Pandas :: Values of one column as columns

Question

The data looks like this:

origin_id   type   serialn     event    year    month    day
1              A       101        X1    2017        6     10
1              A       101        X2    2017        6     10
1              B       101        X3    2017        6     10
2              A       151        X1    2016        7     15
2              B       151        X3    2016        7     15
2              C       151        X4    2016        7     15

And I need it to be this way:

origin_id    serialn   X1    X2    X3   X4    year    month    day
        1        101    A     A     B null    2017        6     10
        2        151    A  null     B    C    2016        7     15

So basically what I need is to use the values of the column event as headers and put the value of the column type for each event, when there is no event for a certain origin_id put a null. Some other columns from the dataframe, like serialn and origin_id should be in the resulting one. Also there should be only one row for each origin_id

This question: How to pivot a dataframe touches on some points, altough it's geared towards performing an aggregation at some point.

This is a possible solution.

I get a df with the origin_id as the index, the events as columns and the types as their value.

stat = df.pivot(values='type', index='origin_id', columns='event')

Now I need some information from the original dataframe, so I only keep one event for each origin_id and drop the columns I'm not gonna use

df1 = df.drop_duplicates(subset='origin_id').drop(['type','event'], axis=1)

Merging the both dataframes, df1 using the values of origin_id and stat using the index.

pd.merge(df1, stat, how='inner', left_on = 'origin_id', right_index = True)

Using the first dataframe with the code above I get this result:

origin_id    serialn   X1    X2    X3   X4    year    month    day
        1        101    A     A     B null    2017        6     10
        2        151    A  null     B    C    2016        7     15

Is there another way to do this?

Thanks!

Thanks for the link @MaxU, I used some bits of it to get to an answer, do you think I can make it in a more direct way? — Bluuu, Dec 29 '17 at 20:05
I don't know which approach did you chose... I can imagine using `pivot`, `pivot_table` or `set_index` + `unstack`... — MaxU - stand with Ukraine, Dec 29 '17 at 20:08
Can you post a sample data for `df` (`status` column is missing...)? — MaxU - stand with Ukraine, Dec 29 '17 at 20:25
Woops, my bad, forgot to change that one, it should be `event`. — Bluuu, Dec 29 '17 at 20:33
yeah, just realized I got that one wrong again, `event` is the correct one! — Bluuu, Dec 29 '17 at 20:35
Can you rephrase your question to reflect those new details and post your new desired DF (result of merging)? I’ll reopen it then and try to find a solution — MaxU - stand with Ukraine, Dec 29 '17 at 20:49
I don't understand why do you want to keep all those duplicates... What about: `df.pivot_table(index=df.columns.drop(['event','type']).tolist(), columns='event', values='type', aggfunc='first').reset_index()`? — MaxU - stand with Ukraine, Dec 29 '17 at 21:05
Works like a charm, shorter and more readable. The only thing is that the status becomes the index and I don't really know why. If you want post it as an answer so I can pick it :) — Bluuu, Dec 29 '17 at 21:26

score 2 · Accepted Answer · answered Dec 29 '17 at 21:51

You can do it this way:

In [85]: df.pivot_table(index=df.columns.drop(['event','type']).tolist(),
                        columns='event',
                        values='type',
                        aggfunc='first') \
            .reset_index()
            .rename_axis(None,1)
Out[85]:
   origin_id  serialn  year  month  day X1    X2 X3    X4
0          1      101  2017      6   10  A     A  B  None
1          2      151  2016      7   15  A  None  B     C

Pandas :: Values of one column as columns

1 Answers1