Create pandas dataframe columns from specific values in other column

Asked Feb 09 '21 at 17:23

Active Feb 09 '21 at 17:28

Viewed 28 times

Not sure I worded the summary very well. But basically, I have a dataframe that could have multiple values in some column, per ID. I want to take those potential values (say there are 3 I care about) and create a column for each containing and X if that ID had that corresponding value. So basically taking this:

some_id	foo
1	10
1	20
2	10
2	20
2	30
3	30

and creating this:

some_id	10	20	30
1	X	X
2	X	X	X
3			X

I've been able to accomplish that, but I wonder if there is a better way to do it than I have. I created a new column for each, gave it an X or blank string, then used the pandas group by function aggregating the new columns using .max().

This way just seems more hacky to me than an elegant solution. Also, is there a term for what I'm trying to accomplish here? I don't think denormalize applies.

import pandas as pd

df = pd.DataFrame({'some_id': [1,1,2,2,2,3], 'foo': [10,20,10,20,30,30]})
print(df)

for col in [10,20,30]:
    df[col] = df.apply(lambda row : "X" if row['foo'] == col else "", axis = 1)

output_df = df.groupby('some_id')[[10,20,30]].max().reset_index()
print(output_df)

asked Feb 09 '21 at 17:23

obi_wan_jabroni

2

`pd.crosstab(df['some_id'], df['foo'])`. – Quang Hoang Feb 09 '21 at 17:24
@Manakin yes, that is super helpful – obi_wan_jabroni Feb 09 '21 at 19:12
@QuangHoang thanks that def pointed me in the right direction – obi_wan_jabroni Feb 09 '21 at 19:12

Create pandas dataframe columns from specific values in other column

0 Answers0