Converting pandas df from long to wide with duplicated rows and categorical variables as values

Question

I have a simple dataframe

{'ID': {0: 101, 1: 101, 2: 101, 3: 102, 4: 102, 5: 102, 6: 102, 7: 102, 8: 103, 9: 103}, 'Category': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'A', 5: 'A', 6: 'B', 7: 'B', 8: 'A', 9: 'B'}}

You can see that ID has duplicates and gives me a reshaping error.

I want to convert if from wide to long and keep values in the Category column as a new column (I will have max 5 categories per ID, but some will have less than 5 and should have NaN. Plus want to rename columns to Role1, Role2 etc.

Expected output

ID  Role1   Role2   Role3   Role4   Role5
101 A       B       C       NaN     NaN
102 A       A       A       B       B
103 A       B       NaN     NaN     NaN

"*You can see that ID has duplicates and gives me a reshaping error.*" <- where did the error come from without code? — mozway, Aug 25 '22 at 14:51
Ah, okay, sorry, the error was - valueerror: index contains duplicate entries, cannot reshape — Anakin Skywalker, Aug 25 '22 at 14:59
So your had some code, you should have included it in the question ;) — mozway, Aug 25 '22 at 15:01

Quang Hoang · Accepted Answer · 2022-08-25T14:43:08.830

You can assign an enumeration within each ID and query that before pivoting:

N = 5 
(df.assign(role_enum=df.groupby('ID').cumcount()+1)
   .query('role_enum<=@N')
   .pivot(index='ID', columns='role_enum', values='Category')
   .add_prefix('Role').reset_index()   # book-keeping
)

Output:

role_enum   ID Role1 Role2 Role3 Role4 Role5
0          101     A     B     C   NaN   NaN
1          102     A     A     A     B     B
2          103     A     B   NaN   NaN   NaN

Converting pandas df from long to wide with duplicated rows and categorical variables as values

1 Answers1