How to change values of a specific pandas dateframe column of rows with duplicate IDs, and put those multiple values into 1 value on 1 row with commas

Question

So my dataframe is pretty complex but i'll keep the important details. This is the dataframe:

id     name     categorycode
0003F8 Car      A
0003F8 Car      A
0003F8 Car      A
0003F8 Car      AVB-A
0003F8 Car      B
0003F8 Car      B-T
0003F8 Car      BTH
0003U7 Bike     A
0003U7 Bike     A
0003U7 Bike     A-G
0003U7 Bike     ATH
0003U7 Bike     ATH
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A

What I want it to look like is this:

id      name     categorycode
0003F8  Car      A, AVB-A, B, B-T, BTH
0003U7  Bike     A, A-G, ATH, AVB-A

So basically what i want is that only the categorycode values will be changed, and that the name values won't be changed. I'm not sure what the best practice is to use to achieve this. All the help would be greatly appreciated.

`df.groupby('id')['categorycode'].agg(set)`? or `df.groupby('id')['categorycode'].agg(lambda x: ', '.join(sorted(set(x))))` — mozway, May 04 '22 at 11:56
@mozway this does work for just those 2 columns, but if i want to include all the other columns, how would i do that? — miitchel, May 04 '22 at 12:57
read the `agg` documentation, you can use aggregation per column, even multiple aggregation per multiple columns — mozway, May 04 '22 at 12:58
@mozway yeah, but if I just want only the categorycode values changed, and then add the rest of the columns to complete the dataframe, how would i do that? — miitchel, May 04 '22 at 13:02
what are the other columns? Just name? Doesn't `df.groupby('id', as_index=False).agg({'name': 'first', 'categorycode': lambda x: ', '.join(sorted(set(x)))})` work for you? — mozway, May 04 '22 at 13:10
@mozway there's around 20 other columns, but i only want to group `categorycode` and keep the rest of the dataframe, but only categorycode being grouped. this is because 1 ID has multiple rows of the same name, but more than 1 categorycodes. — miitchel, May 04 '22 at 13:14
so you don't want aggregation? Are the other values always identical within a group? if not can you provide an example? — mozway, May 04 '22 at 13:16
the given suggestion you gave seems to work, but what does the 'first' mean? i looked in the documentation but i cant find anything — miitchel, May 04 '22 at 13:26
first value in the group, so if all the values are identical this doesn't matter. What if there was a "Plane" among the "Car"s in "0003F8"? What should be done? — mozway, May 04 '22 at 13:28
if you have other columns then you should show some other columns in example data and what result you expect with other columns. If columns may have different values then you should show which value to keep. — furas, May 04 '22 at 13:33
then you can use 'first', 'last', 'min', 'max', whatever ;) Is your question solved? — mozway, May 04 '22 at 13:37

How to change values of a specific pandas dateframe column of rows with duplicate IDs, and put those multiple values into 1 value on 1 row with commas

0 Answers0