0

So my dataframe is pretty complex but i'll keep the important details. This is the dataframe:

id     name     categorycode
0003F8 Car      A
0003F8 Car      A
0003F8 Car      A
0003F8 Car      AVB-A
0003F8 Car      B
0003F8 Car      B-T
0003F8 Car      BTH
0003U7 Bike     A
0003U7 Bike     A
0003U7 Bike     A-G
0003U7 Bike     ATH
0003U7 Bike     ATH
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A
0003U7 Bike     AVB-A

What I want it to look like is this:

id      name     categorycode
0003F8  Car      A, AVB-A, B, B-T, BTH
0003U7  Bike     A, A-G, ATH, AVB-A

So basically what i want is that only the categorycode values will be changed, and that the name values won't be changed. I'm not sure what the best practice is to use to achieve this. All the help would be greatly appreciated.

miitchel
  • 255
  • 1
  • 2
  • 10
  • `df.groupby('id')['categorycode'].agg(set)`? or `df.groupby('id')['categorycode'].agg(lambda x: ', '.join(sorted(set(x))))` – mozway May 04 '22 at 11:56
  • @mozway this does work for just those 2 columns, but if i want to include all the other columns, how would i do that? – miitchel May 04 '22 at 12:57
  • read the `agg` documentation, you can use aggregation per column, even multiple aggregation per multiple columns – mozway May 04 '22 at 12:58
  • i also edited my post for a little bit more clear – miitchel May 04 '22 at 12:58
  • @mozway yeah, but if I just want only the categorycode values changed, and then add the rest of the columns to complete the dataframe, how would i do that? – miitchel May 04 '22 at 13:02
  • what are the other columns? Just name? Doesn't `df.groupby('id', as_index=False).agg({'name': 'first', 'categorycode': lambda x: ', '.join(sorted(set(x)))})` work for you? – mozway May 04 '22 at 13:10
  • @mozway there's around 20 other columns, but i only want to group `categorycode` and keep the rest of the dataframe, but only categorycode being grouped. this is because 1 ID has multiple rows of the same name, but more than 1 categorycodes. – miitchel May 04 '22 at 13:14
  • so you don't want aggregation? Are the other values always identical within a group? if not can you provide an example? – mozway May 04 '22 at 13:16
  • the given suggestion you gave seems to work, but what does the 'first' mean? i looked in the documentation but i cant find anything – miitchel May 04 '22 at 13:26
  • first value in the group, so if all the values are identical this doesn't matter. What if there was a "Plane" among the "Car"s in "0003F8"? What should be done? – mozway May 04 '22 at 13:28
  • if you have other columns then you should show some other columns in example data and what result you expect with other columns. If columns may have different values then you should show which value to keep. – furas May 04 '22 at 13:33
  • @mozway this doesn't happen. – miitchel May 04 '22 at 13:36
  • then you can use 'first', 'last', 'min', 'max', whatever ;) Is your question solved? – mozway May 04 '22 at 13:37

0 Answers0