group data in one column based on duplicates in another column in pandas

Question

I have a dataframe with 100196 rows x 8 columns.

I am expecting to get output similar to shown below

The column 2 to column 5 has same data as ID's are same. I just want to have all the Design codes in one cells as new line. I understand that we can use groupby and agg function, but I can only add one column.

Could anyone please suggest, how to solve this issue?

please post a minimum reproducible example – Talha Tayyab Jun 20 '23 at 10:42 — Talha Tayyab, Jun 20 '23 at 10:42

score 2 · Accepted Answer · answered Jun 20 '23 at 11:18

I have created a mock dataset based on your example:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "ID": ["1", "1", "2", "3", "3", "3"],
    "Data 1": ["data11", "data11", "data12", "data13", "data13", "data13"],
    "Data 2": ["data21", "data21", "data22", "data23", "data23", "data23"],
    "Design Code": ["a", "b", "c", "d", "e", "f"]
})

You can apply different aggregations to different columns in groupby to achieve the expected results, for example:

df \
    .groupby("ID", as_index=False) \
    .agg({
        "Data 1": min,
        "Data 2": min,
        "Design Code": lambda x: "\n".join(np.unique(x)),
    })

Output:

         ID Data 1  Data 2  Design Code
0        1  data11  data21  a\nb
1        2  data12  data22  c
2        3  data13  data23  d\ne\nf

As for the displaying of Design Code with line breaks you can refer to https://stackoverflow.com/a/46326725/22052558 if you are using jupyter notebook.

group data in one column based on duplicates in another column in pandas

1 Answers1