Create a df new column which includes a list

Question

I'm working on a multi-label image classification task. I have a dataframe with two columns (id and labels). I want to create a new column that collects the labels of duplicated ids: if an id appears more than once (which is the case here), the labels from its other rows should be added to the new column, so that the new column contains all labels for that id. I'm struggling to write the labels into the new column as a list. Can anyone help me with this?

My df has the following structure:

| id       | labels         |
| -------- | -------------- |
| x.jpg    | label_1        |
| x.jpg    | label_2        |

New dataframe

| id       | labels         | all_labels       |
| -------- | -------------- | ---------------- |
| x.jpg    | label_1        | [label_1, label_2, and other if existent] |
| x.jpg    | label_2        |                  |

Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) — PeCaDe, Oct 17 '22 at 10:46

score 0 · Accepted Answer · answered Oct 17 '22 at 10:42

I think this does what you want, although the format is a bit different: it returns one row per id rather than adding a column to the original rows.

    newdf = df.groupby('id')['labels'].agg(list).reset_index(name='labels')

produces

          id              labels
    0  x.jpg  [label_1, label_2]
    1  y.jpg           [label_3]
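
If the list should instead appear on every original row, as in the target table in the question, one possible approach is to map the grouped lists back onto the `id` column. This is only a sketch: the sample data and the name `labels_per_id` are assumptions made for illustration, and `all_labels` is the column name taken from the question.

```python
import pandas as pd

# Sample data assumed for illustration; it mirrors the ids and labels shown above.
df = pd.DataFrame({
    "id": ["x.jpg", "x.jpg", "y.jpg"],
    "labels": ["label_1", "label_2", "label_3"],
})

# One list of labels per id (a Series indexed by id).
labels_per_id = df.groupby("id")["labels"].agg(list)

# Look up each row's id in that Series, keeping the original row count.
df["all_labels"] = df["id"].map(labels_per_id)

print(df)
```

Note that rows sharing an id end up referencing the same list object, so mutating one list in place would affect the other rows for that id as well.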

user19077881