
I'm working on a multi-label image classification task. I have a dataframe with two columns (id and labels). I want to create a new column that checks the ids for duplicates and, if there is a duplicate (which is the case), assigns the additional label to the new column. The result should be a new column containing all labels for that id. I'm struggling to write the labels into the new column as a list. Can anyone help me here?

My df has the following structure:

| id       | labels         |
| -------- | -------------- |
| x.jpg    | label_1        |
| x.jpg    | label_2        |

Desired new dataframe:

| id       | labels         | all_labels                                |
| -------- | -------------- | ----------------------------------------- |
| x.jpg    | label_1        | [label_1, label_2, and other if existent] |
| x.jpg    | label_2        |                                           |
denoo
Does this answer your question? [How to group dataframe rows into list in pandas groupby](https://stackoverflow.com/questions/22219004/how-to-group-dataframe-rows-into-list-in-pandas-groupby) – PeCaDe Oct 17 '22 at 10:46



I think this does what you want, although the format is a bit different: you get one row per id, with all of its labels collected into a list.

```python
# Collect every label belonging to the same id into one list (one row per id)
newdf = df.groupby('id')['labels'].agg(list).reset_index(name='labels')
```

produces

```
      id              labels
0  x.jpg  [label_1, label_2]
1  y.jpg           [label_3]
```
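If you need the exact layout from the question instead (the original rows kept, with an extra `all_labels` column), one option is to build the same per-id lists and map them back onto each row by id. This is a minimal sketch: the sample data below (including the `y.jpg` row) and the names `labels_per_id` and `first_only` are illustrative, not taken from your real dataframe.

```python
import pandas as pd

# Illustrative data shaped like the question's dataframe
df = pd.DataFrame({
    "id": ["x.jpg", "x.jpg", "y.jpg"],
    "labels": ["label_1", "label_2", "label_3"],
})

# One list of labels per id; the resulting Series is indexed by id
labels_per_id = df.groupby("id")["labels"].agg(list)

# Look up each row's id in that Series, so every row gets its id's full list
df["all_labels"] = df["id"].map(labels_per_id)

# Optional: keep the list only on the first row of each id (as in the
# question's sketch); later duplicates of an id become NaN
first_only = df["all_labels"].where(~df["id"].duplicated())

print(df)
print(first_only)
```

Using `map` with a Series looks values up by index, which avoids the pitfall of `transform(list)`: pandas treats the returned list as one value per row rather than as a single list for the whole group.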
user19077881