
I have a data frame with texts and labels. Each text appears in multiple rows, with one label per row.

import pandas as pd

dummy_df = pd.DataFrame([['Text1', 'label1'], ['Text1', 'label2']], columns=["TEXT", "LABELS"])

I would like to reshape it into the following format so that I can apply the MultiLabelBinarizer() function:

TEXT  | LABELS
Text1 | [[label1, label2]]
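For reference, MultiLabelBinarizer expects one flat iterable of hashable labels per sample. The minimal sketch below assumes scikit-learn is installed and adds a hypothetical second text that carries only label2. It shows that flat lists encode as expected, while the doubly nested [[label1, label2]] form is rejected.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# One flat list of labels per text; the second entry is hypothetical data.
flat_labels = [["label1", "label2"], ["label2"]]

mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(flat_labels)
print(mlb.classes_)  # ['label1' 'label2']
print(encoded)
# [[1 1]
#  [0 1]]

# With an extra level of nesting, each inner list is itself treated as a
# single label; lists are unhashable, so fitting raises a TypeError.
nested_rejected = False
try:
    MultiLabelBinarizer().fit_transform([[["label1", "label2"]]])
except TypeError as err:
    nested_rejected = True
    print("nested input rejected:", err)
```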


sveer

1 Answer

If you need nested lists, use a lambda function in GroupBy.agg:

df = dummy_df.groupby('TEXT')['LABELS'].agg(lambda x: [x.tolist()]).reset_index()
print(df)
    TEXT              LABELS
0  Text1  [[label1, label2]]
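If the nested form has already been produced, one level of nesting can be removed again afterwards. A minimal sketch, assuming the same dummy_df as in the question, that uses the .str[0] accessor to take the first element of each list cell:

```python
import pandas as pd

dummy_df = pd.DataFrame([["Text1", "label1"], ["Text1", "label2"]], columns=["TEXT", "LABELS"])

# Nested form: each cell is a one-element list that wraps the labels.
df = dummy_df.groupby("TEXT")["LABELS"].agg(lambda x: [x.tolist()]).reset_index()
print(df.loc[0, "LABELS"])  # [['label1', 'label2']]

# .str[0] picks the first element of every list cell, unwrapping one level.
df["LABELS"] = df["LABELS"].str[0]
print(df.loc[0, "LABELS"])  # ['label1', 'label2']
```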

If you need flat (non-nested) lists, pass list to agg:

df1 = dummy_df.groupby('TEXT')['LABELS'].agg(list).reset_index()
print(df1)
    TEXT            LABELS
0  Text1  [label1, label2]
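If only the 0/1 matrix is needed in the end, a different technique than MultiLabelBinarizer can build it straight from the long-format frame: pandas.crosstab counts each (TEXT, LABELS) pair. This is a sketch that assumes duplicated (text, label) rows should count once; the extra Text2 rows are hypothetical data added to show that case.

```python
import pandas as pd

# Long format: one row per (text, label) pair. The Text2 rows are hypothetical,
# including a duplicated (Text2, label2) pair.
dummy_df = pd.DataFrame(
    [["Text1", "label1"], ["Text1", "label2"], ["Text2", "label2"], ["Text2", "label2"]],
    columns=["TEXT", "LABELS"],
)

# Count each (TEXT, LABELS) combination: rows are texts, columns are labels,
# and missing combinations are filled with 0.
counts = pd.crosstab(dummy_df["TEXT"], dummy_df["LABELS"])

# Duplicated pairs give counts above 1, so cap at 1 to get a 0/1 multi-hot matrix.
multi_hot = counts.clip(upper=1)
print(multi_hot)
```

Unlike MultiLabelBinarizer, this skips the list step entirely, and the result is a DataFrame indexed by TEXT rather than a NumPy array plus a classes_ attribute.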
jezrael
Thank you! Yes, the answers in the other post were confusing. Your answer is easy to replicate and did the job. – sveer Apr 03 '23 at 09:15