Translate list of labels into array of labels per ID in python

Question

I have a data frame with texts and labels. Each text appears in multiple rows, with one label per row.

import pandas as pd
dummy_df = pd.DataFrame([['Text1','label1'], ['Text1', 'label2']], columns=["TEXT", "LABELS"])

I would like to reshape it into the following form so that I can apply MultiLabelBinarizer():

TEXT | LABEL
Text1| [[label1,label2]]


score 1 · Accepted Answer · answered Apr 03 '23 at 08:58


If you need nested lists, use a lambda function in GroupBy.agg:

df = dummy_df.groupby('TEXT')['LABELS'].agg(lambda x: [x.tolist()]).reset_index()
print (df)
    TEXT              LABELS
0  Text1  [[label1, label2]]

If you need flat (non-nested) lists, pass list directly to GroupBy.agg:

df1 = dummy_df.groupby('TEXT')['LABELS'].agg(list).reset_index()
print (df1)
    TEXT            LABELS
0  Text1  [label1, label2]
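
Since the goal is to feed the result to MultiLabelBinarizer(), here is a minimal sketch of that next step. It assumes scikit-learn is installed and uses the flat-list form, because MultiLabelBinarizer expects one iterable of hashable labels per row (the nested form would most likely need unwrapping first). The extra 'Text2' row and the names grouped, mlb, encoded and result are illustrative additions, not part of the original question.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# 'Text2' is a hypothetical extra row, added only to make the output less trivial
dummy_df = pd.DataFrame(
    [["Text1", "label1"], ["Text1", "label2"], ["Text2", "label2"]],
    columns=["TEXT", "LABELS"],
)

# Collect one flat list of labels per text
grouped = dummy_df.groupby("TEXT")["LABELS"].agg(list).reset_index()

# Each row of `encoded` is a 0/1 indicator vector over the sorted label classes
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(grouped["LABELS"])

# Wrap the indicator matrix in a DataFrame indexed by text
result = pd.DataFrame(encoded, columns=mlb.classes_, index=grouped["TEXT"])
print(result)
```

The columns of result follow mlb.classes_, so the same fitted mlb can later be reused with transform() on new data to keep the column order consistent.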


jezrael


Thank you! Yes, the answers in the other post were confusing. Your answer is easy to replicate and did the job. – sveer Apr 03 '23 at 09:15
