I have pandas dataframe that looks like this:
label | pred | gt |
---|---|---|
label1 | val1 | val11 |
label2 | ['str1', str2'] | ['str1', 'str3', 'str4'] |
label3 | foo | box |
And I want to convert label2 row where I have lists of strings or None value to multiple rows (in case it is a list of strings):
label | pred | gt |
---|---|---|
label1 | val1 | val11 |
label2 | 'str1' | 'str1' |
label2 | str2' | 'str3' |
label2 | None | 'str4' |
label3 | foo | box |
I have used explode()
for this purpose but I get new dataframe with all nan values and the 'exploded' rows are not matched to the right label. Here is my code:
df_filtered = output_df[output_df['label'] == 'label2']
# explode the list column into multiple rows while keeping other columns
df_exploded = pd.concat([
df_filtered.drop(['pred', 'gt'], axis=1),
df_filtered['pred'].explode().reset_index(drop=True),
df_filtered['gt'].explode().reset_index(drop=True)
], axis=1)
# add prefix to the existing column name (label) to differentiate each new row
df_exploded = df_exploded.add_prefix('new_')
# rename the columns to remove the prefix from the original column
df_exploded = df_exploded.rename(columns={'new_pred': 'pred', 'new_gt': 'gt'})
# combine the exploded dataframe with the original dataframe, dropping the original list column
df_combined = pd.concat([output_df.drop(['pred', 'gt'], axis=1), df_exploded], axis=1)
Any help would be appreciated.