Convert specific list of strings cells into multiple rows and keep the other columns

Question

I have a pandas DataFrame that looks like this:

label	pred	gt
label1	val1	val11
label2	['str1', 'str2']	['str1', 'str3', 'str4']
label3	foo	box

I want to convert the label2 row, whose cells hold lists of strings (or None), into multiple rows, one per list element:

label	pred	gt
label1	val1	val11
label2	'str1'	'str1'
label2	'str2'	'str3'
label2	None	'str4'
label3	foo	box

I have used explode() for this purpose, but I get a new DataFrame with all NaN values, and the exploded rows are not matched to the right label. Here is my code:

df_filtered = output_df[output_df['label'] == 'label2']

# explode the list column into multiple rows while keeping other columns
df_exploded = pd.concat([
    df_filtered.drop(['pred', 'gt'], axis=1),
    df_filtered['pred'].explode().reset_index(drop=True),
    df_filtered['gt'].explode().reset_index(drop=True)
], axis=1)

# add prefix to the existing column name (label) to differentiate each new row
df_exploded = df_exploded.add_prefix('new_')

# rename the columns to remove the prefix from the original column
df_exploded = df_exploded.rename(columns={'new_pred': 'pred', 'new_gt': 'gt'})

# combine the exploded dataframe with the original dataframe, dropping the original list column
df_combined = pd.concat([output_df.drop(['pred', 'gt'], axis=1), df_exploded], axis=1)

Any help would be appreciated.

score 1 · Answer 1 · answered Mar 29 '23 at 07:07

You can explode each column independently, number the exploded rows within each original row, and concat the results on that shared key:

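cols = ['pred', 'gt']  # columns that may hold lists

# every other column is repeated for each exploded row
others = df.columns.difference(cols)
out = pd.concat([df.explode(c)[others.union([c])]
                   # n = position of each element within its original row
                   .assign(n=lambda d: d.groupby(level=0).cumcount())
                   .set_index(['n']+list(others), append=True)
                 for c in cols], axis=1
               ).sort_index(level=[0, 1]).droplevel(1).reset_index(list(others))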

print(out)

Output:

    label  pred     gt
0  label1  val1  val11
1  label2  str1   str1
1  label2  str2   str3
1  label2   NaN   str4
2  label3   foo    box
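
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea, written as a loop instead of a single chained expression. The helper name `explode_ragged` and the sample frame are assumptions made for illustration (the frame mirrors the table in the question); the sketch also assumes the DataFrame index has no duplicates.

```python
import pandas as pd


def explode_ragged(df, cols):
    """Explode list-valued `cols` row by row; shorter lists are padded with NaN.

    Hypothetical helper for illustration; assumes df.index has no duplicates.
    """
    parts = []
    for c in cols:
        s = df[c].explode()                # one row per list element
        n = s.groupby(level=0).cumcount()  # position within the original row
        key = pd.MultiIndex.from_arrays([s.index, n])
        parts.append(pd.Series(s.to_numpy(), index=key, name=c))

    # Align the exploded columns on (original row, position); gaps become NaN.
    wide = pd.concat(parts, axis=1).sort_index()

    # Repeat the untouched columns once per exploded row, then attach the values.
    others = [c for c in df.columns if c not in cols]
    out = df.loc[wide.index.get_level_values(0), others].copy()
    for c in cols:
        out[c] = wide[c].to_numpy()
    return out[list(df.columns)]


# Sample data modelled on the question's table.
df = pd.DataFrame({
    "label": ["label1", "label2", "label3"],
    "pred": ["val1", ["str1", "str2"], "foo"],
    "gt": ["val11", ["str1", "str3", "str4"], "box"],
})

out = explode_ragged(df, ["pred", "gt"])
print(out)
```

The per-row counter is what keeps the two columns lined up: without it, the exploded `pred` and `gt` values for the same original row would have no common key to align on.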

mozway

I get this error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects – Yana Mar 29 '23 at 07:14
Your original index must not contain duplicates (`df = df.reset_index(drop=True)`). If it does and you need to keep it, prepend a unique index level and discard it at the end ;) – mozway Mar 29 '23 at 07:28
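
A small sketch of the two options mentioned in this comment, assuming a frame whose index is duplicated on purpose. The level name `uid` and the variable names are hypothetical, chosen only for this example.

```python
import pandas as pd

# Sample frame with a deliberately duplicated index (assumption for illustration).
original = pd.DataFrame(
    {
        "label": ["label1", "label2", "label3"],
        "pred": ["val1", ["str1", "str2"], "foo"],
        "gt": ["val11", ["str1", "str3", "str4"], "box"],
    },
    index=[0, 0, 1],
)
print(original.index.is_unique)

# Option 1: the old index is not needed, so replace it with 0..n-1.
df = original.reset_index(drop=True)
print(df.index.is_unique)

# Option 2: keep the old labels, but put a unique level in front of them.
keyed = original.set_index(
    pd.RangeIndex(len(original), name="uid"), append=True
).swaplevel()
print(keyed.index.is_unique)

# ...process `keyed` here, then drop the helper level to get the old labels back.
restored = keyed.droplevel("uid")
print(restored.index.tolist())
```

Option 1 is the simpler choice when the index carries no information; option 2 only pays off when the original labels have to survive the reshaping.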