I have a dataframe like this -
df = pd.DataFrame(
{'key': [1, 2, 3, 4],
'col1': [['apple','orange'], ['pineapple'], ['','','guava','',''], ['','','orange','apple','']],
'col2': [['087','799'], ['681'], ['078'], ['816','018']]
}
)
# key col1 col2
#0 1 [apple, orange] [087, 799]
#1 2 [pineapple] [681]
#2 3 [, , guava, , ] [078]
#3 4 [, , orange, apple, ] [816, 018]
I need to split the columns 'col1' and 'col2' and create separate rows, but map the list elements according to their indices. The desired output is this -
desired_df = pd.DataFrame(
{'key': [1, 1, 2, 3, 4, 4],
'col1': [['apple'],['orange'],['pineapple'], ['guava'], ['orange'],['apple']],
'col2': [['087'],['799'], ['681'], ['078'], ['816'],['018']]
}
)
In col1, there might be elements that are blanks, but the overall length of the non-empty col1 element will match with the length of the corresponding elements of col2. Examples: rows 2 and 3 of df.
I tried the following, but it did not work -
df.set_index(['key'])[['col1','col2']].apply(pd.Series).stack().reset_index(level=1, drop=True)