I am trying to handle the following error: ValueError: cannot reindex from a duplicate axis.
My dataset looks like this:
first_column  second_column  ...  fifth_column                  ...  eleventh_column  ...  Age
example 1     first ex       ...  ['avers','aaa','hello']       ...  41241            ...  12
example 2     second ex      ...  []                            ...  431              ...  32
another       third ex       ...  ['AA','B','C','aaa','hello']  ...  21               ...  32
example 1     example        ...  ['avers','aaa','hello']       ...  41241            ...  12
I would like to have something like this:
first_column  second_column  ...  fifth_column  ...  eleventh_column  ...  Age
example 1     first ex       ...  avers         ...  41241            ...  12
example 1     first ex       ...  aaa           ...  41241            ...  12
example 1     first ex       ...  hello         ...  41241            ...  12
example 2     second ex      ...  nan           ...  431              ...  32
another       third ex       ...  AA            ...  21               ...  32
another       third ex       ...  B             ...  21               ...  32
another       third ex       ...  C             ...  21               ...  32
another       third ex       ...  aaa           ...  21               ...  32
another       third ex       ...  hello         ...  21               ...  32
example 1     example        ...  avers         ...  41241            ...  12
example 1     example        ...  aaa           ...  41241            ...  12
example 1     example        ...  hello         ...  41241            ...  12
As I understand it, I should apply explode:
df = df.loc[:,~df.columns.duplicated()]
df1=df.set_index('first_column').apply(pd.Series.explode).reset_index()
df1.fifth_column
However, I am getting the error:
ValueError: cannot reindex from a duplicate axis.
What am I doing wrong?
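For reference, here is a minimal sketch of what may be going on and one possible workaround. It assumes fifth_column holds actual Python lists (not their string representations) and uses a hypothetical reconstruction of the sample data with the elided columns left out. Setting first_column as the index introduces duplicate labels ('example 1' appears twice), and applying pd.Series.explode column by column can then produce columns of different lengths that pandas has to realign on that non-unique index. Exploding only the list column while the default unique index is still in place seems to avoid the problem:

```python
import pandas as pd

# Hypothetical reconstruction of the sample data; the elided ("...") columns are omitted.
df = pd.DataFrame({
    "first_column": ["example 1", "example 2", "another", "example 1"],
    "second_column": ["first ex", "second ex", "third ex", "example"],
    "fifth_column": [
        ["avers", "aaa", "hello"],
        [],
        ["AA", "B", "C", "aaa", "hello"],
        ["avers", "aaa", "hello"],
    ],
    "eleventh_column": [41241, 431, 21, 41241],
    "Age": [12, 32, 32, 12],
})

# Explode only the list column on the default (unique) RangeIndex,
# then renumber the rows. Empty lists become NaN.
df1 = df.explode("fifth_column").reset_index(drop=True)

print(df1)
```

If the lists were read from a CSV file they may actually be strings such as "['avers','aaa','hello']", in which case they would need to be parsed into lists (for example with ast.literal_eval) before explode has any effect.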