'ValueError: cannot reindex from a duplicate axis', trying to explode column

Question

I am trying to handle the following error: ValueError: cannot reindex from a duplicate axis. My dataset looks like this:

first_column   second_column   ... fifth_column                ... eleventh_column ...    Age
example 1       first ex           ['avers','aaa','hello']              41241              12
example 2       second ex           []                                  431                32
another         third ex           ['AA','B','C','aaa','hello']         21                 32
example 1       example           ['avers','aaa','hello']              41241              12

I would like to have something like this:

first_column   second_column   ... fifth_column   ... eleventh_column   ... Age
example 1      first ex            avers              41241                 12
example 1      first ex            aaa                41241                 12
example 1      first ex            hello              41241                 12
example 2      second ex           nan                431                   32
another        third ex            AA                 21                    32
another        third ex            B                  21                    32
another        third ex            C                  21                    32
another        third ex            aaa                21                    32
another        third ex            hello              21                    32
example 1      example             avers              41241                 12
example 1      example             aaa                41241                 12
example 1      example             hello              41241                 12

As I understand it, I should apply explode:

df = df.loc[:,~df.columns.duplicated()]
df1=df.set_index('first_column').apply(pd.Series.explode).reset_index()
df1.fifth_column

However, I am getting the error:

ValueError: cannot reindex from a duplicate axis.

What am I doing wrong?

Does https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe help you? — Ynjxsjmh, Apr 03 '21 at 00:58
Please provide your data in pandas dataframe format, as a dictionary, or at least remove the ellipses so that we can read it with `pd.read_clipboard()`. — semblable, Apr 03 '21 at 00:59

score 1 · Accepted Answer · answered Apr 03 '21 at 00:52

Unless you are absolutely sure you need one, it's usually not advisable to use an index that has duplicate values (as in your first_column), because some operations, such as the reindexing that raises this error, do not support it.
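
To illustrate the point about duplicate index values, here is a minimal sketch. The Series and its values are made up for illustration (only the labels mimic your first_column), and the exact wording of the error message can differ between pandas versions:

```python
import pandas as pd

# Illustrative values only; the labels mimic first_column from the question.
s = pd.Series([12, 32, 12], index=["example 1", "example 2", "example 1"])
print(s.index.is_unique)  # False: "example 1" appears twice

try:
    # Reindexing needs one unambiguous row per label, so this fails
    # when the existing index contains duplicates.
    s.reindex(["example 1", "example 2", "another"])
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print(type(err).__name__, err)
```

Checking df['first_column'].is_unique before calling set_index() is a quick way to see whether you are about to create this situation.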

Since it looks like you want to end up with the default integer index anyway, and are only using set_index() to get rid of the 'leftover' index values of df, I would suggest exploding before setting the index:

df1 = df.apply(pd.Series.explode).set_index('first_column').reset_index()

score 1 · Answer 2 · answered Apr 03 '21 at 00:53

It's hard to check without data in pandas dataframe format, but it looks like you're applying explode to the whole dataframe when you should be applying it just to fifth_column. DataFrame.explode takes the column name as an argument, so df.explode('fifth_column') should probably appear somewhere in your code.
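
As a rough sketch of what that could look like: the frame below is my own reconstruction of the sample in the question, so it only contains the columns that are shown there, and it assumes fifth_column holds real Python lists rather than strings that look like lists. The ignore_index argument needs pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data; the elided columns
# are left out and fifth_column is assumed to contain actual lists.
df = pd.DataFrame({
    "first_column": ["example 1", "example 2", "another", "example 1"],
    "second_column": ["first ex", "second ex", "third ex", "example"],
    "fifth_column": [
        ["avers", "aaa", "hello"],
        [],
        ["AA", "B", "C", "aaa", "hello"],
        ["avers", "aaa", "hello"],
    ],
    "eleventh_column": [41241, 431, 21, 41241],
    "Age": [12, 32, 32, 12],
})

# Explode only the list column. The other columns are repeated for each
# element, an empty list becomes a single NaN row, and ignore_index=True
# gives the result a fresh 0..n-1 integer index.
df1 = df.explode("fifth_column", ignore_index=True)
print(df1)
```

Because first_column never becomes the index here, its duplicate values don't matter. If the lists are actually stored as strings, they would have to be parsed into lists first.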
