
I am trying to fix the following error: ValueError: cannot reindex from a duplicate axis. My dataset looks like this:

first_column   second_column   ... fifth_column                ... eleventh_column ...    Age
example 1       first ex           ['avers','aaa','hello']              41241              12
example 2       second ex           []                                  431                32
another         third ex           ['AA','B','C','aaa','hello']         21                 32
example 1       example           ['avers','aaa','hello']              41241              12

I would like to have something like this:

first_column   second_column   ... fifth_column  ... eleventh_column ...    Age
example 1       first ex           avers              41241              12
example 1       first ex           aaa                41241              12
example 1       first ex           hello              41241              12
example 2       second ex          nan                431                32
another         third ex           AA                 21                 32
another         third ex           B                  21                 32
another         third ex           C                  21                 32
another         third ex           aaa                21                 32
another         third ex           hello              21                 32
example 1       example            avers              41241              12
example 1       example            aaa                41241              12
example 1       example            hello              41241              12

As I understand it, I should apply explode:

df = df.loc[:,~df.columns.duplicated()]
df1=df.set_index('first_column').apply(pd.Series.explode).reset_index()
df1.fifth_column 

However, I am getting the error:

ValueError: cannot reindex from a duplicate axis.

What am I doing wrong?

V_sqrt
  • Does https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe help you? – Ynjxsjmh Apr 03 '21 at 00:58
  • Please provide your data in pandas dataframe format, as a dictionary, or at least remove the ellipses so that we can read it with `pd.read_clipboard()` – semblable Apr 03 '21 at 00:59

2 Answers


Unless you are absolutely sure you need one, it's usually not advisable to use an index that has duplicate values (as your first_column does), because some operations do not support duplicate index labels.
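To check whether a column would produce a duplicate index before calling set_index(), something like the sketch below may help. The frame here is a cut-down stand-in for the question's data (only two of its columns), so treat the values as illustrative.

```python
import pandas as pd

# Illustrative stand-in for the question's data (only two columns kept).
df = pd.DataFrame({
    "first_column": ["example 1", "example 2", "another", "example 1"],
    "Age": [12, 32, 32, 12],
})

indexed = df.set_index("first_column")

# False here, because "example 1" appears twice in first_column.
is_unique = indexed.index.is_unique

# Labels that occur more than once in the index.
dup_labels = indexed.index[indexed.index.duplicated()].unique().tolist()

print(is_unique)
print(dup_labels)
```

If is_unique comes back False, operations that need to align on the index can fail with errors like the one in the question.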

Since it looks like you are trying to reset the index to default integer values anyway and just using set_index() to remove the 'leftover' index values of df, I would suggest the following:

df1 = df.apply(pd.Series.explode).set_index('first_column').reset_index()

Frodnar

It's hard to check without data in pandas dataframe format, but it looks like you're applying explode to the whole dataframe, where you should be applying it just to fifth_column. You need to pass a column name to explode as an argument. df.explode('fifth_column') should probably appear somewhere in your code.
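As a rough sketch of what that could look like, here is df.explode('fifth_column') on a frame rebuilt by hand from the rows in the question. The elided columns are left out, and this assumes fifth_column holds real Python lists; if the lists are actually strings (for example after reading a CSV), they would need to be parsed first.

```python
import pandas as pd

# Hand-built from the rows shown in the question; elided columns are omitted.
# Assumption: fifth_column contains real lists, not their string representation.
df = pd.DataFrame({
    "first_column": ["example 1", "example 2", "another", "example 1"],
    "second_column": ["first ex", "second ex", "third ex", "example"],
    "fifth_column": [
        ["avers", "aaa", "hello"],
        [],
        ["AA", "B", "C", "aaa", "hello"],
        ["avers", "aaa", "hello"],
    ],
    "eleventh_column": [41241, 431, 21, 41241],
    "Age": [12, 32, 32, 12],
})

# Explode only the list column; the other columns are repeated for each element.
# An empty list becomes a single row with NaN.
# reset_index(drop=True) replaces the repeated row labels with a fresh 0..n-1 index.
df1 = df.explode("fifth_column").reset_index(drop=True)

print(df1)
```

Since explode() is called on the default integer index, there is no need to go through set_index('first_column') at all.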

semblable