
I'm trying to calculate a group-wise pct_change for a DataFrame using df.groupby('type')['value'].apply(lambda x: x.pct_change()).

But it raises ValueError: cannot reindex from a duplicate axis. Any ideas how to deal with this issue? Thanks.

ah bon
You very likely have duplicate values in your index. Check this link: https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean – moys Dec 13 '19 at 05:21
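To confirm that diagnosis, you can inspect the index directly. This is a minimal sketch, assuming a small frame trimmed from the answer's example (only customer and quantity) and the hypothetical name doubled for the frame stacked on itself. It uses pd.concat because DataFrame.append was removed in pandas 2.0:

```python
import pandas as pd

# Hypothetical sample data, trimmed from the answer's example.
df = pd.DataFrame({
    "customer": ["C1", "C1", "C1", "C2", "C2"],
    "quantity": [100, 10, 50, 75, 5],
})

# Stacking the frame on itself keeps the original labels, so 0..4 appear twice.
doubled = pd.concat([df, df])

print(doubled.index.is_unique)  # False

# duplicated() flags every repeat of a label after its first occurrence.
dup_labels = doubled.index[doubled.index.duplicated()].tolist()
print(dup_labels)  # [0, 1, 2, 3, 4]

# After reset_index(drop=True) every row gets a fresh 0..n-1 label.
print(doubled.reset_index(drop=True).index.is_unique)  # True
```

If is_unique is False, give the rows unique labels (or drop the duplicate rows) before running the groupby.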

1 Answer


I get the same error when my index contains duplicates. You'll need to call reset_index() first:

In [726]: df.append(df)
Out[726]: 
  customer brand product  quantity  price  new_quantity
0       C1    B1      P1       100      5           500
1       C1    B1      P2        10     20           200
2       C1    B2      P3        50      7           350
3       C2    B1      P1        75      5           375
4       C2    B2      P3         5      7            35
0       C1    B1      P1       100      5           500
1       C1    B1      P2        10     20           200
2       C1    B2      P3        50      7           350
3       C2    B1      P1        75      5           375
4       C2    B2      P3         5      7            35

df.append(df).groupby('customer')['quantity'].apply(lambda x: x.pct_change())

# ValueError: cannot reindex from a duplicate axis

In [730]: df.append(df).reset_index().groupby('customer')['quantity'].apply(lambda x: x.pct_change())
Out[730]: 
0          NaN
1    -0.900000
2     4.000000
3          NaN
4    -0.933333
5     1.000000
6    -0.900000
7     4.000000
8    14.000000
9    -0.933333
Name: quantity, dtype: float64
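The same fix can also be written for current pandas. This is a sketch and was not part of the session above. It rebuilds the answer's frame and uses pd.concat, since DataFrame.append was removed in pandas 2.0. It passes drop=True so the old labels are discarded instead of being kept as an extra "index" column. It also calls the GroupBy object's built-in pct_change rather than apply with a lambda. The names doubled and fixed are illustrative:

```python
import pandas as pd

# Rebuild the example frame from the answer.
df = pd.DataFrame({
    "customer": ["C1", "C1", "C1", "C2", "C2"],
    "brand": ["B1", "B1", "B2", "B1", "B2"],
    "product": ["P1", "P2", "P3", "P1", "P3"],
    "quantity": [100, 10, 50, 75, 5],
    "price": [5, 20, 7, 5, 7],
})
df["new_quantity"] = df["quantity"] * df["price"]

# pd.concat replaces the removed DataFrame.append; labels 0..4 now repeat.
doubled = pd.concat([df, df])

# Give every row a unique label; drop=True discards the old labels.
fixed = doubled.reset_index(drop=True)

# Built-in group-wise percent change, aligned to fixed's 0..9 index.
pct = fixed.groupby("customer")["quantity"].pct_change()
print(pct)
```

The printed values match Out[730] above. For example, customer C2's quantities run 75, 5, 75, 5, which gives NaN, -0.933333, 14.0, -0.933333 at rows 3, 4, 8 and 9.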
oppressionslayer