I am using pandas (version 0.20.3) and I want to apply the diff() method with groupby(), but instead of a DataFrame, the result is an "underscore".

Here is the code:

import numpy as np
import pandas as pd

# creating the DataFrame
data = np.random.random(18).reshape(6,3)
indexes = ['B']*3 + ['A']*3
columns = ['x', 'y', 'z']
df = pd.DataFrame(data, index=indexes, columns=columns)
df.index.name = 'chain_id'

# Now I want to apply the diff method within each chain_id group
df.groupby('chain_id').diff()

And the result is an underscore!

Note that df.loc['A'].diff() and df.loc['B'].diff() do return the expected results so I don't understand why it wouldn't work with groupby().

smci
guillaume
  • Because you have a non-unique index! Your index has duplicates `['B']*3 + ['A']*3`. – smci Feb 05 '21 at 05:59

1 Answer

IIUC, your error is: cannot reindex from a duplicate axis. The index contains duplicate labels, so reset it before grouping and restore it afterwards:

df.reset_index().groupby('chain_id').diff().set_index(df.index)
Out[859]: 
                 x         y         z
chain_id                              
B              NaN       NaN       NaN
B        -0.468771  0.192558 -0.443570
B         0.323697  0.288441  0.441060
A              NaN       NaN       NaN
A        -0.198785  0.056766  0.081513
A         0.138780  0.563841  0.635097
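
To check that this workaround really matches the per-group results mentioned in the question, here is a minimal sketch. It assumes a fixed seed (so the numbers will differ from the output above), and the names `flat`, `result` and `expected` are only illustrative. It selects the value columns explicitly and assigns the original index back, which should be equivalent to the one-liner above.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, with a fixed seed (an assumption,
# only so that the example is reproducible).
rng = np.random.RandomState(0)
data = rng.random_sample(18).reshape(6, 3)
df = pd.DataFrame(data, index=['B'] * 3 + ['A'] * 3, columns=['x', 'y', 'z'])
df.index.name = 'chain_id'

# Move chain_id into a column so the frame gets a unique RangeIndex.
flat = df.reset_index()

# Row-to-row differences within each chain_id group.
result = flat.groupby('chain_id')[['x', 'y', 'z']].diff()

# Put the original (duplicated) labels back on the rows.
result.index = df.index

# Compare with the per-label diffs that already worked in the question.
expected = pd.concat([df.loc['B'].diff(), df.loc['A'].diff()])
pd.testing.assert_frame_equal(result, expected)
```

The first row of each group is NaN because there is no previous row inside that group to subtract from.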
BENY