7

How do I subtract one row from another in the following dataframe (df):

RECL_LCC          1          2          3
RECL_LCC  35.107655  36.015210  28.877135
RECL_PI   36.961519  43.499506  19.538975

I want to do something like:

df['Difference'] = df['RECL_LCC']-df['RECL_PI']

but that gives:

*** KeyError: 'RECL_LCC'
user308827
  • 21,227
  • 87
  • 254
  • 417

2 Answers2

9

You can select rows by index value using df.loc:

In [98]: df.loc['Diff'] = df.loc['RECL_LCC'] - df.loc['RECL_PI']

In [99]: df
Out[99]: 
RECL_LCC          1          2          3
RECL_LCC  35.107655  36.015210  28.877135
RECL_PI   36.961519  43.499506  19.538975
Diff      -1.853864  -7.484296   9.338160
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • What if you want to compute this for a couple of hundred rows? – Kartik Oct 26 '15 at 19:26
  • @Kartik: There is no qualitative difference. Note however that DataFrames store data in a [column-based format](http://stackoverflow.com/q/25918149/190597). So if the columns have different `dtypes`, then you will be better off building your DataFrame in a format which allows you to take a difference of columns instead of a difference of rows. – unutbu Oct 26 '15 at 19:38
  • @Kartik: Also note that Pandas aligns values based on the index. Consider, for example, `pd.Series([1,2,3], index=['A','B','C']) - pd.Series([2,1,3], index=['B','A','C'])` which returns `pd.Series([0,0,0], index=['A','B','C'])`. That is very useful sometimes, but if your data is already aligned (i.e. the two Series have identical index) then you may get better performance treating your data as one big NumPy array and performing your calculation using NumPy instead of Pandas. This is especially true if you wish to compute the difference of every pair of rows and not just two. – unutbu Oct 26 '15 at 19:50
  • @unutbu. Thanks! But, this was the answer I was looking for: http://stackoverflow.com/questions/23142967/adding-a-column-thats-result-of-difference-in-consecutive-rows-in-pandas. Using `shift`, and storing the shifted values in another column. The function I was looking for was `shift`. I remembered reading about it at some point, but forgot the details. – Kartik Oct 26 '15 at 20:02
3

you can use diff() function:

df.set_index('RECT_LCC', inplace=True)
df.diff(-1)


                    1           2         3
RECT_LCC            
RECT_LCC    -1.853864   -7.484296   9.33816
RECL_PI           NaN         NaN       NaN

by default, it shifts by 1 row. In your case, since you subtracting next row instead of previous you need to set diff(-1)

Vlad Bezden
  • 83,883
  • 25
  • 248
  • 179