89

Lets say I have a dataframe like this

    A   B
0   a   b
1   c   d
2   e   f 
3   g   h

0,1,2,3 are times, a, c, e, g is one time series and b, d, f, h is another time series. I need to be able to add two columns to the orignal dataframe which is got by computing the differences of consecutive rows for certain columns.

So i need something like this

    A   B   dA
0   a   b  (a-c)
1   c   d  (c-e)
2   e   f  (e-g)
3   g   h   Nan

I saw something called diff on the dataframe/series but that does it slightly differently as in first element will become Nan.

AMM
  • 17,130
  • 24
  • 65
  • 77

4 Answers4

155

Use shift.

df['dA'] = df['A'] - df['A'].shift(-1)
Jan D.M.
  • 2,284
  • 2
  • 20
  • 32
exp1orer
  • 11,481
  • 7
  • 38
  • 51
  • what if 'A' contains list elements? – LNRD.CLL Dec 03 '16 at 06:36
  • It works as long as that operation makes sense for whatever values are in your dataframe! For example, if each "cell" in your DataFrame is a list, you can do `df['A'] + df['A'].shift()`, but you can't subtract since subtraction isn't defined for lists. – exp1orer Dec 06 '16 at 14:17
  • 1
    What can you do about the 'NaN' in the first/last row? – André Fernandes Jan 19 '17 at 20:40
  • 1
    @AndréFernandes what would you like to be there? You can always do a `fillna` afterwards, but if you're shifting down, what would be a sensible value for the first row? – exp1orer Jan 20 '17 at 20:18
  • For example, hourly differences of an cumulative counter. The first row has to be filled with something, either 0 or the value from another of the row's fields. – André Fernandes Jan 20 '17 at 23:07
  • Okay just use `fillna` on the result then. Eg `df['dA'].fillna(0)` – exp1orer Jan 21 '17 at 18:49
  • i am getting SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead – Parthiban Rajendran Nov 25 '18 at 14:12
  • 1
    @PaariVendhan That's a very common issue with pandas, and has to do with how you've defined your dataframe elsewhere. It's not related to the `shift` method. Here's a very long article https://www.dataquest.io/blog/settingwithcopywarning/ and a shorter stackoverflow answer that may be helpful https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – exp1orer Nov 26 '18 at 21:15
  • Why not just `df['dA'] = df['A'] - df['B']` ? Any benefit of using this expression? – outsider Jun 15 '20 at 05:48
  • @outsider you may want to re-read the question. Column `dA` does not depend on column `B`. – exp1orer Jun 16 '20 at 16:15
51

You could use diff and pass -1 as the periods argument:

>>> df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})
>>> df["dA"] = df["A"].diff(-1)
>>> df
   A   B  dA
0  9  12   5
1  4   7   2
2  2   5   1
3  1   4 NaN

[4 rows x 3 columns]
DSM
  • 342,061
  • 65
  • 592
  • 494
3

When using data in CSV, this would work perfectly:

my_data = pd.read_csv('sale_data.csv')
df = pd.DataFrame(my_data)
df['New_column'] = df['target_column'].diff(1)
print(df) #for the console but not necessary 
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Seth Okeyo
  • 59
  • 2
1

Rolling differences can also be calculated this way:

df=pd.DataFrame(my_data)
my_data = pd.read_csv('sales_data.csv')
i=0
j=1
while j < len(df['Target_column']):
    j=df['Target_column'][i+1] - df['Target_column'][i] #the difference btwn two values in a column.
    i+=1 #move to the next value in the column.
    j+=1 #next value in the new column.
    print(j)
Seth Okeyo
  • 59
  • 2
  • this is when you want to calculate the rolling differences in a column in CSV, for example, you want to get the difference between two consecutive values in a column (Target_column) and store the value in a different column(New_column). Rolling differences for example, you have a column called ['Profit'], to get the differences to use in calculating marginal revenue, you will do value2 - value1, value3- Value2, Value 4 - Value 3.... etc – Seth Okeyo Feb 15 '19 at 12:13