21

I have a big table of total 969 rows and I need to find the difference between every two rows, e.g. row1 and row2, row2 and row3, row3 and row4 etc. How can I do that? I was told to do it by the command diff() but I have no idea where to start.

Michael Hoffman
  • 32,526
  • 7
  • 64
  • 86
user1084079
  • 211
  • 1
  • 2
  • 3

4 Answers4

37

Here's an example of how to use diff() on the built-in mtcars data.frame. You have to select a column to perform the diff over:

mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
[..snip..]
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Calculate the successive differences of e.g. the column "qsec":

diff(mtcars$qsec)
 [1]  0.56  1.59  0.83 -2.42  3.20 -4.38  4.16  2.90 -4.60  0.60 -1.50  0.20
[13]  0.40 -0.02 -0.16 -0.40  2.05 -0.95  1.38  0.11 -3.14  0.43 -1.89  1.64
[25]  1.85 -2.20  0.20 -2.40  1.00 -0.90  4.00
Henrik
  • 65,555
  • 14
  • 143
  • 159
Michael Hoffman
  • 32,526
  • 7
  • 64
  • 86
  • 2
    This is so great! So simple, but so great. I sometimes feel that R is some vast jungle forest with complex routes thru it. Simple routes are there but can be hard to find. This is one of them. Many thanks!! (It runs lightning fast and avoids having to use R to do iteration!! life doesn't get better than this! – WGray Mar 02 '13 at 04:15
  • 1
    Can also do this by group: http://stackoverflow.com/questions/14846547/calculating-difference-row-values-by-group-in-r And this works on Date() formatted values as well: http://stackoverflow.com/questions/30378946/calculating-time-difference-by-id – Brian D Nov 10 '16 at 20:35
15

You could simply subtract a data.frame consisting of rows 1:(n-1) of the original data.frame from a second one consisting of rows 2:n. (Here n is the number of rows in the original data.frame):

# Example data
df <- data.frame(a=1:4, b=4:1, c=11:14, d=c(2,4,10,0))
#   a b  c  d
# 1 1 4 11  2
# 2 2 3 12  4
# 3 3 2 13 10
# 4 4 1 14  0

# Calculate the differences
diff_df <- df[-1,] - df[-nrow(df),]
diff_df
#   a  b c   d
# 2 1 -1 1   2
# 3 1 -1 1   6
# 4 1 -1 1 -10

You can then rename the rows as you see fit using something like:

row.names(diff_df) <- paste("d", seq_len(nrow(diff_df)), sep="")
diff_df
#    a  b c   d
# d1 1 -1 1   2
# d2 1 -1 1   6
# d3 1 -1 1 -10
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • Did you mean to do the same thing, but with rows? I mean, it's the same idea, but the OP was talking about rows. – joran Dec 06 '11 at 18:26
  • Could also use the `head` and `tail` functions like `tail(df, -1) - head(df, -1)`. – lmo Mar 27 '17 at 13:28
5

We can also use dplyr::lag and data.table::shift

library(dplyr)
df <- mtcars

df %>%  mutate(diff_row = mpg - lag(mpg))

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb diff_row
#1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       NA
#2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      0.0
#3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      1.8
#4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1     -1.4
#5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2     -2.7
#6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1     -0.6
#7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4     -3.8
#....

As there is no lag for first row, the default value is NA. If we want the first value returned to be 0 instead, we can set the default value to first value of the variable so the subtracting yields to 0.

df %>%  mutate(diff_row = mpg - lag(mpg, default = first(mpg)))

#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb diff_row
#1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      0.0
#2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4      0.0
#3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      1.8
#4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1     -1.4
#5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2     -2.7
#6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1     -0.6
#7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4     -3.8
#....

The same process can also be done using shift from data.table whose default type is lag

library(data.table)
setDT(df)[, diff_row := mpg - shift(mpg)]

Or to get 0 value in the first row, here we use fill argument.

setDT(df)[, diff_row := mpg - shift(mpg, fill = first(mpg))]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
3

This would have be clearer with sample data. Let's assume you mean "numerical difference", and your data can be represented as a matrix, this would do.

 set.seed(4871)
 m = matrix(sample(1:5,50,TRUE),nrow=10,ncol=5)
 m
 t(apply(m,1,diff))
Dieter Menne
  • 10,076
  • 44
  • 67