I have a big table of total 969 rows and I need to find the difference between every two rows, e.g. row1 and row2, row2 and row3, row3 and row4 etc. How can I do that? I was told to do it by the command diff()
but I have no idea where to start.

- 32,526
- 7
- 64
- 86

- 211
- 1
- 2
- 3
4 Answers
Here's an example of how to use diff()
on the built-in mtcars
data.frame
. You have to select a column to perform the diff over:
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
[..snip..]
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
Calculate the successive differences of e.g. the column "qsec":
diff(mtcars$qsec)
[1] 0.56 1.59 0.83 -2.42 3.20 -4.38 4.16 2.90 -4.60 0.60 -1.50 0.20
[13] 0.40 -0.02 -0.16 -0.40 2.05 -0.95 1.38 0.11 -3.14 0.43 -1.89 1.64
[25] 1.85 -2.20 0.20 -2.40 1.00 -0.90 4.00

- 65,555
- 14
- 143
- 159

- 32,526
- 7
- 64
- 86
-
2This is so great! So simple, but so great. I sometimes feel that R is some vast jungle forest with complex routes thru it. Simple routes are there but can be hard to find. This is one of them. Many thanks!! (It runs lightning fast and avoids having to use R to do iteration!! life doesn't get better than this! – WGray Mar 02 '13 at 04:15
-
1Can also do this by group: http://stackoverflow.com/questions/14846547/calculating-difference-row-values-by-group-in-r And this works on Date() formatted values as well: http://stackoverflow.com/questions/30378946/calculating-time-difference-by-id – Brian D Nov 10 '16 at 20:35
You could simply subtract a data.frame consisting of rows 1:(n-1)
of the original data.frame from a second one consisting of rows 2:n
. (Here n
is the number of rows in the original data.frame):
# Example data
df <- data.frame(a=1:4, b=4:1, c=11:14, d=c(2,4,10,0))
# a b c d
# 1 1 4 11 2
# 2 2 3 12 4
# 3 3 2 13 10
# 4 4 1 14 0
# Calculate the differences
diff_df <- df[-1,] - df[-nrow(df),]
diff_df
# a b c d
# 2 1 -1 1 2
# 3 1 -1 1 6
# 4 1 -1 1 -10
You can then rename the rows as you see fit using something like:
row.names(diff_df) <- paste("d", seq_len(nrow(diff_df)), sep="")
diff_df
# a b c d
# d1 1 -1 1 2
# d2 1 -1 1 6
# d3 1 -1 1 -10

- 159,210
- 26
- 366
- 455
-
Did you mean to do the same thing, but with rows? I mean, it's the same idea, but the OP was talking about rows. – joran Dec 06 '11 at 18:26
-
Could also use the `head` and `tail` functions like `tail(df, -1) - head(df, -1)`. – lmo Mar 27 '17 at 13:28
We can also use dplyr::lag
and data.table::shift
library(dplyr)
df <- mtcars
df %>% mutate(diff_row = mpg - lag(mpg))
# mpg cyl disp hp drat wt qsec vs am gear carb diff_row
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 NA
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 0.0
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1.8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 -1.4
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 -2.7
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 -0.6
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 -3.8
#....
As there is no lag
for first row, the default value is NA
. If we want the first value returned to be 0 instead, we can set the default value to first
value of the variable so the subtracting yields to 0.
df %>% mutate(diff_row = mpg - lag(mpg, default = first(mpg)))
# mpg cyl disp hp drat wt qsec vs am gear carb diff_row
#1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 0.0
#2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 0.0
#3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1.8
#4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 -1.4
#5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 -2.7
#6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 -0.6
#7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 -3.8
#....
The same process can also be done using shift
from data.table
whose default type
is lag
library(data.table)
setDT(df)[, diff_row := mpg - shift(mpg)]
Or to get 0 value in the first row, here we use fill
argument.
setDT(df)[, diff_row := mpg - shift(mpg, fill = first(mpg))]

- 377,200
- 20
- 156
- 213
This would have be clearer with sample data. Let's assume you mean "numerical difference", and your data can be represented as a matrix, this would do.
set.seed(4871)
m = matrix(sample(1:5,50,TRUE),nrow=10,ncol=5)
m
t(apply(m,1,diff))

- 10,076
- 44
- 67