Creating columns of differences faster in R

Question

Let's say I created the following data frame in R:

c1 <- sample(10)
c2 <- sample(10)
c3 <- sample(10)
df1 <- data.frame(c1, c2, c3)

I would like to create a new data frame that holds the difference between each row of df1 and the row before it.

Of course, I can create it manually as follows:

c4 <- df1$c1[2:nrow(df1)]-df1$c1[1:(nrow(df1)-1)]
c5 <- df1$c2[2:nrow(df1)]-df1$c2[1:(nrow(df1)-1)]
c6 <- df1$c3[2:nrow(df1)]-df1$c3[1:(nrow(df1)-1)]
df2 <- data.frame(c4, c5, c6)

But instead of defining the columns one by one, I was wondering whether there is a more efficient way to create them.

Also, if I wanted to take differences for only certain columns, is there a fast way to do that once I have the list of column names?

Just `df1[-1, ] - df1[-nrow(df1), ]` – David Arenburg, Aug 05 '16 at 08:09

akrun · Accepted Answer · 2016-08-05T08:14:30.260

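We loop through the columns, get the lag of each with shift, and subtract it from the original values. The first element of each result is NA because the first row has no predecessor, so it is dropped with [-1]. The 'data.frame' is first converted to a 'data.table' with setDT(df1), which modifies df1 by reference.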

library(data.table)
setnames(setDT(df1)[, lapply(.SD, function(x) (x - shift(x))[-1])], paste0("c", 4:6))[]

Or using dplyr

library(dplyr)
df1 %>%
    mutate_each(funs(. - lag(.))) %>%
    na.omit()
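As a side note, mutate_each and funs have been deprecated in later dplyr releases. A minimal sketch of the same idea with across, assuming dplyr 1.0.0 or newer is installed (the seed is only there to make the illustration reproducible and is not part of the original post):

```r
library(dplyr)

set.seed(1)  # illustrative only, not in the original post
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# Subtract the previous row from each row in every column,
# then drop the first row, which has no predecessor and is all NA.
df2 <- df1 %>%
  mutate(across(everything(), ~ .x - dplyr::lag(.x))) %>%
  na.omit()
```

Replacing everything() with all_of(cols), where cols is a character vector of column names, should restrict the differencing to those columns; note that mutate keeps the remaining columns unchanged.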

Or a base R option is

tail(df1,-1) - head(df1,-1)
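For the second part of the question, the same subtraction can be limited to a subset of columns. A small sketch, where cols and the "d_" prefix for the new names are hypothetical choices made for illustration:

```r
set.seed(1)  # illustrative only, not in the original post
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

cols <- c("c1", "c3")  # hypothetical list of columns to difference

# drop = FALSE keeps a data frame even if cols holds a single name
sub <- df1[, cols, drop = FALSE]
df2 <- tail(sub, -1) - head(sub, -1)

# Give the difference columns their own names
names(df2) <- paste0("d_", cols)
```

The result has one row fewer than df1 and carries the row names of the later row in each pair.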

Or another option is

sapply(df1, diff)

However, diff would be slower than subtracting directly or using shift, which matters here because the question is about performance. Note also that sapply returns a matrix rather than a data frame.
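Relative timings can depend on the data size and the R version, so it may be worth measuring on your own data. A rough sketch using base R's system.time, with a hypothetical larger data frame named big standing in for real data:

```r
set.seed(1)  # illustrative only
n <- 1e5
big <- data.frame(c1 = runif(n), c2 = runif(n), c3 = runif(n))

# Time direct subtraction of row-shifted copies
t_subtract <- system.time(r1 <- big[-1, ] - big[-n, ])

# Time column-wise diff
t_diff <- system.time(r2 <- sapply(big, diff))

# Elapsed seconds for each approach
timings <- c(subtract = t_subtract[["elapsed"]], diff = t_diff[["elapsed"]])
print(timings)

# Both approaches should produce the same numbers
same <- isTRUE(all.equal(unname(as.matrix(r1)), unname(r2)))
```

A single system.time call is noisy; repeating the measurement several times, or using a dedicated benchmarking package, gives a more reliable comparison.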

score 1 · Answer 2 · answered Aug 05 '16 at 08:13

You can use diff and apply it to all columns:

apply(df1, 2, diff)
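One thing to keep in mind is that apply converts the data frame to a matrix first and returns a matrix. If a data frame is wanted, a possible variant (a sketch, not part of the original answer) is to use lapply and wrap the result:

```r
set.seed(1)  # illustrative only, not in the original post
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# apply() coerces df1 to a matrix and returns a matrix
m <- apply(df1, 2, diff)

# lapply() works column by column, so wrapping the result
# gives a data frame with the original column names
df2 <- as.data.frame(lapply(df1, diff))
```

Both hold the same values; only the container differs.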

Ronak Shah
