Let's say I created the following data frame in R:

c1 <- sample(10)
c2 <- sample(10)
c3 <- sample(10)
df1 <- data.frame(c1, c2, c3)

I would like to create a new data frame containing the difference between each row of df1 and the previous row.

Of course, I can create it manually as follows:

c4 <- df1$c1[2:nrow(df1)]-df1$c1[1:(nrow(df1)-1)]
c5 <- df1$c2[2:nrow(df1)]-df1$c2[1:(nrow(df1)-1)]
c6 <- df1$c3[2:nrow(df1)]-df1$c3[1:(nrow(df1)-1)]
df2 <- data.frame(c4, c5, c6)

But instead of defining the columns one by one, I was wondering whether there is a more efficient way of creating them.

Also, if I wanted to take differences of only certain columns, is there a fast way of doing so once I have the list of column names?

user98235

2 Answers

We loop through the columns, get the lag of each with shift, and subtract it from the original values; the leading NA this produces is dropped with [-1]. setDT(df1) converts the 'data.frame' to a 'data.table' by reference.

library(data.table)
setnames(setDT(df1)[, lapply(.SD, function(x) (x- shift(x))[-1])], paste0("c", 4:6))[]

Or using dplyr

library(dplyr)
df1 %>%
    mutate(across(everything(), ~ .x - lag(.x))) %>%
    na.omit()

Or a base R option is

tail(df1,-1) - head(df1,-1)

Or another option is

sapply(df1, diff)

Note that sapply returns a matrix rather than a data frame. Also, diff may be slower than subtracting directly or using shift, which matters here because the OP's question is about efficiency.
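
For the second part of the question (differencing only selected columns), a minimal base R sketch is to subset before subtracting. It assumes the column names are held in a character vector, called cols here purely for illustration; set.seed is added only so the example is reproducible.

```r
set.seed(1)
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# hypothetical selection of column names (not from the original post)
cols <- c("c1", "c3")

# current row minus previous row, restricted to the selected columns
df2 <- tail(df1[cols], -1) - head(df1[cols], -1)

# same values via diff; lapply keeps each column a vector,
# so as.data.frame() yields a data frame rather than a matrix
df3 <- as.data.frame(lapply(df1[cols], diff))
```

Because df1[cols] uses single-bracket indexing, the result stays a data frame even when cols contains a single name.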

akrun

You can use diff and apply it to all columns:

apply(df1, 2, diff)
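
Note that apply coerces the data frame to a matrix and returns a matrix. If a data frame is needed, a minimal sketch would be to convert the result back. The names m and df2 are illustrative, and set.seed is added only for reproducibility.

```r
set.seed(42)
df1 <- data.frame(c1 = sample(10), c2 = sample(10), c3 = sample(10))

# apply() works on as.matrix(df1); each column is passed to diff
m <- apply(df1, 2, diff)

# convert the resulting matrix back to a data frame
df2 <- as.data.frame(m)
```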
Ronak Shah