I am trying to create a 7-day lagged difference by group. I am trying to adapt the code below and hope to get a similar result, but with a 7-day lag instead of a 1-day lag.

library(dplyr)

dat %>% mutate(dx=c(NA, diff(x)), dy=c(NA, diff(y)))

   x y dx dy
 1 5 3 NA NA
 2 8 9  3  6
 3 3 1 -5 -8
 4 1 5 -2  4

But I am getting the error messages:

Error: incompatible size (900), expecting 905 (the group size) or 1

Is there a quick and easy way to fix this error? I understand it might have to do with mutate.

Jaap
Alice Work
  • Please show your input example. Based on the output, there is no `colname1, colname2` – akrun Jul 13 '16 at 19:38
  • Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Jul 13 '16 at 19:50
  • Thanks for the comment. The only difference between my code and the one above is the group_by, i.e. dat %>% group_by(anything) %>% mutate(dx=c(NA, diff(x)), dy=c(NA, diff(y))). I know it is an easy fix; I just don't know how to fix it since I am fairly new to R. – Alice Work Jul 13 '16 at 20:08

2 Answers

You need to pad with one NA for each day in your lag. diff(x, lag = k) returns k fewer values than x, so just as you need 1 NA to stand in for the missing difference in the first row with a lag of 1, you need 7 NAs to stand in for the missing differences in the first 7 rows with a lag of 7. Example with the built-in mtcars data frame:

mtcars %>% 
  mutate(dx = c(NA, diff(mpg)),
         dx7 = c(rep(NA,7), diff(mpg, 7)))

Or with grouping:

mtcars %>% 
  group_by(am) %>%
  mutate(dx = c(NA, diff(mpg)),
         dx7 = c(rep(NA,7), diff(mpg, 7)))
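
To see why the padding has to match the lag, here is a minimal base R sketch. The vector x is made up for illustration and is not from the question's data.

```r
# Made-up vector of 10 values (an assumption, not the asker's data).
x <- c(5, 8, 3, 1, 4, 9, 2, 7, 6, 10)

# A lag-7 difference computes x[8:10] - x[1:3], so it is 7 elements short.
d7 <- diff(x, lag = 7)
length(d7)

# Padding with 7 NAs restores the original length, which is what
# mutate() requires of every new column.
padded <- c(rep(NA, 7), d7)
length(padded) == length(x)
```

One caveat with this approach: if a group has 7 or fewer rows, diff(x, 7) returns a zero-length vector, so the padded result is 7 long and no longer matches the group size, and mutate() will complain again.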

@Axeman's nice answer reminded me that you can also use the zoo package's version of diff, which has built-in padding. You just have to convert your vector to a zoo object so that the diff.zoo method will get dispatched, instead of base R diff, making na.pad available:

library(zoo) 

mtcars %>% 
  mutate(dx = diff(zoo(mpg), na.pad=TRUE),
         dx7 = diff(zoo(mpg), 7, na.pad=TRUE))
eipi10

I would suggest getting rid of diff altogether and using dplyr's own lag instead. It takes care of the needed NAs.

mtcars %>% 
  mutate(dx = mpg - lag(mpg),
         dx7 = mpg - lag(mpg, 7))
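
For the grouped case from the question, the same idea might look like the sketch below. The data frame dat and its columns grp and x are hypothetical stand-ins, and the sketch assumes one row per day, already sorted by date within each group, since lag() counts rows rather than calendar days.

```r
library(dplyr)

# Hypothetical data: two groups with 10 daily observations each.
dat <- data.frame(
  grp = rep(c("a", "b"), each = 10),
  x   = c(1:10, seq(10, 100, by = 10))
)

res <- dat %>%
  group_by(grp) %>%
  mutate(dx7 = x - lag(x, 7)) %>%  # first 7 rows of each group become NA
  ungroup()
```

Because lag() always returns a vector as long as its input, this also works for groups with fewer than 7 rows, which simply get NA throughout.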
Axeman