1

I have a dataframe like

  ID_CASE   Month   
CS00000026A 201301  
CS00000026A 201302  
CS00000026A 201303  
CS00000026A 201304  
CS00000026A 201305  
CS00000026A 201306  
CS00000026A 201307  
CS00000026A 201308  
CS00000026A 201309  
CS00000026A 201310  
CS00000191C 201302  
CS00000191C 201303  
CS00000191C 201304  
CS00000191C 201305  
CS00000191C 201306  
CS00000191C 201307  
CS00000191C 201308  
CS00000191C 201309  
CS00000191C 201310  

I want the final data frame to have three additional column like

  ID_CASE   Month   Lag_1   Lag_2   Lag_3
CS00000026A 201301  NA      NA      NA
CS00000026A 201302  201301  NA      NA
CS00000026A 201303  201202  201201  NA
CS00000026A 201304  201203  201202  201201
CS00000026A 201305  201204  201203  201202
CS00000026A 201306  201305  201304  201303
CS00000026A 201307  201306  201305  201304
CS00000026A 201308  201307  201306  201305
CS00000026A 201309  201308  201307  201306
CS00000026A 201310  201309  201308  201307
CS00000191C 201302  NA       NA     NA
CS00000191C 201303  201302   NA     NA
CS00000191C 201304  201303  201302      NA
CS00000191C 201305  201304  201303  201302
CS00000191C 201306  201305  201304  201303
CS00000191C 201307  201306  201305  201304
CS00000191C 201308  201307  201306  201305
CS00000191C 201309  201308  201307  201306
CS00000191C 201310  201309  201308  201307

where

  • Lag_1 is lagged by 1 Month
  • Lag_2 is lagged by 2 Months
  • Lag_3 is lagged by 3 Months.

I have used the following code to atleast get Lag_1

df <- ddply(df,.(ID_CASE),transform,
                  Lag_1 <- c(NA,Month[-nrow(df)])) 

But this does not give me the desired output for Lag_1.

I have also tried looking at the solutions in Lag in R dataframe

And how can this be done if I have a date object instead of an int column 'Month' as in the current example?

Any help on this will be appreciated.

Community
  • 1
  • 1
darkage
  • 857
  • 3
  • 12
  • 22

4 Answers4

2

Try data.table

library(data.table)
setDT(df)[, `:=` (Lag_1 = c(NA, Month[-.N]),
                  Lag_2 = c(rep(NA, 2), Month[-.N]),
                  Lag_3 = c(rep(NA, 3), Month[-.N])), by = ID_CASE]
df
#         ID_CASE  Month  Lag_1  Lag_2  Lag_3
#  1: CS00000026A 201301     NA     NA     NA
#  2: CS00000026A 201302 201301     NA     NA
#  3: CS00000026A 201303 201302 201301     NA
#  4: CS00000026A 201304 201303 201302 201301
#  5: CS00000026A 201305 201304 201303 201302
#  6: CS00000026A 201306 201305 201304 201303
#  7: CS00000026A 201307 201306 201305 201304
#  8: CS00000026A 201308 201307 201306 201305
#  9: CS00000026A 201309 201308 201307 201306
# 10: CS00000026A 201310 201309 201308 201307
# 11: CS00000191C 201302     NA     NA     NA
# 12: CS00000191C 201303 201302     NA     NA
# 13: CS00000191C 201304 201303 201302     NA
# 14: CS00000191C 201305 201304 201303 201302
# 15: CS00000191C 201306 201305 201304 201303
# 16: CS00000191C 201307 201306 201305 201304
# 17: CS00000191C 201308 201307 201306 201305
# 18: CS00000191C 201309 201308 201307 201306
# 19: CS00000191C 201310 201309 201308 201307
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
1

You may use lag.zoo, where k can be a vector of lags.

library(plyr)
library(zoo)

ddply(df, .(ID_CASE), function(x){
  z <- zoo(x$Month)
  lag(z, k = 0:-3)
})

#        ID_CASE   lag0  lag-1  lag-2  lag-3
# 1  CS00000026A 201301     NA     NA     NA
# 2  CS00000026A 201302 201301     NA     NA
# 3  CS00000026A 201303 201302 201301     NA
# 4  CS00000026A 201304 201303 201302 201301
# 5  CS00000026A 201305 201304 201303 201302
# 6  CS00000026A 201306 201305 201304 201303
# 7  CS00000026A 201307 201306 201305 201304
# 8  CS00000026A 201308 201307 201306 201305
# 9  CS00000026A 201309 201308 201307 201306
# 10 CS00000026A 201310 201309 201308 201307
# 11 CS00000191C 201302     NA     NA     NA
# 12 CS00000191C 201303 201302     NA     NA
# 13 CS00000191C 201304 201303 201302     NA
# 14 CS00000191C 201305 201304 201303 201302
# 15 CS00000191C 201306 201305 201304 201303
# 16 CS00000191C 201307 201306 201305 201304
# 17 CS00000191C 201308 201307 201306 201305
# 18 CS00000191C 201309 201308 201307 201306
# 19 CS00000191C 201310 201309 201308 201307

Edit following comment.

If there are groups with only one date, the code above will generate an error. A small example:

df <- data.frame(ID_CASE = c(1, 1, 1, 2), Month = 1:4)
df
#   ID_CASE Month
# 1       1     1
# 2       1     2
# 3       1     3
# 4       2     4

ddply(df, .(ID_CASE), function(x){
  z <- zoo(x$Month)
  lag(z, k = 0:-3)
})

# Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, id_as_factor) : 
#   Results do not have equal lengths

This is due to the 'one registration-only groups' are coerced to a univariate time series. To avoid such coercion, use [ subsetting and drop = FALSE

ddply(df, .(ID_CASE), function(x){
  z <- zoo(x[ , "Month", drop = FALSE])
  lag(z, k = 0:-3)
})

#   ID_CASE Month.lag0 Month.lag-1 Month.lag-2 Month.lag-3
# 1       1          1          NA          NA          NA
# 2       1          2           1          NA          NA
# 3       1          3           2           1          NA
# 4       2          4          NA          NA          NA
Henrik
  • 65,555
  • 14
  • 143
  • 159
  • This doesnt seem to be exactly the same result as in the desired output in the question. see the difference in lag1 in row 11 – talat May 12 '14 at 08:49
  • @beginneR, Thank you for your comment! Well, then it seems like I misunderstood the underlying logic of the outcome of the lagging. – Henrik May 12 '14 at 08:55
  • @beginneR the output is fine beacuse corresponding to ID_CASE= "CS00000191C" there is no other month prior to 201302 – darkage May 12 '14 at 08:57
  • @darkage well, i think it depends on what the OP actually wants to achieve. He might want to get the previous month even if it was not in the data before, i assume. At least that's what he presented in his question – talat May 12 '14 at 09:00
  • @Henrik please make the necessary edits to the question – darkage May 12 '14 at 09:01
  • @Henrik I am getting an error which says `Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results do not have equal lengths` – darkage May 12 '14 at 09:02
  • @Henrik what is the structure of Month in your code? – darkage May 12 '14 at 09:11
  • It might be due to groups with only one date. – Henrik May 12 '14 at 09:11
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/52508/discussion-between-darkage-and-henrik) – darkage May 12 '14 at 09:13
  • @Henrik what is the way out? – darkage May 12 '14 at 09:17
1

From data.table v1.9.6 you can use shift():

require(data.table)
setDT(df)[, paste("lag", 1:3, sep="_") := shift(Month, 1:3), by=ID_CASE]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Arun
  • 116,683
  • 26
  • 284
  • 387
0

using dplyr:

library(dplyr)

 df %.%
  group_by(ID_CASE) %.%
  mutate(lag_1 = lag(Month, 1),
         lag_2 = lag(Month, 2),
         lag_3 = lag(Month, 3))
Miha Trošt
  • 2,002
  • 22
  • 25