0

any help would be appreciated!

Essentially, I need a variable that sums the number of previous observations by group while taking account of the date variable.

For Example:

my current data:

ID <- c("A", "A", "A","A", "B", "B", "B") 
YEAR <- c(1900, 1901, 1902, 1903, 1900, 1901, 1902) 
CASH <- c(1, 2, 3, 1, 0, 1, 0) 
DF <- data.frame(ID, YEAR, CASH) 
print(DF)

what I would like my data to look like:

ID <- c("A", "A", "A","A", "B", "B", "B") 
YEAR <- c(1900, 1901, 1902, 1903, 1900, 1901, 1902) 
CASH <- c(1, 2, 3, 1, 0, 1, 0)
PREV_CASH <- c(NA, 1, 3, 6, NA, NA, 1)
DF2 <- data.frame(ID, YEAR, CASH, PREV_CASH)
print(DF2)

I would like to sum the amount of previous cash from the prior year for each group.

KMR
  • 31
  • 7
  • Possible duplicate of [How to create a lag variable within each group?](http://stackoverflow.com/questions/26291988/how-to-create-a-lag-variable-within-each-group) – alistaire Jan 31 '17 at 05:09
  • A base R alternative, `ave(DF$CASH, DF$ID, FUN=function(i) c(NA, head(cumsum(i), -1)))` except that the penultimate observation is 0 rather than NA. – lmo Jan 31 '17 at 13:25

1 Answers1

0

We can use lag of the cumsum of 'CASH' after grouping by 'ID'

library(dplyr)
DF %>%
    group_by(ID) %>%
    mutate(PREV_CASH = lag(cumsum(CASH)), PREV_CASH = replace(PREV_CASH, PREV_CASH==0, NA))
#       ID  YEAR  CASH PREV_CASH 
#    <fctr> <dbl> <dbl>     <dbl>
#1      A  1900     1        NA
#2      A  1901     2         1
#3      A  1902     3         3
#4      A  1903     1         6
#5      B  1900     0        NA
#6      B  1901     1        NA
#7      B  1902     0         1
akrun
  • 874,273
  • 37
  • 540
  • 662