1

In the data frame dataAlog with variables 'CashInd','ACCOUNT', 'DateTime' and 'Cnt' (which is 1 for all observations). Aggregate count within 4 days for each ACCOUNT and CashInd. I tried using for loop to deal with the issue while it is slow. I am thinking to use apply function instead.

for (i in 1:dim(dataAlog)[1]) {
  data <- subset(dataAlog, (dataAlog$CashInd==dataAlog$CashInd[i] & dataAlog$ACCOUNT==dataAlog$ACCOUNT[i]) & (dataAlog$DateTime<=dataAlog$DateTime[i]) & (dataAlog$DateTime >= as.POSIXct(as.Date(dataAlog$DateTime[i])-4)) )
  if (dim(data)[1]==0) {
    dataAlog$day4Count[i] <- 1
  }
  else{
    agg <- aggregate(cnt ~ ACCOUNT, data = data, FUN=sum)
    dataAlog$day4Count[i] <- agg$cnt
  }
}
useR
  • 3,062
  • 10
  • 51
  • 66
  • Do you have an example `dataAlog` dataframe that you used, and the output you're expecting? It would be helpful for testing your for loop. – MBorg Mar 29 '18 at 12:16

1 Answers1

1

Here is a suggested solution using data.table. You can use a rolling window of 4 and count the number of dates that is after 4 days ago from the latest date in the window

data:

library(zoo)
library(data.table)

#generate sample data
set.seed(0L)
N <- 20
dat <- data.table(CashInd=rep(1:2, N),
    ACCOUNT=rep(LETTERS[1:2], each=N),
    DateTime=sample(seq(Sys.Date()-2*N, Sys.Date(), "1 day"), N),
    Amt=rnorm(2*N))
#it is impt that the data is sorted
setorder(dat, CashInd, ACCOUNT, DateTime)

calculations:

#roll with window of 4 and calculate the number of dates after the date 4 days ago of latest date
nDaysAgo <- 4
dat[, c("COUNT", "CumuAmt") := {
        list(sapply(DateTime, function(day) sum(between(DateTime, day-nDaysAgo, day))),
            sapply(DateTime, function(day) sum(Amt[between(DateTime, day-nDaysAgo, day)])))
    },
    by = c("CashInd", "ACCOUNT")]

results:

#     CashInd ACCOUNT   DateTime         Amt COUNT     CumuAmt
#  1:       1       A 2018-02-24 -0.22426789     1 -0.22426789
#  2:       1       A 2018-02-28  0.13333636     2 -0.09093152
#  3:       1       A 2018-03-07  1.08576936     1  1.08576936
#  4:       1       A 2018-03-09 -1.14765701     2 -0.06188765
#  5:       1       A 2018-03-17 -1.28459935     1 -1.28459935
#  6:       1       A 2018-03-26  0.25222345     1  0.25222345
#  7:       1       A 2018-03-28 -0.29921512     2 -0.04699167
#  8:       1       A 2018-03-31  0.76359346     2  0.46437834
#  9:       1       A 2018-04-01  0.43568330     3  0.90006164
# 10:       1       A 2018-04-03 -0.05710677     3  1.14216999
# 11:       1       B 2018-02-24  1.75790309     1  1.75790309
# 12:       1       B 2018-02-28 -0.45278397     2  1.30511912
# 13:       1       B 2018-03-07 -1.56378205     1 -1.56378205
# 14:       1       B 2018-03-09 -0.43331032     2 -1.99709237
# 15:       1       B 2018-03-17  0.83204713     1  0.83204713
# 16:       1       B 2018-03-26  0.99216037     1  0.99216037
# 17:       1       B 2018-03-28  0.72675075     2  1.71891111
# 18:       1       B 2018-03-31 -0.23570656     2  0.49104419
# 19:       1       B 2018-04-01  1.23830410     3  1.72934829
# 20:       1       B 2018-04-03 -1.16657055     3 -0.16397300
# 21:       2       A 2018-03-01  0.37739565     1  0.37739565
# 22:       2       A 2018-03-02 -0.41151083     2 -0.03411519
# 23:       2       A 2018-03-03  0.04672617     3  0.01261099
# 24:       2       A 2018-03-05 -0.79900925     4 -0.78639826
# 25:       2       A 2018-03-12 -0.69095384     1 -0.69095384
# 26:       2       A 2018-03-14  0.80418951     2  0.11323567
# 27:       2       A 2018-03-15 -1.23753842     3 -1.12430275
# 28:       2       A 2018-03-16 -0.28946157     4 -1.41376433
# 29:       2       A 2018-03-27 -0.89192113     1 -0.89192113
# 30:       2       A 2018-03-29  0.50360797     2 -0.38831316
# 31:       2       B 2018-03-01  0.56074609     1  0.56074609
# 32:       2       B 2018-03-02  1.15191175     2  1.71265784
# 33:       2       B 2018-03-03 -0.22732869     3  1.48532915
# 34:       2       B 2018-03-05 -0.54288826     4  0.94244090
# 35:       2       B 2018-03-12  1.15653700     1  1.15653700
# 36:       2       B 2018-03-14 -0.83204330     2  0.32449370
# 37:       2       B 2018-03-15 -0.27934628     3  0.04514742
# 38:       2       B 2018-03-16 -0.64947165     4 -0.60432423
# 39:       2       B 2018-03-27 -0.42951311     1 -0.42951311
# 40:       2       B 2018-03-29 -1.06559058     2 -1.49510369
#     CashInd ACCOUNT   DateTime         Amt COUNT     CumuAmt

see also solution 2 in Cumulative sum in a window (or running window sum) based on a condition in R

chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • Thanks for your help! It works. What if i have a field say Amt, how can i sum the cumulative amount? Sorry that i m new to data.table. Thanks again – useR Apr 04 '18 at 07:41
  • 1
    @useR, i updated with your cumulative amount. Can you check if that is what you are looking for? – chinsoon12 Apr 04 '18 at 07:51
  • HI @Chinsoon Thanks for your prompt response. While i would like to calculate the cumulative sum within CashInd, ACCOUNT and 4 days windows. THanks again – useR Apr 04 '18 at 07:58
  • 1
    @userR, updated to a more readable format and add the reference link – chinsoon12 Apr 04 '18 at 09:03
  • Thanks @chinsoon ! – useR Apr 04 '18 at 09:22
  • Sorry @chinsoon there is one issue that if there are more than 1 txn for the account within a date, it cannot add 1 to the count / cumsum – useR Apr 04 '18 at 09:33
  • how do you decide which to choose if there are duplicates? maybe u would like to post a new qn? – chinsoon12 Apr 04 '18 at 09:35
  • sorry that there is a transaction time actually. – useR Apr 04 '18 at 09:40