I have a question about for loops and rolling averages: for each day, I want the average of that day's estimate plus the estimates from the 6 previous days. Currently, I have a for loop that calculates the daily number of new people, but what I actually want is this 7-day rolling average. Any help would be appreciated, thanks!

The data (a data frame named d) looks like this:

               date            place    total
               2020-01-10       A         10
               2020-01-11       A          6
               2020-01-12       A          8
               2020-01-13       A          5
               2020-01-14       A          7
               2020-01-15       A          6
               2020-01-16       A          9
               2020-01-17       A          10
               2020-01-10       B          11
               2020-01-20       B          61
               2020-01-21       B          82
               2020-01-22       B          53
               2020-01-23       B          74
               2020-01-24       B          65
               2020-01-25       B          96
               2020-01-27       B          100

The for loop I wrote to calculate the number of new people per day is:

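for(x in unique(d$place)) {
  region <- d[d$place == x,]
  n <- nrow(region)
  region$newpeople <- NA

  # Rows 1..n-1 (1:n-1 would parse as (1:n) - 1, i.e. 0:(n - 1))
  for(i in seq_len(n - 1)) {
    region$newpeople[i] <- region$total[i] - region$total[i+1]
  }
  region$newpeople[n] <- region$total[n]
}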
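
As a possible simplification, the inner loop can be replaced by a vectorized expression while keeping the outer loop over places. The sketch below assumes the same arithmetic as the loop above (each row minus the next row, with the last row keeping its own total) and uses a small subset of the data; the names results and d_new are made up for illustration. It also stores each region in a list so the per-place results are not overwritten on the next iteration.

```r
# Small subset of the question's data, for illustration only
d <- data.frame(
  date  = as.Date(c("2020-01-10", "2020-01-11", "2020-01-12",
                    "2020-01-10", "2020-01-20")),
  place = c("A", "A", "A", "B", "B"),
  total = c(10, 6, 8, 11, 61)
)

results <- list()  # hypothetical container: one data frame per place

for (x in unique(d$place)) {
  region <- d[d$place == x, ]
  n <- nrow(region)

  # -diff(total) gives total[i] - total[i + 1] for rows 1..n-1;
  # the last row keeps its own total, as in the original loop.
  region$newpeople <- c(-diff(region$total), region$total[n])

  results[[x]] <- region
}

# Stack the per-place data frames back into one data frame
d_new <- do.call(rbind, results)
```

Because the per-place work is a single vectorized call, this should scale better on a large dataset than indexing row by row, although that has not been measured here.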

I then attach the estimates to their dates, adding NA rows for any days that are missing. I would like to do something similar for the rolling average over the past 7 days.

date_range <- seq(region$date[1], region$date[n], by = "days")
missing_dates <- date_range[!date_range %in% region$date]

if (length(missing_dates) != 0) {
  date <- missing_dates
  place <- paste0(region$place[1])
  total<- NA
  newpeople <- NA
  
  df <- data.frame(date, place, total, newpeople)
  region <- rbind(region, df) %>%
    arrange(date)
}
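
For comparison, the same gap-filling step can be sketched in base R by merging the region against a complete sequence of dates. This is only an illustration on three rows taken from place B; the names all_dates and filled are hypothetical, and it assumes the date column is already of class Date.

```r
# Three rows from place B, with 2020-01-26 missing
region <- data.frame(
  date  = as.Date(c("2020-01-24", "2020-01-25", "2020-01-27")),
  place = "B",
  total = c(65, 96, 100)
)

# One row per calendar day between the first and last observed date
all_dates <- data.frame(
  date = seq(min(region$date), max(region$date), by = "day")
)

# Left join: days without data get NA in every other column
filled <- merge(all_dates, region, by = "date", all.x = TRUE)

# Restore the place label on the rows that were added
filled$place[is.na(filled$place)] <- region$place[1]
```

Here merge() sorts by date by default, so no separate arrange() step is needed.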

Any help would be appreciated!

Cainã Max Couto-Silva
Dr.E77
1 Answer

I'm not sure whether you're set on using for loops, but here is an approach that avoids them.

Data

d <- read.table(text = "
               date            place    total
               2020-01-10       A         10
               2020-01-11       A          6
               2020-01-12       A          8
               2020-01-13       A          5
               2020-01-14       A          7
               2020-01-15       A          6
               2020-01-16       A          9
               2020-01-17       A          10
               2020-01-10       B          11
               2020-01-20       B          61
               2020-01-21       B          82
               2020-01-22       B          53
               2020-01-23       B          74
               2020-01-24       B          65
               2020-01-25       B          96
               2020-01-27       B          100
               ",
               header = TRUE)

Attempts

This post and website are pretty helpful. So using the mean_run() function from the runner package, we get

# install.packages("runner")
library(dplyr)

d %>%
  group_by(place) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    # Difference between consecutive days within each place
    diff = total - lag(total),
    # Rolling average over a 7-day window ending on the current date
    rolling_7 = runner::mean_run(
      x = total,
      k = 7,
      idx = as.Date(date)
    )
  )

I'm not sure if this is what you're looking for, though. The window is based on calendar dates rather than on row counts: for 2020-01-27 it covers 2020-01-21 through 2020-01-27, and since there is no row for 2020-01-26, that day is simply skipped. The rolling average for 2020-01-27 is therefore (82 + 53 + 74 + 65 + 96 + 100) / 6 ≈ 78.3.
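
To make the date-based window concrete, here is a minimal base R sketch that recomputes the same quantity for place B without the runner package. It is an illustration of the windowing logic, not a reimplementation of mean_run(); the helper column in_window and the hard-coded data frame b are assumptions made for the example.

```r
# Place B rows from the question's data
b <- data.frame(
  date  = as.Date(c("2020-01-10", "2020-01-20", "2020-01-21", "2020-01-22",
                    "2020-01-23", "2020-01-24", "2020-01-25", "2020-01-27")),
  total = c(11, 61, 82, 53, 74, 65, 96, 100)
)

# For each row, average `total` over the 7 calendar days ending on that
# row's date (the current day plus the 6 days before it). Days with no
# row contribute nothing to the mean.
b$rolling_7 <- vapply(
  seq_len(nrow(b)),
  function(i) {
    in_window <- b$date > b$date[i] - 7 & b$date <= b$date[i]
    mean(b$total[in_window])
  },
  numeric(1)
)

# Rolling average for 2020-01-27
round(b$rolling_7[b$date == as.Date("2020-01-27")], 1)
# → 78.3
```

If the missing days should instead count as zeros, or the window should always span 7 rows regardless of gaps, the condition defining in_window would need to change accordingly.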

tonybot
Thanks for this. I'll give it a try. The reason I use for loops is that my dataset is actually really huge, and after the for loops I calculate another estimate not provided here. But I think I may be able to use this after the fact: if I output an Excel file and read it back in, I can calculate the rolling average then. If you have any suggestions about my for loop, they would be much appreciated! – Dr.E77 Nov 17 '20 at 20:35