9

I have a group of data in the format:

ID    Minutes Value
xxxx  118     3 
xxxx  121     4 
xxxx  122     3 
yyyy  122     6 
xxxx  123     4 
yyyy  123     8 
...   ...     .... 

Each ID is a patient and each value is, say, blood pressure for that minute. I would like to create a rolling average for the 60 minutes before and 60 minutes after each point. However - as you can see, there are missing minutes (so I cannot merely use row numbers) and I would like to create average for each unique ID (so the average for ID xxxx cannot include values assigned to ID yyyy). It sounds like rollapply or rollingstat might be options, but have had little success trying to piece this together...

Please let me know if further clarity is needed.

Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
user3239045
  • 91
  • 1
  • 2
  • 1
    How large is your data? One quick and dirty way to handle this is to simply insert NAs for the missing minutes – Ricardo Saporta Jan 27 '14 at 03:51
  • 1
    If you don't want to use `data.table` , you could set up a loop. Roughly, `for (jtime in 1:N) mean(DF[DF$Minutes > (jtime-60) & DF$Minutes < (jtime + 60),3])` – Carl Witthoft Jan 27 '14 at 12:44
  • readers interested in such feature added to data.table rolling functions are kindly requested to upvote this FR: https://github.com/Rdatatable/data.table/issues/3241 – jangorecki Sep 01 '19 at 12:36

2 Answers2

11

You can easily fill in the missing Minutes (Value will be set to NA), then use rollapply

library(data.table)
library(zoo)

## Convert to data.table
DT <- data.table(DF, key=c("IDs", "Minutes"))

## Missing Minutes will be added in. Value will be set to NA. 
DT <- DT[CJ(unique(IDs), seq(min(Minutes), max(Minutes)))]

## Run your function
DT[, rollapply(value, 60, mean, na.rm=TRUE), by=IDs]

Alternatively, you don't need to keep the 'padded' Minutes / NA Values:

You can do it all in one shot:

## Convert your DF to a data.able
DT <- data.table(DF, key=c("IDs", "Minutes"))

## Compute rolling means, with on-the-fly padded minutes
DT[ CJ(unique(IDs), seq(min(Minutes), max(Minutes))) ][, 
  rollapply(value, 60, mean, na.rm=TRUE), by=IDs]
C8H10N4O2
  • 18,312
  • 8
  • 98
  • 134
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
5

An alternative approach that uses tidyr/dplyr instead of data.table and RcppRoll instead of zoo:

library(dplyr)
library(tidyr)
library(RcppRoll)

d %>% 
  group_by(ID) %>%
  # add rows for unosberved minutes
  complete(Minutes = full_seq(Minutes, 1)) %>%
  # RcppRoll::roll_mean() is written in C++ for speed 
  mutate(moving_mean = roll_mean(Value, 131, fill = NA, na.rm = TRUE)) %>%
  # keep only the rows that were originally observed
  filter(!is.na(Value))

data

d <- data_frame(
  ID = rep(1:3, each = 5),
  Minutes = rep(c(1, 30, 60, 120, 200), 3),
  Value = rpois(15, lambda = 10)
)
davechilders
  • 8,693
  • 2
  • 18
  • 18