Update for "tumbling window" (top, here) versus "rolling window" (below, kept for posterity/reference). Still using the dat
from the top of the rolling-window discussion.
The discussion about #VALUE!
(far below) may still be relevant, I'll include the code here.
base R
dat$speed <- suppressWarnings(as.numeric(dat$speed))
aggregate(dat$speed, list(grp = (seq_len(nrow(dat)) - 1) %/% 5), FUN = median, na.rm = TRUE)
# grp x
# 1 0 3.4245
# 2 1 -4.9730
dplyr
library(dplyr)
dat %>%
mutate(speed = suppressWarnings(as.numeric(speed))) %>%
group_by(grp = (seq_len(n()) - 1) %/% 5) %>%
summarize(med5 = median(speed, na.rm = TRUE))
# # A tibble: 2 x 2
# grp med5
# <dbl> <dbl>
# 1 0 3.42
# 2 1 -4.97
data.table
library(data.table)
as.data.table(dat)[, speed := suppressWarnings(as.numeric(speed))
][, .(med5 = median(speed, na.rm = TRUE)), by = .(grp = (seq_len(nrow(dat)) - 1) %/% 5)][]
# grp med5
# <num> <num>
# 1: 0 3.4245
# 2: 1 -4.9730
(This below is for rolling-windows, no longer relevant other than the definition of dat
data.)
I copied the first 10 rows from that single-column frame, and get
dat <- structure(list(speed = c(0, 5.534, 1.315, 7.6865, -0.479, -0.4605, -4.311, -4.973, -7.69, -11.669)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
With this, the basic operation is something like this:
newvec <- zoo::rollmedian(dat$speed, 5)
newvec
# [1] 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730
Note that this creates nrow(dat) - (k - 1)
(20) values, where k=5
(your window). In general, rolling operations tend to be reductive, but we have options to change that if needed. For instance, we can keep it the same length and pad the ends with NA
; to do this, we can "align" the window left, center, or right:
zoo::rollmedian(dat$speed, 5, fill = NA, align = "left")
# [1] 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730 NA NA NA NA
zoo::rollmedian(dat$speed, 5, fill = NA, align = "center")
# [1] NA NA 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730 NA NA
zoo::rollmedian(dat$speed, 5, fill = NA, align = "right")
# [1] NA NA NA NA 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730
Let me show how alignment fits here. With align="center"
(the default), we see:
dat$speed
# [1] 0.0000 5.5340 1.3150 7.6865 -0.4790 -0.4605 -4.3110 -4.9730 -7.6900 -11.6690
### `----------------------------------------'
### take the median of these values,
### and then assign that single value here:
### /
### ,---------------'
### /
# [1] NA NA 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730 NA NA
dat$speed[1:5]
# [1] 0.0000 5.5340 1.3150 7.6865 -0.4790
median(dat$speed[1:5])
# [1] 1.315
For the next value,
# [1] 0.0000 5.5340 1.3150 7.6865 -0.4790 -0.4605 -4.3110 -4.9730 -7.6900 -11.6690
### `----------------------------------------'
### take the median of these values,
### and then assign that single value here:
### /
### ,---------------'
### /
# [1] NA NA 1.3150 1.3150 -0.4605 -0.4790 -4.3110 -4.9730 NA NA
dat$speed[2:6]
# [1] 5.5340 1.3150 7.6865 -0.4790 -0.4605
median(dat$speed[2:6])
# [1] 1.315
So we can assign that easily to a new frame with data.frame(rollmed = newvec)
, with or without padding. If you want to assign it back to the original frame, one can do:
dat$rollmed <- zoo::rollmedian(dat$speed, 5, fill = NA, align = "center")
dat
# speed rollmed
# 1 0.0000 NA
# 2 5.5340 NA
# 3 1.3150 1.3150
# 4 7.6865 1.3150
# 5 -0.4790 -0.4605
# 6 -0.4605 -0.4790
# 7 -4.3110 -4.3110
# 8 -4.9730 -4.9730
# 9 -7.6900 NA
# 10 -11.6690 NA
As for your #VALUE!
, that likely manifests as a character
column instead of numeric
, in which case you have an extra step before all of the above.
I'll change one of my values to that error:
dat$speed[5] <- "#VALUE!"
dat
# speed
# 1 0
# 2 5.534
# 3 1.315
# 4 7.6865
# 5 #VALUE!
# 6 -0.4605
# 7 -4.311
# 8 -4.973
# 9 -7.69
# 10 -11.669
str(dat)
# 'data.frame': 10 obs. of 1 variable:
# $ speed: chr "0" "5.534" "1.315" "7.6865" ...
(See? chr
.)
From here, we can simply convert all to numbers, ignoring the warnings we get:
dat$speed <- suppressWarnings(as.numeric(dat$speed))
dat
# speed
# 1 0.0000
# 2 5.5340
# 3 1.3150
# 4 7.6865
# 5 NA
# 6 -0.4605
# 7 -4.3110
# 8 -4.9730
# 9 -7.6900
# 10 -11.6690
str(dat)
# 'data.frame': 10 obs. of 1 variable:
# $ speed: num 0 5.53 1.31 7.69 NA ...
And from here, we can do the rolling median again. Note that the NA
we now have changes things slightly:
dat$rollmed <- zoo::rollmedian(dat$speed, 5, fill = NA, align = "center")
dat$rollmed2 <- zoo::rollmedian(dat$speed, 5, fill = NA, align = "center", na.rm = TRUE)
dat
# speed rollmed rollmed2
# 1 0.0000 NA NA
# 2 5.5340 NA NA
# 3 1.3150 NA 3.42450
# 4 7.6865 NA 3.42450
# 5 NA NA 0.42725
# 6 -0.4605 NA -2.38575
# 7 -4.3110 NA -4.64200
# 8 -4.9730 -4.973 -4.97300
# 9 -7.6900 NA NA
# 10 -11.6690 NA NA
The default (what we did before) will return an NA
median for every row within +/- 4 rows (k-1
) of the previous #VALUE!
; we can add na.rm=TRUE
if you would prefer; that is not a rolling-window thing, that's a general-statistics question, "are empty values an issue".