I work a lot with large 3D arrays (latitude, longitude, and time), of size, for example, 721x1440x480. Usually I need to perform operations over the time dimension for each latitude/longitude cell, for example taking the average (resulting in a 2D array), taking a rolling mean in time (resulting in a 3D array), or applying more complex functions.
My question is: which package (or approach) is the most efficient and fastest for this?
I know one option is base R, using the apply function together with the zoo package, which provides rollapply for rolling functions. Another option is the tidyverse, and another is data.table (I sketch a data.table version below, after the base R code), plus combinations of these packages. But is one of them the fastest?
For example, suppose I have this cube of data:
data <- array(rnorm(721 * 1440 * 480), dim = c(721, 1440, 480))
where the dimensions are latitude, longitude, and time, like this:
lat <- seq(from = -90, to = 90, by = 0.25)
lon <- seq(from = 0, to = 359.75, by = 0.25)
time <- seq(from = as.Date('1980-01-01'), by = 'month', length.out = 480)
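(Just for context, I sometimes attach these coordinate vectors to the array as dimnames so the dimensions are explicit; R coerces them to character:)

# Optional: label the dimensions with the coordinate vectors
dimnames(data) <- list(lat = lat, lon = lon, time = as.character(time))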
And I usually need to do things like this (this is with base R + zoo):
# Average in time (result: a 721 x 1440 matrix)
average_data <- apply(data, MARGIN = 1:2, FUN = mean)

# Rolling mean in time, width of window = 3
library(zoo)
rolling_mean <- function(x) {
  rollapply(data = x, width = 3, by = 1, FUN = mean)
}
rolling_mean_data <- apply(X = data, MARGIN = 1:2, FUN = rolling_mean)
# apply() returns the time dimension first (478 x 721 x 1440 here, since
# rollapply trims the window ends), so permute back to lat x lon x time
rolling_mean_data <- aperm(a = rolling_mean_data, perm = c(2, 3, 1))
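For reference, this is roughly how I understand the same two operations would look in data.table, after melting the cube to long format (just a sketch; the column names are my own, and frollmean, data.table's rolling mean, right-aligns its window and pads with NA, unlike rollapply's default centered, trimmed output):

library(data.table)

d <- dim(data)
# Build the long table by hand; R arrays are column-major, so the first
# index (latitude) varies fastest in as.vector(data)
dt <- data.table(
  lat_idx  = rep(seq_len(d[1]), times = d[2] * d[3]),
  lon_idx  = rep(rep(seq_len(d[2]), each = d[1]), times = d[3]),
  time_idx = rep(seq_len(d[3]), each = d[1] * d[2]),
  value    = as.vector(data)
)

# Average in time for each grid cell
average_dt <- dt[, .(mean_value = mean(value)), by = .(lat_idx, lon_idx)]

# Rolling mean, width of window = 3, per grid cell
# (rows are already in time order within each group)
dt[, roll_mean := frollmean(value, n = 3), by = .(lat_idx, lon_idx)]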
The function to apply is not always the mean; it could also be other statistics, such as the standard deviation or the correlation with a reference time series.
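For instance, the standard deviation and correlation cases would look like this in base R (the reference series here is just simulated data for illustration):

# Standard deviation in time
sd_data <- apply(data, MARGIN = 1:2, FUN = sd)

# Correlation of each grid cell's time series with a reference series
reference <- rnorm(480)  # placeholder reference time series
correlation_data <- apply(data, MARGIN = 1:2,
                          FUN = function(x) cor(x, reference))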
So, what is the fastest way to do this type of calculation?