I am working with multidimensional array both on R and MATLAB, these arrays have five dimensions (total of 14.5M of elements). I have to remove a dimension applying an arithmetic mean on it and I discovered an amazing difference of performances using the two softwares.
MATLAB:
>> a = rand([144 73 10 6 23]);
>> tic; b = mean(a,3); toc
Elapsed time is 0.014454 seconds.
R:
> a = array(data = runif(144*73*6*23*10), dim = c(144,73,10,6,23))
> start <- Sys.time (); b = apply(a, c(1,2,4,5), mean); Sys.time () - start
Time difference of 1.229083 mins
I know that apply function is slow because is something like a general purpose function, but I don't know how to deal with this problem because this difference of performances is really a big limit for me. I tried to search for a generalization of colMeans/rowMeans functions but I didn't succeed.
EDIT I'll show a little sample matrix:
> dim(a)
[1] 2 4 3
> dput(aa)
structure(c(7, 8, 5, 8, 10, 11, 9, 9, 6, 12, 9, 10, 12, 10, 14,
12, 7, 9, 8, 10, 10, 9, 8, 6), .Dim = c(2L, 4L, 3L))
a_mean = apply(a, c(2,3), mean)
> a_mean
[,1] [,2] [,3]
[1,] 7.5 9.0 8.0
[2,] 6.5 9.5 9.0
[3,] 10.5 11.0 9.5
[4,] 9.0 13.0 7.0
EDIT (2):
I discovered that applying sum function and then dividing by the size of the removed dimension is definitely faster:
> start <- Sys.time (); aaout = apply(aa, c(1,2,4,5), sum); Sys.time () - start
Time difference of 5.528063 secs