
I am trying to understand how the sweep function works with multidimensional arrays (4D, 5D, ...), specifically when MARGIN is a vector of dimensions such as c(1,2), c(1,3), ...

for example:

x<-array(1,dim = c(2,3,4,5))
sweep(x, STATS=_, MARGIN= c(1,2), FUN='*')

What should the dimensions of STATS be here, and how does it work?

NelsonGon
Eman Ismail

1 Answer


The dimensions of the statistics given in STATS should match the dimensions that result from MARGINalizing the input array, i.e. dim(x)[MARGIN]. Although not recommended, STATS may also have a size that is a sub-multiple of the number of elements in that result, in which case it is recycled (e.g. length 2 for a 2x3 result; or a 2x4 array for a 2x4x3 result; or a 2x2 array for a 2x4x3 result, etc.).
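As a quick check of this rule, the expected shape of STATS can be read off as dim(x)[MARGIN]. The following is a minimal sketch using the 4D array from the question; the names `margin`, `stats`, `res` and `res_recycled` are only illustrative, and the STATS values are arbitrary:

```r
# A 4D array like the one in the question
x <- array(1, dim = c(2, 3, 4, 5))
margin <- c(1, 2)

# The shape STATS is expected to have: the extents of x along MARGIN
dim(x)[margin]

# A STATS array with exactly that shape (values chosen arbitrarily)
stats <- array(1:6, dim = dim(x)[margin])
res <- sweep(x, MARGIN = margin, STATS = stats, FUN = "*")

# sweep() preserves the shape of x
dim(res)

# The "sub-multiple" case: a length-2 vector is recycled along dimension 1,
# so every element with first index 1 is multiplied by 10, and by 20 otherwise
res_recycled <- sweep(x, MARGIN = margin, STATS = c(10, 20), FUN = "*")
res_recycled[, , 1, 1]
```

The recycled form works, but it is easy to misread which dimension the values follow, which is one reason a STATS with the full dim(x)[MARGIN] shape is the safer choice.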

In order to understand the dimensions resulting from MARGINalizing the data, let's look at an example:

# Example data in a 3D array of size 2x3x4
set.seed(1717)
x = array(runif(2*3*4), c(2,3,4))

# We MARGINalize the data by computing the mean on all dimensions *other than*
# the stated ones: (1, 3)
# This gives a 2D result whose dimension is of size
# "length of dim 1" x "length of dim 3", i.e. 2x4
marginalize_on_dims = c(1,3)
m = apply(x, marginalize_on_dims, mean)

which results in the following 2x4 "means" array:

> m
          [,1]      [,2]     [,3]      [,4]
[1,] 0.3662613 0.2971481 0.155660 0.5121214
[2,] 0.5808111 0.7322553 0.662044 0.4984720

We now sweep out the computed means m from the original x array (FUN defaults to "-", so m is subtracted):

x_swept_out_of_means_m = sweep(x, STATS=m, MARGIN=marginalize_on_dims)

which results in:

> x_swept_out_of_means_m
, , 1

           [,1]       [,2]      [,3]
[1,] -0.2934119 -0.3224825 0.6158943
[2,] -0.4540748  0.1814070 0.2726678

, , 2

           [,1]      [,2]        [,3]
[1,] -0.1452443 0.3631910 -0.21794673
[2,] -0.1205201 0.0873856  0.03313448

, , 3

              [,1]        [,2]        [,3]
[1,] -0.0766162667 -0.14700413  0.22362039
[2,]  0.0006661599  0.05828265 -0.05894881

, , 4

           [,1]       [,2]       [,3]
[1,]  0.2341822 -0.4071083  0.1729261
[2,] -0.2680816  0.4772658 -0.2091843
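To make explicit what sweep() did here: each element x[i, j, k] had m[i, k] subtracted from it. The sketch below rebuilds the same data and checks this with explicit loops. The loops are for illustration only (this is not how sweep() is implemented), and the names `swept` and `manual` are illustrative:

```r
set.seed(1717)
x <- array(runif(2 * 3 * 4), c(2, 3, 4))
m <- apply(x, c(1, 3), mean)
swept <- sweep(x, MARGIN = c(1, 3), STATS = m)  # FUN defaults to "-"

# The same computation written element by element:
# STATS is indexed only by the MARGIN dimensions (1 and 3)
manual <- array(NA_real_, dim = dim(x))
for (i in 1:2) for (j in 1:3) for (k in 1:4) {
  manual[i, j, k] <- x[i, j, k] - m[i, k]
}

all.equal(swept, manual)
```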

Note that the summary of the swept-out result shows a mean of 0, which is consistent with having subtracted the means:

> summary(x_swept_out_of_means_m)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-0.45407 -0.21137 -0.02914  0.00000  0.19196  0.61589 
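The overall mean is only a coarse summary. A stricter check, sketched below under the same setup, is that the mean over dimension 2 is zero (up to floating-point error) for every combination of dimensions 1 and 3. The name `residual_means` and the tolerance 1e-12 are illustrative choices:

```r
set.seed(1717)
x <- array(runif(2 * 3 * 4), c(2, 3, 4))
m <- apply(x, c(1, 3), mean)
swept <- sweep(x, MARGIN = c(1, 3), STATS = m)

# Recompute the means over the same margin on the swept array
residual_means <- apply(swept, c(1, 3), mean)

# Same 2x4 shape as m, and every entry is numerically zero
dim(residual_means)
max(abs(residual_means)) < 1e-12
```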

Therefore in your example, since you are marginalizing on dimensions 1 and 2, you should use a STATS value that is of dimension 2x3, for instance:

x <- array(1, dim=c(2,3,4,5))
sweep(x, STATS=matrix(nrow=2, ncol=3, data=c(2,3,-2,4,0,-3)), MARGIN=c(1,2), FUN='*')

where the result should be a 2x3x4x5 array in which the following 2x3 array is repeated 4*5 = 20 times, once for each combination of indices along dimensions 3 and 4:

     [,1] [,2] [,3]
[1,]    2   -2    0
[2,]    3    4   -3
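This can be verified with a short sketch: since x is all ones, every [ , , k, l] slice of the result should equal the STATS matrix itself. The names `stats`, `res` and `slices_match` are illustrative:

```r
x <- array(1, dim = c(2, 3, 4, 5))
stats <- matrix(c(2, 3, -2, 4, 0, -3), nrow = 2, ncol = 3)
res <- sweep(x, MARGIN = c(1, 2), STATS = stats, FUN = "*")

# The result keeps the shape of x
dim(res)

# Compare each of the 4*5 slices over dimensions 3 and 4 with stats
slices_match <- apply(res, c(3, 4), function(slice) all(slice == stats))
all(slices_match)
```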

Session Info:

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
mastropi
  • Thanks. Do you think there is an equivalent of sweep in Python, or at least something close to it? – Eman Ismail May 13 '19 at 10:38
  • There should be... using Python's broadcasting. Check out this question: https://stackoverflow.com/questions/23117756/python-numpy-or-pandas-equivalent-of-the-r-function-sweep – mastropi May 13 '19 at 12:25