The problem is that function definitions, such as that of "sweep", are often explained in a way that offers little further information. "Sweep" "sweeps out" things. I don't know what that means, and I have little chance of finding out by reading the definition.

2 Answers

The sweep() function works through a matrix by row (MARGIN = 1) or by column (MARGIN = 2) and applies an operation of your choice (FUN) to each row or column, pairing it with the corresponding element of STATS. This makes it useful for applying the same operation (FUN) across rows/columns with a different input value for each one.
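
To make the by-column case concrete, here is a minimal sketch of roughly what sweep() does with MARGIN = 2, written as an explicit loop. The helper name sweep_cols is hypothetical and exists only for illustration; the real sweep() is vectorised and also handles arrays with more than two dimensions.

```r
# Hypothetical helper (not part of base R): apply FUN to each column of x,
# pairing column j with the j-th element of stats.
sweep_cols <- function(x, stats, FUN = "-") {
  FUN <- match.fun(FUN)
  stopifnot(length(stats) == ncol(x))
  out <- x
  for (j in seq_len(ncol(x))) {
    out[, j] <- FUN(x[, j], stats[j])
  }
  out
}

x <- matrix(1:6, nrow = 2)

# Compare the loop version with the built-in on a plain numeric matrix
all.equal(sweep_cols(x, c(1, 2, 3), "-"), sweep(x, 2, c(1, 2, 3), "-"))
```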

Using d.b.'s example in the comments, suppose you have a 3x3 matrix full of 0s:

m <- matrix(0, 3, 3)
m

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
[3,]    0    0    0

If you want to add 7 to the first column, 3 to the second column, and 11 to the third column (passed to STATS):

sweep(x = m, MARGIN = 2, STATS = c(7, 3, 11), FUN = "+")

     [,1] [,2] [,3]
[1,]    7    3   11
[2,]    7    3   11
[3,]    7    3   11

or you can do it by row (MARGIN = 1):

sweep(x = m, MARGIN = 1, STATS = c(7, 3, 11), FUN = "+")

     [,1] [,2] [,3]
[1,]    7    7    7
[2,]    3    3    3
[3,]   11   11   11

Thus, sweep() is most useful when you want to apply a given function with a different value for each row or column of your matrix. (Note: you can also apply the function cell by cell with MARGIN = 1:2, in which case STATS should supply one value per cell.)
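
As a small sketch of the MARGIN = 1:2 note, and of using a FUN other than "+", consider the following. The matrices s and x are made up for illustration.

```r
m <- matrix(0, 3, 3)

# With MARGIN = 1:2, STATS supplies one value per cell, so here it has
# the same dimensions as the matrix.
s <- matrix(1:9, 3, 3)
sweep(m, 1:2, s, "+")

# FUN can be any binary function, not only "+": here each column of x
# is divided by its own maximum.
x <- matrix(c(2, 4, 10, 20, 3, 6), nrow = 2)
sweep(x, 2, apply(x, 2, max), "/")
```

In the second call, STATS is computed from the data itself, which is the typical pattern for centring or rescaling rows and columns.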

Brigadeiro

If the operation given in the 4th argument is "-", then sweep subtracts the third argument from each row or column of the first argument.

A special case may make it clearer. We will use the built-in 6x2 BOD data frame and use subtraction for the 4th argument.

Then sweep(BOD, 1, v, "-") subtracts the vector v from each column of BOD, giving a result identical to BOD - cbind(v, v) (which can be simplified to just BOD - v). Similarly, sweep(BOD, 2, u, "-") subtracts the vector u from each row of BOD, giving a result identical to BOD - rbind(u, u, u, u, u, u).

In detail, we provide several equivalences for each of the two cases.

# MARGIN = 1 case.  These each give identical results.

v <- rowMeans(BOD) # any vector of length nrow(BOD) would work
sweep(BOD, 1, v, "-")

BOD - cbind(v, v)

BOD - matrix(v, nrow(BOD), ncol(BOD))

cbind(BOD[1] - v, BOD[2] - v)

BOD - v
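
One way to check these equivalences rather than compare printed output by eye is all.equal(), which tests equality up to numeric tolerance. This is only a sketch; the name ref is introduced here for illustration.

```r
v <- rowMeans(BOD)
ref <- sweep(BOD, 1, v, "-")

# Each line compares one alternative against the sweep() result
isTRUE(all.equal(ref, BOD - cbind(v, v)))
isTRUE(all.equal(ref, BOD - matrix(v, nrow(BOD), ncol(BOD))))
isTRUE(all.equal(ref, cbind(BOD[1] - v, BOD[2] - v)))
isTRUE(all.equal(ref, BOD - v))
```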

In words, for the MARGIN = 2 case sweep subtracts the vector u from each row.

# MARGIN = 2 case.  These each give identical results except as noted.

u <- colMeans(BOD) # any vector of length ncol(BOD) will work
sweep(BOD, 2, u, "-")

BOD - rbind(u, u, u, u, u, u)

BOD - matrix(u, nrow(BOD), ncol(BOD), byrow = TRUE)

rbind(BOD[1, ] - u, BOD[2, ] - u, BOD[3, ] - u, BOD[4, ] - u, 
  BOD[5, ] - u, BOD[6, ] - u)

mapply("-", BOD, u)  # matrix rather than data.frame

scale(BOD, scale = FALSE)  # matrix rather than data.frame
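
The same kind of check can be sketched for the MARGIN = 2 case. Because mapply() and scale() return matrices, the sweep() result is converted with as.matrix() first, and attributes are ignored in the comparison. The names ref and ref_m are introduced here for illustration.

```r
u <- colMeans(BOD)
ref <- sweep(BOD, 2, u, "-")

# Data frame against data frame
isTRUE(all.equal(ref, BOD - matrix(u, nrow(BOD), ncol(BOD), byrow = TRUE)))

# mapply() and scale() return matrices, so compare against a matrix version
# of ref; check.attributes = FALSE ignores dimnames and the "scaled:center"
# attribute that scale() attaches.
ref_m <- as.matrix(ref)
isTRUE(all.equal(ref_m, mapply("-", BOD, u), check.attributes = FALSE))
isTRUE(all.equal(ref_m, scale(BOD, scale = FALSE), check.attributes = FALSE))
```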
G. Grothendieck