
I'm brushing up on my R skills and finally feel like I've mastered the strange sweep function, e.g.:

df <- data.frame(a = 1:3, b = 2:4)
sweep(df, MARGIN = 2, STATS = c(5, 10), FUN = "*")

##    a  b
## 1  5 20
## 2 10 30
## 3 15 40

and, more usefully, in a tutorial I'm working on about implementing a spatial interaction model in R.

They say that one sign you understand something is that you can express it in many ways, and I think this applies more in programming than almost anywhere else. Yet, although the problem sweep solves seems apply-like, I have NO IDEA whether the two are to some degree interchangeable.
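To make "the problem sweep solves" concrete, here is a small sketch using the question's df. Multiplying a data frame by a plain vector recycles that vector down the columns (column-major order), so it does not give one multiplier per column. The names naive, swept and by_rep are illustrative only; the rep(..., each = nrow(df)) line is one more base-R way of expressing the same operation without sweep().

```r
df <- data.frame(a = 1:3, b = 2:4)

# Naive: c(5, 10) is recycled down the columns in column-major order,
# so the multipliers alternate 5, 10, 5, ... instead of one per column
naive <- df * c(5, 10)

# sweep() matches STATS to MARGIN = 2, i.e. one value per column
swept <- sweep(df, MARGIN = 2, STATS = c(5, 10), FUN = "*")

# Without sweep(): repeat each multiplier once per row so the layout matches
by_rep <- df * rep(c(5, 10), each = nrow(df))

naive
swept
by_rep
```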

So, in order to improve my own understanding of R, is there any way to do the above procedure using apply?

RobinLovelace

2 Answers


This is close:

t(apply(df, 1, `*`, c(5,10)))

The row names are lost and the result is a matrix rather than a data frame, but otherwise the output is the same:

> t(apply(df, 1, '*', c(5,10)))
      a  b
[1,]  5 20
[2,] 10 30
[3,] 15 40
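Since apply() first converts the data frame with as.matrix() and returns a matrix, one way (a sketch, not the only option) to get a data frame back is as.data.frame(). Bear in mind that for a data frame containing a character or factor column, as.matrix() would produce a character matrix, so this route really only suits all-numeric data. The names via_apply and via_sweep are illustrative.

```r
df <- data.frame(a = 1:3, b = 2:4)

# apply() returns a matrix (rows of df come back as columns, hence t());
# as.data.frame() turns it back into a data frame with default row names
via_apply <- as.data.frame(t(apply(df, 1, `*`, c(5, 10))))

# The sweep() version from the question, for comparison
via_sweep <- sweep(df, MARGIN = 2, STATS = c(5, 10), FUN = "*")

via_apply
```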

To break this down, suppose we did this by hand for the first row of df; we'd write

> df[1, ] * c(5, 10)
  a  b
1 5 20

which is the same as calling the '*'() function with arguments df[1, ] and c(5, 10):

> '*'(df[1, ], c(5, 10))
  a  b
1 5 20
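Repeating that by-hand step for every row and stacking the results gives an explicit loop version of what apply() automates. This is a sketch for illustration only (multipliers, rows and by_hand are hypothetical names); unlike apply(), it works on one-row data frames instead of converting df to a matrix first.

```r
df <- data.frame(a = 1:3, b = 2:4)
multipliers <- c(5, 10)  # hypothetical name for the per-column values

# Do the first-row calculation for every row in turn...
rows <- lapply(seq_len(nrow(df)), function(i) df[i, ] * multipliers)

# ...then bind the one-row results back together
by_hand <- do.call(rbind, rows)

by_hand
```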

From this, we have enough to set up an apply() call:

  1. we work by rows, hence MARGIN = 1;
  2. we apply the function '*'(), so FUN = '*';
  3. we need to supply the second argument, c(5, 10), to '*'(), which we do via the ... argument of apply().
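The three steps can be spelled out with named arguments. The second form is an equivalent alternative (a stylistic choice, not something from the answer above) that fixes c(5, 10) inside an anonymous function instead of passing it through ...; res_dots and res_anon are illustrative names.

```r
df <- data.frame(a = 1:3, b = 2:4)

# Steps 1-3: rows, the `*` function, and c(5, 10) passed on via ...
res_dots <- t(apply(X = df, MARGIN = 1, FUN = `*`, c(5, 10)))

# Equivalent: the second argument is fixed inside an anonymous function
res_anon <- t(apply(X = df, MARGIN = 1, FUN = function(row) row * c(5, 10)))

res_dots
```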

The only extra thing to realise is how apply() combines the vectors returned by each "iteration": when FUN returns a vector of length n > 1, each result becomes a column of the output. Here that means the rows of df come back as columns, so we need to transpose the result from apply() to get the same layout as sweep().
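A quick check of the shapes involved, using the question's df (raw and fixed are illustrative names):

```r
df <- data.frame(a = 1:3, b = 2:4)

# Each call to `*` returns a length-2 vector, so apply() gives a
# 2 x 3 matrix: one row per column of df, one column per row of df
raw <- apply(df, 1, `*`, c(5, 10))
dim(raw)

# Transposing restores the 3 x 2 layout of df
fixed <- t(raw)
dim(fixed)
```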

Gavin Simpson
  • Fantastic answer, Gavin, thank you. That leaves only one question: why is there a separate sweep function if you can do it all with apply? – RobinLovelace Jan 02 '14 at 09:10
  • @RobinLovelace Efficiency; try timing the two approaches. `sweep` applies `FUN` to all elements in a single call. `apply` sets up a `for` loop over the (in this case) rows and calls `FUN` `nrow` times. But `apply()` can do a lot more than `sweep()` can, so that generality comes with a price. – Gavin Simpson Jan 02 '14 at 15:39
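To illustrate the "single call" point, the sketch below expands STATS into a full matrix and calls `*` once over the whole thing, which is roughly what sweep() does internally. Base R builds the expanded array with array() and aperm(); the matrix(..., byrow = TRUE) shortcut here only covers the two-dimensional MARGIN = 2 case. stats_mat and one_call are hypothetical names.

```r
m <- as.matrix(data.frame(a = 1:3, b = 2:4))
stats <- c(5, 10)

# Expand STATS so every row holds c(5, 10), matching the layout of m
stats_mat <- matrix(stats, nrow = nrow(m), ncol = ncol(m), byrow = TRUE)

# One vectorised call to `*` over all elements, no per-row loop
one_call <- m * stats_mat

via_sweep <- sweep(m, MARGIN = 2, STATS = stats, FUN = "*")
one_call
```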

As additional information, since questions about sweep come up repeatedly, a quick benchmark gives the following (on an Intel i7-8700 running Windows):

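x <- matrix(data = 20000*5000, nrow = 20000, ncol = 5000)  # every cell holds the single value 1e8, recycled
system.time(expr = {
  aa <- colMeans(x = x)                                  # one mean per column
  bb <- sweep(x = x, MARGIN = 2, STATS = aa, FUN = "-")  # centre each column
})
#        user      system      elapsed
#        4.69        0.16        4.84 
system.time(expr = {
  # note: MARGIN = 1 centres each row, not each column as in the sweep() block
  bbb <- apply(X = x, MARGIN = 1, FUN = function(z) z - mean(x = z))
  bbb <- t(x = bbb)  # rows come back as columns, so transpose
})
#        user      system      elapsed
#        6.28        0.55        6.85 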

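This suggests that sweep is the more efficient choice when it is applicable (keeping in mind that the two blocks centre different margins, so the comparison is indicative rather than exact).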
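As a small-scale sanity check (a sketch on a random stand-in matrix, not a benchmark), sweep() and apply() agree only when they work over the same margin: column centring is apply(x, 2, ...) with no transpose, while the apply() block in the benchmark centres rows, which matches sweep() with MARGIN = 1. All variable names here are illustrative.

```r
set.seed(42)
x <- matrix(rnorm(20), nrow = 5, ncol = 4)  # small stand-in for the large matrix

# Column centring: each apply() result is put back as a column, so no t()
cols_sweep <- sweep(x, MARGIN = 2, STATS = colMeans(x), FUN = "-")
cols_apply <- apply(x, MARGIN = 2, FUN = function(z) z - mean(z))

# Row centring, as computed by the apply() block in the benchmark
rows_sweep <- sweep(x, MARGIN = 1, STATS = rowMeans(x), FUN = "-")
rows_apply <- t(apply(x, MARGIN = 1, FUN = function(z) z - mean(z)))
```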

Comevussor