
While trying to answer this question, I encountered a difference between mutate and transform in what I expected to be equivalent operations.

# data
x <- data.frame(a=c(rep(0,10),rep(1,10),3),b=c(1:10,0,11:19,0))

#transform
transform(x,a=pmin(a,b), b=pmax(a,b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  1
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  3

#mutate
library(dplyr)
x %>% mutate(a=pmin(a,b), b=pmax(a,b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  0
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  0

Note the differences in rows 11 and 21. I suspect that mutate modifies the data as it goes, so pmax does not see the original values of a. Is this correct? Is it a bug, or by design?

  • you're correct, and it's by design (there's some discussion in the archives, I forget where). – baptiste Jul 14 '14 at 18:59
  • @baptiste Thanks, I think I understand why now: to allow computed variables to be used in the same command, so the originals need to be referenced explicitly. – James Jul 14 '14 at 19:09

1 Answer


It appears my suspicions are correct: mutate evaluates its arguments in order, by design, so that a computed variable can be used immediately afterwards, e.g.:

data.frame(a=1:4,b=5:8) %>% mutate(sum=a+b, letter=letters[sum])
  a b sum letter
1 1 5   6      f
2 2 6   8      h
3 3 7  10      j
4 4 8  12      l
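
The same contrast can be reproduced in base R alone. The following is a minimal sketch on a two-row toy data frame (not the data from the question; the names toy, t_res and s_res are illustrative): a single transform() call evaluates every expression against the original columns, while chaining two transform() calls emulates the sequential evaluation that mutate performs.

```r
# Toy data: in row 2, a > b, which is where the two approaches diverge
toy <- data.frame(a = c(0, 3), b = c(1, 0))

# One transform() call: both expressions see the ORIGINAL a and b
t_res <- transform(toy, a = pmin(a, b), b = pmax(a, b))
t_res$b  # 1 3

# Two chained calls: the second one sees the already-updated a,
# which mirrors mutate()'s left-to-right evaluation
s_res <- transform(transform(toy, a = pmin(a, b)), b = pmax(a, b))
s_res$b  # 1 0
```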

To replicate transform's behaviour with mutate, one simply needs to reference the original columns explicitly through the data frame:

x %>% mutate(a=pmin(x$a,x$b), b=pmax(x$a,x$b))
   a  b
1  0  1
2  0  2
3  0  3
4  0  4
5  0  5
6  0  6
7  0  7
8  0  8
9  0  9
10 0 10
11 0  1
12 1 11
13 1 12
14 1 13
15 1 14
16 1 15
17 1 16
18 1 17
19 1 18
20 1 19
21 0  3
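
One caveat with x$a is that it refers to the object x itself rather than to the data flowing through the pipe, so it would not follow earlier steps such as a filter. A possible alternative, sketched here on the assumption that a temporary column is acceptable (a_old is a hypothetical helper name, not something from dplyr), keeps a copy of the original column inside the same mutate call:

```r
library(dplyr)

x <- data.frame(a = c(rep(0, 10), rep(1, 10), 3), b = c(1:10, 0, 11:19, 0))

res <- x %>%
  mutate(
    a_old = a,           # copy of the original a, taken before a is overwritten
    a = pmin(a, b),      # a and b are both still the original values here
    b = pmax(a_old, b)   # use the saved copy instead of the updated a
  ) %>%
  select(-a_old)         # drop the helper column again

# Same values as the transform() call from the question
expected <- transform(x, a = pmin(a, b), b = pmax(a, b))
all.equal(res, expected)
```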
  • Not explicitly about `pmin` and `pmax`, but you may read about this behaviour in the [dplyr vignette](http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html): "`dplyr::mutate()` works the same way as `plyr::mutate()` and similarly to `base::transform()`. The key difference between `mutate()` and `transform()` is that mutate allows you to refer to columns that you just created [the 'a' column in your example]" – Henrik Jul 15 '14 at 08:18
  • Could you please explain the use of %>% ? – Anusha Oct 25 '14 at 05:13
  • Is there a typo in "In order to replicate the expected behaviour from transform one needs to"? You are referring to mutate, right? Thanks for alerting to this difference. – Anusha Oct 25 '14 at 05:21
  • @Anusha `%>%` is an implementation of piping using the `magrittr` package. I don't think there is a typo, I am replicating the behaviour of transform in mutate. – James Oct 25 '14 at 07:58
  • To replicate the expected output from transform you could actually use the x$ notation only for the second x$a, I think it makes it clearer. – moodymudskipper Jun 10 '17 at 10:02