Order of operations in summarise

Question

What is happening in the first snippet below, and why does its result differ from the next two?

library(tidyverse)
library(magrittr)

data.frame(A=c(2,2),B=c(1,1)) %>%
   summarise(A = sum(A),B = sum(B), D=sum(A)-sum(B))

yields D=0

data.frame(A=c(2,2),B=c(1,1)) %>%
   summarise(A = sum(A),B = sum(B), D=sum(A-B) )

yields D=2

data.frame(A=c(2,2),B=c(1,1)) %>% 
  summarise(sum_A = sum(A),sum_B = sum(B), D=sum(A)-sum(B))

yields D=2.

I can't come up with an explanation for how D=0 can result from such an operation. How can D=0 be a sensible result?

Interesting (+1) also take a look at `data.frame(A=c(2,2),B=c(1,1)) %>% summarise(A = sum(A), B = sum(B), D=sum(A), E = sum(B))` — talat, Nov 30 '17 at 10:47
`mutate` seems to work fine. `data.frame(A=c(2,2),B=c(1,1)) %>% mutate(A = sum(A), B = sum(B), D=sum(A))` — Sotos, Nov 30 '17 at 10:52
From the definition of `summarise`, *summarise() is typically used on grouped data created by group_by(). The output will have one row for each group.* So maybe it happens because the data frame is not grouped? Whereas `mutate` which does not need groups works as expected — Sotos, Nov 30 '17 at 10:54
@Sotos, it should normally work just fine with ungrouped data, too. It looks like a bug to me — talat, Nov 30 '17 at 10:54
From `?summarise` - "*# A summary applied to ungrouped tbl returns a single row*" - it should work as documented behaviour. The issue also persists using a `tibble` instead of a `data.frame` — thelatemail, Nov 30 '17 at 10:55
@docendodiscimus Definitely. I'm not saying it should work like that. — Sotos, Nov 30 '17 at 10:57
The smallest example I can do that breaks it - `df %>% summarise(A=sum(A),B=sum(A))` — thelatemail, Nov 30 '17 at 11:00
Looks like a bug. Caused by applying a function to a (single) variable already updated within `summarise`. Interestingly seems to work when you combine variables within a function (?) : `data.frame(A=c(2,2),B=c(1,1)) %>% summarise(A = sum(A), B = sum(B), C = A, D = sum(A), E = mean(A), G = sum(A-B), H = mean(A-B))` — AntoniosK, Nov 30 '17 at 11:02
This is a bug, I have filed an issue at https://github.com/tidyverse/dplyr/issues/3233 — Lionel Henry, Nov 30 '17 at 11:13
I recommend using your last approach of renaming things. Another option is to use summarise (to compute sum of A and B) and then mutate (to compute the difference) — Felipe Gerard, Feb 07 '18 at 18:52
The order is also important: `summarise(D=sum(A)-sum(B),A = sum(A),B = sum(B))` yields 2 4 2 — mysteRious, Mar 21 '18 at 03:44

score 1 · Answer 1 · answered Apr 05 '18 at 11:13

This is a bug; see https://github.com/tidyverse/dplyr/issues/3233. It is fixed in dplyr 0.7.4.9001.
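If you are stuck on an affected version, the two workarounds suggested in the comments can be sketched as below. This is only a minimal illustration, not an official fix; the names `df`, `res_renamed` and `res_two_step` are made up for the example. Both avoid applying a function to a column that was already overwritten earlier in the same `summarise` call.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(A = c(2, 2), B = c(1, 1))

# Workaround 1: give the summaries new names so that no input column
# is overwritten (this is the third form from the question).
res_renamed <- df %>%
  summarise(sum_A = sum(A), sum_B = sum(B), D = sum(A) - sum(B))

# Workaround 2: compute the sums with summarise, then derive the
# difference from the one-row result with mutate.
res_two_step <- df %>%
  summarise(A = sum(A), B = sum(B)) %>%
  mutate(D = A - B)

print(res_renamed)
print(res_two_step)
```

Both give D = 2, the difference between the column sums 4 and 2.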

trinalbadger587
