I tried using group_by to get the mean disp for a single cylinder type in mtcars, but it is not giving me the answer I expected. See below:

summarise(group_by(mtcars,cyl), mean(disp))

Output:

> summarise(group_by(mtcars,cyl), mean(disp))
# A tibble: 3 x 2
    cyl `mean(disp)`
  <dbl>        <dbl>
1     4         105.
2     6         183.
3     8         353.

But

summarise(group_by(mtcars,cyl = 6), mean(disp))

Output:

> summarise(group_by(mtcars,cyl = 6), mean(disp))
# A tibble: 1 x 2
    cyl `mean(disp)`
  <dbl>        <dbl>
1     6         231.

Note: I expected the second call to return the same mean for cylinder type 6 as the first call (183.), but the answers are different.

M--

2 Answers

You can't get the same answer, because n and sum change depending on whether you group_by(cyl) or group_by(cyl = 6), and therefore the mean changes too. In the example below, look mainly at n and sum_disp, since the mean is sum/n:

mtcars %>% 
  group_by(cyl = 6) %>% 
  summarise(mean(disp), n=n(), sum_disp=sum(disp), Mean2 = sum_disp/n)

mtcars %>% 
  group_by(cyl) %>% 
  summarise(mean(disp), n=n(), sum_disp=sum(disp), Mean2 = sum_disp/n)

Output:

> mtcars %>% 
+   group_by(cyl = 6) %>% 
+   summarise(mean(disp), n=n(), sum_disp=sum(disp), Mean2 = sum_disp/n)
# A tibble: 1 x 5
    cyl `mean(disp)`     n sum_disp Mean2
  <dbl>        <dbl> <int>    <dbl> <dbl>
1     6         231.    32    7383.  231.
> mtcars %>% 
+   group_by(cyl) %>% 
+   summarise(mean(disp), n=n(), sum_disp=sum(disp), Mean2 = sum_disp/n)
# A tibble: 3 x 5
    cyl `mean(disp)`     n sum_disp Mean2
  <dbl>        <dbl> <int>    <dbl> <dbl>
1     4         105.    11    1156.  105.
2     6         183.     7    1283.  183.
3     8         353.    14    4943.  353.
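
To see where the n of 32 comes from, here is a minimal sketch. It relies on the documented behaviour that group_by() evaluates a named expression like a mutate() before grouping; the object names overwritten and named_group are just illustrative.

```r
library(dplyr)

# Overwrite cyl with the constant 6 on every row, then group by it.
# This leaves a single group that contains all rows of mtcars.
overwritten <- mtcars %>%
  mutate(cyl = 6) %>%
  group_by(cyl) %>%
  summarise(n = n(), mean_disp = mean(disp))

# The call from the question, written as a pipeline
named_group <- mtcars %>%
  group_by(cyl = 6) %>%
  summarise(n = n(), mean_disp = mean(disp))

# Compare the two summaries and check that the single group holds every row
all.equal(as.data.frame(overwritten), as.data.frame(named_group))
overwritten$n == nrow(mtcars)
```

If both checks come back TRUE, the "6" in the output is only a label on the whole dataset, not a selection of the 6-cylinder cars.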
TarJae
  • Okay, I understood the difference. Thank you! –  Jul 20 '21 at 12:04
  • Why is `n()` changing? It's because `cyl=6` creates a column called cyl in which every value is 6, so there is only one group. Technically you are computing the sum/mean over all the rows in the dataset. – Onyambu Jul 21 '21 at 01:24

I think you are looking for the comparison operator, which is == and not =, and it should be used within filter rather than group_by. cyl = 6 just sets every cyl value to 6, so the call returns the mean of all the disp values, i.e. mean(mtcars$disp).

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarise(mean_disp = mean(disp)) 

#  cyl mean_disp
#1   4  105.1364
#2   6  183.3143
#3   8  353.1000

mtcars %>%
  filter(cyl == 6) %>%
  summarise(mean_disp = mean(disp))

#  mean_disp
#1  183.3143
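
As a quick cross-check without dplyr, the same two numbers can be reproduced with base R subsetting. This is only a sketch; the variable names mean_cyl6 and mean_all are made up for illustration.

```r
# Mean disp over the 6-cylinder rows only (what filter(cyl == 6) computes)
mean_cyl6 <- mean(mtcars$disp[mtcars$cyl == 6])

# Mean disp over every row (what group_by(cyl = 6) ends up computing)
mean_all <- mean(mtcars$disp)

mean_cyl6
mean_all
```

The first value should agree with the filter() result above, and the second with the single-row result from the question.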
Ronak Shah
  • Okay, I have got the answer successfully by making the changes as filter(cyl == 6). Thanks for the guidance. –  Apr 04 '22 at 05:12