I'm new to R and trying to understand how dplyr works so I can apply it to a dataset of my own. I'm working through the example in the dplyr vignette that uses the starwars dataset: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

I'm trying to group the starwars data frame by species and sex, and then find the mean height and mass for each combination of species and sex. The code is copied from the tutorial:

starwars %>%
  group_by(species, sex) %>%
  select(height, mass) %>%
  summarise(
    height = mean(height, na.rm = TRUE),
    mass = mean(mass, na.rm = TRUE)
  )

And I should be getting this output:

#> Adding missing grouping variables: `species`, `sex`
#> `summarise()` has grouped output by 'species'. You can override using the `.groups` argument.
#> # A tibble: 41 x 4
#> # Groups:   species [38]
#>   species  sex   height  mass
#>   <chr>    <chr>  <dbl> <dbl>
#> 1 Aleena   male      79    15
#> 2 Besalisk male     198   102
#> 3 Cerean   male     198    82
#> 4 Chagrian male     196   NaN
#> # … with 37 more rows

But instead I'm getting this:

#> Adding missing grouping variables: `species`, `sex`
#>    height     mass
#> 1 174.358 97.31186

Could someone help me understand why all species and sexes are being collapsed into a single row of mean height and mass, instead of one row per group?

Thanks!

int12345
  • The posted code works for me. What version of `dplyr` are you using? Does the code work if you remove the `select` line from the pipe chain? – LMc Oct 22 '21 at 21:06
  • Interesting. I cannot replicate your results. Restart RStudio and load only `dplyr`; maybe you have a conflict with another package? – Bloxx Oct 22 '21 at 21:06
  • Looks like you might be loading `library(plyr)` after `dplyr`, which would supersede ("mask") `dplyr`'s functions `arrange`, `count`, `desc`, `failwith`, `id`, `mutate`, `summarise`, `summarize`. `plyr` has been retired/superseded since 2018, so it probably makes sense to replace any usage with `dplyr`, its successor from the same author. – Jon Spring Oct 22 '21 at 21:12
  • Following @LMc's recommendation: the function `select` could be in conflict with another package (e.g., MASS). You can also write your third line as `dplyr::select(height, mass) %>%` – Bloxx Oct 22 '21 at 21:13
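
The masking explanation from the comments can be checked directly. Below is a minimal sketch, assuming (the asker has not confirmed this) that `plyr` or another package exporting `summarise` was attached after `dplyr`. `find()` lists every attached package that defines a name, in search-path order, and the `dplyr::` prefix bypasses the search path entirely. The name `by_group` is only illustrative, and the `select()` step is left out because `summarise()` keeps only the grouping columns and the named summaries anyway.

```r
library(dplyr)

# Which attached packages define these names? The first entry is the
# one an unqualified call resolves to.
find("summarise")
find("select")

# Namespace-qualified calls always use dplyr's versions, regardless of
# what was attached later (assumed culprit: plyr; not confirmed).
by_group <- starwars %>%
  dplyr::group_by(species, sex) %>%
  dplyr::summarise(
    height = mean(height, na.rm = TRUE),
    mass = mean(mass, na.rm = TRUE),
    .groups = "drop"
  )

by_group
```

If `find("summarise")` lists `package:plyr` before `package:dplyr`, the unqualified call resolves to `plyr::summarise`, which, as far as I know, ignores dplyr grouping and returns a one-row plain data frame, consistent with the output in the question. Restarting the R session and attaching only `dplyr` should make the unqualified code behave like the vignette.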
