dplyr: group_by + summarize not working as expected

Question

I'm having some trouble using R's group_by and summarize functions and was wondering if you all could lend me some help. I have a table similar to this:

    Category     Frequency
    First        1
    First        4
    Second       6
    First        1
    Third        1
    Third        2
    Second       6
    First        2
    Second       1

I'm attempting to use dplyr's group_by and summarize to find the mean of the frequency column. Here's my sample code:

    table %>%
         group_by(table$Category) %>%
         summarize(meanfrequency = mean(table$frequency))

What I would expect would be for a table to be spit out that breaks down the mean frequency grouped by individual category, like so:

    Category     Frequency
    First        2
    Second       4.33
    Third        1.5

However, what I'm receiving is a table grouped by category, with each category receiving the value of the mean of the ENTIRE table, like so:

    Category     Frequency
    First        2.66
    Second       2.66
    Third        2.66

Any clue as to what's going on here? I should say I'm a beginner, so perhaps I'm missing something obvious. My actual table has several variables other than the 2 I'm attempting to analyze, but I'm not sure whether that's relevant or might be interfering with something. I also loaded this data into R using Rstudio's built in readxcl package.

Thanks in advance!

What's "readxcl"? `readxl`? I don't think it's built in to RStudio. `dplyr` functions generally don't want the `table$` part, just the bare column name — camille, Jan 29 '20 at 01:03

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

Using `$` extracts the whole column and ignores the grouping. Instead, we can use the unquoted column name, so that only the values of 'Frequency' within each 'Category' are used.

    library(dplyr)
    table %>%
         group_by(Category) %>%
         summarize(meanfrequency = mean(Frequency))
    # A tibble: 3 x 2
    #  Category meanfrequency
    #  <chr>            <dbl>
    #1 First             2
    #2 Second            4.33
    #3 Third             1.5

If we use table$Frequency inside the chain, it gives the same result as calling it outside the chain: the mean of the entire column, regardless of group. Also, R is case-sensitive, so the column must be written as table$Frequency, not table$frequency.

    mean(table$Frequency)
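To see the difference side by side, here is a minimal sketch using the sample data from the question. The names `df`, `overall`, and `per_group` are chosen only for this illustration and do not appear in the original post.

```r
library(dplyr)

# Sample data from the question; `df` is a name picked for this sketch
df <- data.frame(
  Category  = c("First", "First", "Second", "First", "Third",
                "Third", "Second", "First", "Second"),
  Frequency = c(1L, 4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)
)

# df$Frequency reaches past the grouping: every group sees all 9 values
overall <- df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(df$Frequency))

# The bare column name is evaluated separately within each group
per_group <- df %>%
  group_by(Category) %>%
  summarize(meanfrequency = mean(Frequency))
```

Here `overall` should hold the same whole-column mean in all three rows, while `per_group` holds one mean per category.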

Also, `table` is the name of a base R function (and class), so it is better not to use it as an object name.
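A small sketch of why the name is confusing rather than outright broken: when a name is used in call position, R looks for a function and skips non-function objects, so both meanings of `table` can appear in one expression. The three-row data frame below is made up for this illustration.

```r
# A data frame deliberately given the same name as base::table()
table <- data.frame(Category = c("First", "Second", "First"))

# The outer `table` is the base function, the inner one is the data frame
counts <- table(table$Category)

# A distinct name keeps the two apart
freq_data <- table
counts_clear <- base::table(freq_data$Category)
```

Both calls should produce the same counts; the second is simply easier to read.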

data

    table <- structure(list(Category = c("First", "First", "Second", "First", 
    "Third", "Third", "Second", "First", "Second"), Frequency = c(1L, 
    4L, 6L, 1L, 1L, 2L, 6L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
    -9L))

Ahh just gave this a go and it appears to have worked! Thanks for your help here, friend. — Aidang, Jan 29 '20 at 01:06
Little late but just clicked the tick mark! Thanks again for the quick and kind answer. — Aidang, Feb 03 '20 at 18:09
