
I'm trying to use dplyr to group and summarize a dataframe, but keep getting the following error:

Error: cannot modify grouping variable

Here's the code that generates it:

data_summary <- labeled_dataset %>%
    group_by("Activity") %>%
    summarise_each(funs(mean))

Here's the structure of the data frame that I'm applying this to:

> str(labeled_dataset)
'data.frame':   10299 obs. of  88 variables:
 $ Subject                          : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Activity                         : Factor w/ 6 levels "LAYING","SITTING",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ tBodyAccmeanX                    : num  0.289 0.278 0.28 0.279 0.277 ...
 $ tBodyAccmeanY                    : num  -0.0203 -0.0164 -0.0195 -0.0262 -0.0166 ...
 $ tBodyAccmeanZ                    : num  -0.133 -0.124 -0.113 -0.123 -0.115 ...
 $ tGravityAccmeanX                 : num  0.963 0.967 0.967 0.968 0.968 ...
 $ tGravityAccmeanY                 : num  -0.141 -0.142 -0.142 -0.144 -0.149 ...
 $ tGravityAccmeanZ                 : num  0.1154 0.1094 0.1019 0.0999 0.0945 ...
   ...

The only reference I've found to this error is another post that suggests ungrouping first to make sure the data isn't already grouped. I've tried that without success.

Thanks,

Luke

    Have you tried it without the quotes on `"Activity"`? `dplyr` uses different functions for quoted and unquoted arguments. – Rich Scriven Dec 21 '14 at 18:33
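A minimal sketch of why the quotes matter. It assumes a current dplyr (1.0 or later) and uses a small made-up data frame, `toy`, in place of labeled_dataset. With a bare name, `group_by()` uses the existing column. With a quoted name, current dplyr evaluates the string as a constant, adds a new column holding that string on every row, and so puts all rows into a single group. The 2014 version in the question raised the error above instead. `.data[[...]]` is the current way to refer to a column whose name is stored in a string.

```r
library(dplyr)

# Hypothetical toy data standing in for labeled_dataset (made-up values)
toy <- data.frame(
  Subject       = c(1L, 1L, 2L, 2L),
  Activity      = factor(c("SITTING", "LAYING", "SITTING", "LAYING")),
  tBodyAccmeanX = c(0.28, 0.26, 0.30, 0.24)
)

# Bare name: groups by the existing Activity column, one group per level
by_bare <- group_by(toy, Activity)

# Quoted name: "Activity" is evaluated as a constant string, so dplyr adds a
# new column whose value is "Activity" on every row, giving a single group
by_quoted <- group_by(toy, "Activity")

# Column name held in a string: the .data pronoun looks the column up by name
col <- "Activity"
by_string_var <- group_by(toy, .data[[col]])

n_groups(by_bare)        # one group per Activity level
n_groups(by_quoted)      # every row lands in the same group
n_groups(by_string_var)  # same grouping as by_bare
```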

2 Answers


Don't put the name of the grouping variable in quotes:

data_summary <- labeled_dataset %>%
  group_by(Activity) %>%
  summarise_each(funs(mean))
Sven Hohenstein
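On current dplyr, `summarise_each()` and `funs()` are deprecated. Here is a minimal sketch of the same unquoted grouping with `across()`, which needs dplyr 1.0 or later. It uses a small made-up data frame, `toy`, in place of labeled_dataset.

```r
library(dplyr)

# Hypothetical toy data (made-up values) with the same leading columns as labeled_dataset
toy <- data.frame(
  Subject       = c(1L, 1L, 2L, 2L),
  Activity      = factor(c("SITTING", "LAYING", "SITTING", "LAYING")),
  tBodyAccmeanX = c(0.28, 0.26, 0.30, 0.24),
  tBodyAccmeanY = c(-0.02, -0.01, -0.04, -0.03)
)

# across(everything(), mean) applies mean() to every non-grouping column;
# the grouping column Activity is excluded automatically
data_summary <- toy %>%
  group_by(Activity) %>%
  summarise(across(everything(), mean))
```

`across()` never selects grouping columns, so `everything()` here covers Subject and the measurement columns but not Activity.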

Looks like there were two problems:

  1. Grouping variable names were in quotes ("Activity" instead of Activity) - Thanks, Richard!
  2. Because I didn't specify which columns to summarise, dplyr tried to compute the mean of every column, including the first two columns, which held the grouping variables.

I fixed the code, specifying all columns except the grouping ones, as follows:

data_summary <- labeled_dataset %>%
    group_by(Activity) %>%
    summarise_each(funs(mean), tBodyAccmeanX:tGravityAccmeanX)
Luke
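The same column-range selection carries over to current dplyr (1.0 or later), where `summarise_each()` is deprecated. Below is a minimal sketch with made-up data. The `Extra` column is hypothetical and is included only to show what the range leaves out.

```r
library(dplyr)

# Hypothetical toy data (made-up values); Extra sits outside the selected range
toy <- data.frame(
  Subject          = c(1L, 1L, 2L, 2L),
  Activity         = factor(c("SITTING", "LAYING", "SITTING", "LAYING")),
  tBodyAccmeanX    = c(0.28, 0.26, 0.30, 0.24),
  tBodyAccmeanY    = c(-0.02, -0.01, -0.04, -0.03),
  tGravityAccmeanX = c(0.96, 0.97, 0.98, 0.95),
  Extra            = c(10, 20, 30, 40)
)

# The from:to range selects tBodyAccmeanX through tGravityAccmeanX by position,
# so Subject and Extra are left out of the summary
data_summary <- toy %>%
  group_by(Activity) %>%
  summarise(across(tBodyAccmeanX:tGravityAccmeanX, mean))
```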
    You say the first two columns were grouping variables, but you only group by the second (Activity) column. If you group by all the columns needed for the grouping, dplyr will only use the remaining columns in summarise_each and mutate_each. By the way, if you just need to exclude one column, as in this case, you can also negate it by using `-Subject` in summarise_each. – talat Dec 21 '14 at 18:51
    I actually started off grouping by two variables (Activity and Subject) but dropped to one while troubleshooting. Once I figured out that I needed to exclude the grouping variables, it worked with one or two. Thanks for the tip on using `-Subject`. – Luke Dec 22 '14 at 20:18
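The two options from the comments look like this in current dplyr (1.0 or later). One option is to group by both Activity and Subject, so that neither is averaged. The other is to group by Activity only and drop Subject with `-Subject`. This is a minimal sketch on made-up data.

```r
library(dplyr)

# Hypothetical toy data (made-up values)
toy <- data.frame(
  Subject       = c(1L, 1L, 2L, 2L, 2L),
  Activity      = factor(c("SITTING", "LAYING", "SITTING", "LAYING", "LAYING")),
  tBodyAccmeanX = c(0.28, 0.26, 0.30, 0.24, 0.22)
)

# Grouping by both columns: across(everything(), ...) skips both grouping
# columns, so only tBodyAccmeanX is averaged; .groups = "drop" ungroups the result
by_both <- toy %>%
  group_by(Activity, Subject) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Grouping by Activity only: -Subject excludes the one non-grouping column
# that should not be averaged
by_activity <- toy %>%
  group_by(Activity) %>%
  summarise(across(-Subject, mean))
```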