
I know there's probably an easy answer to my question, but I have tried everything. What's the problem with the following code? Why is group_by(name) not grouping? I tried it with and without the pipe.

df <- data.frame(name = c("jose", "jose", "maria", "maria", "maria", "pedro"),
                 values = rep(1, 6), stringsAsFactors = TRUE)

df1 <- mutate(df, N=sum(values))  # summing without grouping

# without pipe
df <- group_by(df, name)           # grouping $name 
df2 <- mutate(df, N= sum(values))  # summing by group

df1 == df2    # there's no difference between results: group_by() is not working

#     name values    N
#[1,] TRUE   TRUE TRUE
#[2,] TRUE   TRUE TRUE
#[3,] TRUE   TRUE TRUE
#[4,] TRUE   TRUE TRUE
#[5,] TRUE   TRUE TRUE
#[6,] TRUE   TRUE TRUE


# with pipe
df3 <- df %>% group_by(name)  %>%  # grouping $name 
          mutate(N= sum(values))   # summing by group

df1 == df3    # there's no difference between results: group_by() is not working

#     name values    N
#[1,] TRUE   TRUE TRUE
#[2,] TRUE   TRUE TRUE
#[3,] TRUE   TRUE TRUE
#[4,] TRUE   TRUE TRUE
#[5,] TRUE   TRUE TRUE
#[6,] TRUE   TRUE TRUE

The output I want is a new column with the sum of the values by group:

df$N <- c(2,2,3,3,3,1) # DESIRED

The output I'm getting is the sum without grouping:

df$N <- c(6,6,6,6,6,6) # NOT DESIRED
  • I actually get `FALSE` for the `N` column. I have heard that `group_by` behavior may be tweaked in newer versions, so try `ungroup` after the `mutate` statement and see if things change. – Mouad_Seridi May 01 '21 at 09:00
  • You haven't specified your desired output, but I suspect you want `summarise(N=sum(values), .groups="drop")` rather than `mutate(N=sum(values))`. – Limey May 01 '21 at 09:03
  • The output I want is a new column with the sum of the values by group: `df$N <- c(2,2,3,3,3,1)`. The output I'm getting is the sum without grouping: `df$N <- c(6,6,6,6,6,6)`. – Fidel Alencar May 01 '21 at 09:07
  • When I run your code I get the desired output, i.e. `df3$N` is `2 2 3 3 3 1`. You have most probably loaded the `plyr` library, which is masking the `mutate` function. Try using `dplyr::mutate`, i.e. `df3 <- df %>% group_by(name) %>% dplyr::mutate(N= sum(values))` – Ronak Shah May 01 '21 at 10:38

1 Answer


Fidel, you were almost there. I put the ungrouped sum in a column named N and the grouped sum in a column named N2, so you can see what group_by() does. Using summarise on a grouped data frame would return a single row per grouping key (name in your case), so it would not keep all six rows. mutate is therefore the right choice for what you want.

df %>% 
  mutate(N = sum(values)) %>% 
  group_by(name) %>% 
  mutate(N2 = sum(values)) %>%
  ungroup() # to remove grouping

This yields

  name  values     N    N2
  <fct>  <dbl> <dbl> <dbl>
1 jose       1     6     2
2 jose       1     6     2
3 maria      1     6     3
4 maria      1     6     3
5 maria      1     6     3
6 pedro      1     6     1
Ray
  • You don't understand my point. This code doesn't work here: both output columns are the same. I need to get this output by other means, since group_by() + mutate() doesn't work. What's your tidyverse version? – Fidel Alencar May 01 '21 at 10:38
  • My tidyverse version is 1.3.0. As already mentioned above, you might have masked functions by loading other packages. Restart RStudio, load only the tidyverse/dplyr, and run the pipe again. – Ray May 01 '21 at 10:55
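
To check the masking explanation suggested in the comments, a minimal sketch along the following lines may help. It assumes only that dplyr is installed. The names `mutate_source`, `res`, and `base_sum` are made up for illustration, and the `ave()` cross-check is a base R alternative that nobody in the thread proposed.

```r
library(dplyr)

df <- data.frame(name = c("jose", "jose", "maria", "maria", "maria", "pedro"),
                 values = rep(1, 6), stringsAsFactors = TRUE)

# Which package does a bare `mutate` resolve to in this session?
# "dplyr" is what you want; "plyr" would explain the ungrouped sum of 6,
# because plyr::mutate ignores dplyr groups.
mutate_source <- environmentName(environment(mutate))
print(mutate_source)

# Namespace-qualified calls are not affected by masking.
res <- df %>%
  dplyr::group_by(name) %>%
  dplyr::mutate(N = sum(values)) %>%
  dplyr::ungroup()
print(res$N)

# Base R cross-check that does not depend on dplyr at all:
# ave() applies FUN within each level of the grouping variable.
base_sum <- ave(df$values, df$name, FUN = sum)
print(base_sum)
```

If `mutate_source` prints `"plyr"`, then restarting the session and loading dplyr last (or not loading plyr at all) should make the unqualified `mutate()` behave as in the answer above.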