1

I imagine this question is duplicated, but I could not find any working answer to use dplyr in an easy and elegant way to add subgroup counts after a group_by. If this question is duplicate, please delete. If you want a code to reproduce, I´ll do that. Please, don´t click on "negative".

I´ve tried to use spread, but it was not useful, after, I´ve tried to follow the instructions here, once it helps to Count unique by group in a data frame, but it does not work. The same solution is here, but the output is strange.

What I have: enter image description here

What I really want (using simple code... I imagine dplyr can handle that without having to use gather()), is to insert three new columns, for each factor level.

enter image description here

My code:

descritivos %>% 
  group_by(sexo) %>% 
  summarise(n=n(),Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), n_unique = n_distinct(Escolaridade))

With this code, I was almost there, but it duplicates some output.

descritivos %>% 
  group_by(sexo, Escolaridade) %>% 
  summarise(n=n(),Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), n_unique = n_distinct(Escolaridade)) %>% spread(Escolaridade, n)
  spread(count(Escolaridade), n, fill=0) 

enter image description here

This is a reproducible code to work with:

library(tidyverse)
ds <- data.frame(sex=c(0,1), schooling=c("k12","high","college","university"), age=rnorm(mean=20,sd=2, n=40))

ds %>% group_by(sex, schooling) %>% 
  summarise(mean(age), n=n()) %>% spread(schooling, n)


ds %>% group_by(sex, schooling) %>% 
  summarise(n()) %>% t()

The desired output: enter image description here Thanks much

Last edit:

Thanks to @Akrun, I solved my question. If you have the same, please follow this code:

descritivos %>% 
  group_by(sexo) %>%
  group_by(Escolaridade, 
  Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), add=TRUE) %>%
  summarise(n=n()) %>% 
  spread(Escolaridade, n)

or this code to the reproducible code:

ds %>% group_by(sex) %>% 
  group_by(schooling = paste0("school", schooling), Mean = mean(age), 
           ndist = n_distinct(schooling), add = TRUE) %>% summarise(n = n()) %>% 
  spread(schooling, n)
Michael
  • 41,989
  • 11
  • 82
  • 128
Luis
  • 1,388
  • 10
  • 30
  • I don't understand the question. What is your desired output and why do you need the `spread` at the end of your code chunk? – Weihuang Wong May 28 '18 at 14:45
  • Please use `dput` to show a small reproducible example instead of images – akrun May 28 '18 at 14:45
  • Hello, @akrun, I edited my question. Thanks for all support – Luis May 28 '18 at 15:07
  • Based on the edited (with reproducible example), you showed two pipe chunks with group by. Are you looking to combine those two? Where is the output of `n_distinct(schooling)` in the combined – akrun May 28 '18 at 15:13
  • Hello again, @akrun .Yes, I would like to have just one table with group_by and something like count(schooling). n_distinct(schooling) is not working to count the observations in all subgroup. – Luis May 28 '18 at 15:17
  • It is not clear about the expected output. For the 0 sex row, you showed schooling values and for sex 1, it is number of rows. Those seems like two different features – akrun May 28 '18 at 15:21
  • Perhaps `ds %>% group_by(sex) %>% group_by(schooling = paste0("school", schooling), Mean = mean(age), ndist = n_distinct(schooling), add = TRUE) %>% summarise(n = n()) %>% spread(schooling, n)` this helps – akrun May 28 '18 at 15:23
  • 1
    Hey, @akrun, I have edit the original post to clarify my question, but the last answer you provide works !! Thanks thanks much! I spent more than 3 hours searching for that!! Thanks again! – Luis May 28 '18 at 15:35

1 Answers1

1

We can do this in a single chain

ds %>% 
  group_by(sex) %>%
  group_by(schooling = paste0("school", schooling), Mean = mean(age), 
                ndist = n_distinct(schooling), add = TRUE) %>% 
  summarise(n = n()) %>% 
  spread(schooling, n)
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Thanks! It´s good to know I can have two "group_by" in a single chain!! – Luis May 28 '18 at 15:38
  • Just for the clarity, is it possible to use n() inside group_by sex to report as well this output ? [sorry, this output is already reported] – Luis May 28 '18 at 15:56
  • @Luis Yes, sure, in that case `ds %>% group_by(sex) %>% group_by(n = n(), add = TRUE) %>% group_by(schooling = paste0("school", schooling), Mean = mean(age), ndist = n_distinct(schooling), add = TRUE) %>%` Then use the `summarise` with new `n()` ?\ – akrun May 28 '18 at 16:06
  • since you know how to deal with dplyr, I´m trying to add a new row with means, like that: bind_rows(summarise_all(., funs(if(is.numeric(.)) mean(.) else "Total"))), but it´s not working. Could you please help me with that ? Thanks much! – Luis May 29 '18 at 15:47
  • @Luis with the example you showed, the below code works `ds %>% summarise_all(funs(if(is.numeric(.)) mean(.) else "Total")) %>% bind_rows(ds, .)` There will be some friendly warnings about coercing the levels as some columns are `factors` . – akrun May 29 '18 at 16:05