I want to find the deciles of a numeric variable for each group. I am specifically looking for methods using dplyr and lapply. I'd appreciate it if you can help me out.
Here's what I tried. I don't know how to get deciles directly other than by calling dplyr::ntile(), which didn't work for me.
Attempt 1
Here's what I tried using describe() from the Hmisc package:
library(dplyr)   # for %>%
library(Hmisc)   # for describe()
set.seed(10)
IData <- data.frame(let = sample(x = LETTERS, size = 10000, replace = TRUE),
                    numbers = sample(x = 1:20000, size = 10000))
Output <- IData %>% data.table::as.data.table(.) %>% split(., by = "let", drop = TRUE, sorted = TRUE) %>% purrr::map(~describe(.$numbers))
This helps, but there are two problems with the above code:
a) The output (even in list form) is not what I am looking for.
b) I don't know how to extract the 5%, 10%, ... quantiles from the list above.
The bottom line is that I am stuck.
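For comparison, here is a minimal sketch that keeps the split-then-apply shape of the attempt above but swaps in base R's split(), lapply() and quantile() for describe(). quantile() returns a plain named vector ("10%", "20%", ..., "100%"), so a single decile can be pulled out by name, and the per-group vectors can be stacked into one row per letter. The names probs, deciles_list, q10_A and deciles_mat are illustrative only; this is one possible approach, not the only one.

```r
set.seed(10)
IData <- data.frame(let = sample(x = LETTERS, size = 10000, replace = TRUE),
                    numbers = sample(x = 1:20000, size = 10000))

probs <- seq(0.1, 1, by = 0.1)  # 10%, 20%, ..., 100%

# Split the numbers by letter, then compute all deciles for each group
deciles_list <- lapply(split(IData$numbers, IData$let),
                       quantile, probs = probs)

# A single quantile can be extracted by name, e.g. the 10th percentile of group "A"
q10_A <- deciles_list[["A"]][["10%"]]

# Stack the per-group vectors: one row per letter, one column per decile
deciles_mat <- do.call(rbind, deciles_list)
```

Other quantiles such as 5% can be obtained the same way by adding them to probs.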
Attempt 2
I tried replacing describe() with ntile(), but the following code gives output that doesn't make sense to me: I expected 10 values per group, but running Output[[1]] shows a vector of ~400 numbers instead.
Output<-IData %>% data.table::as.data.table(.) %>% split(.,by=c("let"),drop = TRUE,sorted = TRUE) %>% purrr::map(~dplyr::ntile(.$numbers,10))
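A small sketch of why this returns ~400 numbers per group: dplyr::ntile() does not compute cut points; it assigns each observation a bucket number from 1 to n, so its result has one entry per row of the group (roughly 10000 / 26, about 385 rows here). The decile values themselves come from quantile(). The toy vector x and the names buckets and cuts are illustrative.

```r
library(dplyr)

x <- 1:20

# ntile() labels every element with its bucket (1..10): same length as x
buckets <- ntile(x, 10)
length(buckets)  # 20, not 10

# The 10 decile cut points come from quantile() instead
cuts <- quantile(x, probs = seq(0.1, 1, by = 0.1))
length(cuts)     # 10
```

ntile() is useful for tagging rows with their decile (e.g. inside mutate()), while quantile() produces the per-group summary asked for here.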
Attempt 3 (expected output)
Finally, I went old school (i.e., copy-paste) to get the expected output:
library(dplyr)
Output <- IData %>%
  dplyr::group_by(let) %>%
  dplyr::summarise(QQuantile1 = quantile(`numbers`, c(.10)),
                   QQuantile2 = quantile(`numbers`, c(.20)),
                   QQuantile3 = quantile(`numbers`, c(.30)),
                   QQuantile4 = quantile(`numbers`, c(.40)),
                   QQuantile5 = quantile(`numbers`, c(.50)),
                   QQuantile6 = quantile(`numbers`, c(.60)),
                   QQuantile7 = quantile(`numbers`, c(.70)),
                   QQuantile8 = quantile(`numbers`, c(.80)),
                   QQuantile9 = quantile(`numbers`, c(.90)),
                   QQuantile10 = quantile(`numbers`, c(1.00)))  # 1.00, not .100 (which equals 0.10)
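The ten copy-pasted lines can be generated programmatically. A minimal sketch, assuming dplyr's tidy-evaluation splicing (rlang::expr() with !! and !!!; rlang is installed as a dependency of dplyr): build one quantile() call per probability, name the calls QQuantile1 to QQuantile10, and splice them into summarise(). The names probs and decile_exprs are illustrative.

```r
library(dplyr)

set.seed(10)
IData <- data.frame(let = sample(x = LETTERS, size = 10000, replace = TRUE),
                    numbers = sample(x = 1:20000, size = 10000))

probs <- seq(0.1, 1, by = 0.1)  # 10%, 20%, ..., 100%

# One unevaluated quantile() call per probability; !! inlines the value of p
decile_exprs <- lapply(probs, function(p) {
  rlang::expr(quantile(numbers, probs = !!p, names = FALSE))
})
names(decile_exprs) <- paste0("QQuantile", seq_along(probs))

# Splice the named calls into summarise(): same columns as the copy-paste version
Output <- IData %>%
  group_by(let) %>%
  summarise(!!!decile_exprs)
```

Changing probs (for example to c(0.05, seq(0.1, 1, by = 0.1))) changes the columns without touching the summarise() call.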
Question: Can someone please help me generate the above output using each of these three approaches (preferably all of them, not just one, for learning purposes)?
1) lapply
2) dplyr
3) data.table
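For the data.table route, a minimal sketch, assuming the data.table package is installed: inside DT[, j, by], a j expression that returns a named list becomes one column per list element, so wrapping the quantile vector in as.list() gives one column per decile and one row per group. keyby = let groups and sorts by letter, and setNames() supplies the QQuantile1 to QQuantile10 names of the expected output. The names probs and DT are illustrative.

```r
library(data.table)

set.seed(10)
IData <- data.frame(let = sample(x = LETTERS, size = 10000, replace = TRUE),
                    numbers = sample(x = 1:20000, size = 10000))

probs <- seq(0.1, 1, by = 0.1)  # 10%, 20%, ..., 100%

DT <- as.data.table(IData)

# j returns a named list, so each decile becomes its own column;
# keyby = let groups by letter and sorts the result
Output <- DT[, setNames(as.list(quantile(numbers, probs = probs, names = FALSE)),
                        paste0("QQuantile", seq_along(probs))),
             keyby = let]
```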
I looked at several threads on SO, but they all deal with a single quantile rather than all of them, e.g. the "Find top deciles from dataframe by group" thread.