I am a beginner in R, having recently transitioned from Stata/SPSS. In Stata, I used to run tabular commands to generate a summary of a continuous variable by a grouping variable. Is there any way I can do this in R?
I searched on SO, and I found this thread: How to get Summary statistics by group
While Hadley's map function did help me get the quartiles, mean and median, I need more. Specifically, I need the number of elements in each quartile and the number of elements in each level of a factor.
Here's dummy code:
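library(dplyr)  # provides %>%
library(Hmisc)  # provides describe()

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Split by group, then summarise the continuous variable within each group
df %>%
  data.table::as.data.table(.) %>%
  split(., by = c("group"), drop = TRUE, sorted = TRUE) %>%
  purrr::map(~summary(.$dt))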
And
describe(df$group)
These give two disjoint sets of output: one only provides descriptive statistics about the categorical variable, while the other only provides the six basic summary statistics. I need to see what's going on within each quartile.
I am using describe from the Hmisc package above (Hmisc::describe).
How can I do this using R? I'd sincerely appreciate any help.
Sample Output:
My sample output would look something like this, but it would be grouped for each of the four levels of the categorical variable. This way, I can analyze what's going on with the continuous variable for each level of the categorical variable. Right now, the output is spread across three different commands, and it is harder for me to understand what's happening.
Here are the commands:
df %>% data.table::as.data.table(.) %>% split(.,by=c("group"),drop = TRUE,sorted = TRUE) %>% purrr::map(~summary(.$dt))
df %>% data.table::as.data.table(.) %>% split(.,by=c("group"),drop = TRUE,sorted = TRUE) %>% purrr::map(~describe(.$dt))
df %>% group_by(group) %>% count(quartile = ntile(dt, 4))
[The credit for the third command goes to one of the people who answered this question.]
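To make the target concrete, here is a rough sketch of the kind of single, per-group table I have in mind, written in base R only. The helper names summarise_values and quarter_of are hypothetical (they come from no package), and the quarter assignment is a simple rank-based split that I am assuming is acceptable; it breaks ties by order of appearance and may not match dplyr::ntile exactly.

```r
# Dummy data from the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Hypothetical helper: descriptive statistics for one group's values
summarise_values <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x),
    q1 = q[1], median = q[2], q3 = q[3], max = max(x))
}

# Hypothetical helper: rank-based quarter (1 to 4) of each value in a group
# (assumption: ties are broken by order of appearance)
quarter_of <- function(x) {
  floor((rank(x, ties.method = "first") - 1) * 4 / length(x)) + 1
}

# One row of statistics per level of the grouping factor
stats <- do.call(rbind, lapply(split(df$dt, df$group), summarise_values))

# Number of elements falling in each quarter, per group
df$quarter <- factor(ave(df$dt, df$group, FUN = quarter_of), levels = 1:4)
counts <- unclass(table(df$group, df$quarter))
colnames(counts) <- paste0("n_q", 1:4)

# Statistics and quarter counts side by side, one row per group
result <- cbind(stats, counts)
print(round(result, 2))
```

Each row of result corresponds to one level of group, with the usual summary statistics followed by the count of observations in each quarter, so everything for a level can be read off one line.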