I'd like to describe a response variable separately for each value of a factor variable.
I want to run something like this code:
library("Hmisc")
describe(mtcars$hp)
except that I want a separate output for each value of cyl.
Are you looking for:
lapply(split(mtcars,mtcars$cyl),describe)
Edit: I see you were specifically looking for describe on hp. You can add $hp to the split above, or more simply use:
tapply(mtcars$hp,mtcars$cyl,describe)
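For reference, here is a sketch of the "add $hp to the split" variant mentioned above. It splits only the hp column by cyl, so each element of the resulting list is a describe object for one cylinder group; the object name hp_by_cyl is just an illustrative choice. It should produce the same per-group summaries as the tapply call, but as a named list that you can index by level.

```r
library(Hmisc)

# Split only the hp vector by cyl (names become "4", "6", "8"),
# then run describe() on each piece
hp_by_cyl <- lapply(split(mtcars$hp, mtcars$cyl), describe)

names(hp_by_cyl)   # the cyl levels
hp_by_cyl[["8"]]   # summary of hp for 8-cylinder cars
```

Note that the header printed for each summary comes from the expression passed to describe inside lapply, so it will not read "hp"; the statistics themselves are unaffected.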
A tidyverse / purrr solution:
library(Hmisc)
library(purrr)
library(magrittr)  # provides the %>% pipe
mtcars %>%
  split(.$cyl) %>%
  purrr::map(~ describe(.x$hp))
#> $`4`
#> .x$hp
#> n missing distinct Info Mean Gmd .05 .10
#> 11 0 10 0.995 82.64 24.51 57.0 62.0
#> .25 .50 .75 .90 .95
#> 65.5 91.0 96.0 109.0 111.0
#>
#> lowest : 52 62 65 66 91, highest: 93 95 97 109 113
#>
#> Value 52 62 65 66 91 93 95 97 109 113
#> Frequency 1 1 1 2 1 1 1 1 1 1
#> Proportion 0.091 0.091 0.091 0.182 0.091 0.091 0.091 0.091 0.091 0.091
#>
#> $`6`
#> .x$hp
#> n missing distinct Info Mean Gmd
#> 7 0 4 0.911 122.3 23.71
#>
#> Value 105 110 123 175
#> Frequency 1 3 2 1
#> Proportion 0.143 0.429 0.286 0.143
#>
#> $`8`
#> .x$hp
#> n missing distinct Info Mean Gmd
#> 14 0 9 0.985 209.2 56.69
#>
#> lowest : 150 175 180 205 215, highest: 215 230 245 264 335
#>
#> Value 150 175 180 205 215 230 245 264 335
#> Frequency 2 2 3 1 1 1 2 1 1
#> Proportion 0.143 0.143 0.214 0.071 0.071 0.071 0.143 0.071 0.071
You can group_by cyl and store each describe object in a list column:
library(dplyr)
library(Hmisc)
new_mtcars <- mtcars %>% group_by(cyl) %>% summarise(data = list(describe(hp)))
new_mtcars
# A tibble: 3 x 2
# cyl data
# <dbl> <list>
#1 4 <describe>
#2 6 <describe>
#3 8 <describe>
new_mtcars$data[[1]]
#hp
# n missing distinct Info Mean Gmd .05 .10
# 11 0 10 0.995 82.64 24.51 57.0 62.0
# .25 .50 .75 .90 .95
# 65.5 91.0 96.0 109.0 111.0
#lowest : 52 62 65 66 91, highest: 93 95 97 109 113
#Value 52 62 65 66 91 93 95 97 109 113
#Frequency 1 1 1 2 1 1 1 1 1 1
#Proportion 0.091 0.091 0.091 0.182 0.091 0.091 0.091 0.091 0.091 0.091
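Indexing with new_mtcars$data[[1]] relies on row order. As a small sketch (assuming the same new_mtcars built above; desc_by_cyl is a hypothetical name), you can name the list column by cyl and look a summary up by its cyl value instead:

```r
library(dplyr)
library(Hmisc)

new_mtcars <- mtcars %>%
  group_by(cyl) %>%
  summarise(data = list(describe(hp)))

# Use the cyl values as names so each summary can be fetched by level
# rather than by row position
desc_by_cyl <- setNames(new_mtcars$data, new_mtcars$cyl)

desc_by_cyl[["6"]]  # describe output for hp among 6-cylinder cars
```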