
I'd like to describe a response variable separately for each level of a factor variable.

I want to run something like this code:

library("Hmisc")
describe(mtcars$hp)

except that I want separate output for each value of cyl.

Cauder

3 Answers


Are you looking for this?

lapply(split(mtcars,mtcars$cyl),describe)

Edit: I see you were specifically looking for describe on hp. You can add $hp to the split above, or more simply use

tapply(mtcars$hp,mtcars$cyl,describe)
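
To make the shape of that result concrete, here is a small sketch of pulling out a single group. The name by_cyl is only an illustrative choice; the rest uses the same mtcars data and Hmisc::describe as above.

```r
library(Hmisc)

# One describe object per cyl level; tapply returns a list-like array
# whose names are the levels of the grouping variable.
by_cyl <- tapply(mtcars$hp, mtcars$cyl, describe)

names(by_cyl)    # the cyl levels, as character strings
by_cyl[["4"]]    # describe output for the 4-cylinder cars only
```

Indexing with the level as a string ("4", not 4) avoids confusing the level name with a position.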
Daniel O
  • Beat me by seconds. Except it seems in this example like that first `mtcars` should be `mtcars$hp` –  Jul 15 '20 at 12:20
  • Good attention to detail @Adam, I've updated the answer – Daniel O Jul 15 '20 at 12:23
  • Can I attach this to a dplyr statement? I'd like to do something like mtcars %>% filter(hp > x) %>% {tapply function} – Cauder Jul 15 '20 at 12:28
  • You can do something like this @Cauder `mtcars %>% filter(hp > 100) %>% split(.$cyl) %>% purrr::map(~ describe(.x$hp))` – Chuck P Jul 15 '20 at 12:41

A tidy / purrr solution:

library(Hmisc)
library(purrr)
mtcars %>%
  split(.$cyl) %>%
  purrr::map(~ describe(.x$hp))
#> $`4`
#> .x$hp 
#>        n  missing distinct     Info     Mean      Gmd      .05      .10 
#>       11        0       10    0.995    82.64    24.51     57.0     62.0 
#>      .25      .50      .75      .90      .95 
#>     65.5     91.0     96.0    109.0    111.0 
#> 
#> lowest :  52  62  65  66  91, highest:  93  95  97 109 113
#>                                                                       
#> Value         52    62    65    66    91    93    95    97   109   113
#> Frequency      1     1     1     2     1     1     1     1     1     1
#> Proportion 0.091 0.091 0.091 0.182 0.091 0.091 0.091 0.091 0.091 0.091
#> 
#> $`6`
#> .x$hp 
#>        n  missing distinct     Info     Mean      Gmd 
#>        7        0        4    0.911    122.3    23.71 
#>                                   
#> Value        105   110   123   175
#> Frequency      1     3     2     1
#> Proportion 0.143 0.429 0.286 0.143
#> 
#> $`8`
#> .x$hp 
#>        n  missing distinct     Info     Mean      Gmd 
#>       14        0        9    0.985    209.2    56.69 
#> 
#> lowest : 150 175 180 205 215, highest: 215 230 245 264 335
#>                                                                 
#> Value        150   175   180   205   215   230   245   264   335
#> Frequency      2     2     3     1     1     1     2     1     1
#> Proportion 0.143 0.143 0.214 0.071 0.071 0.071 0.143 0.071 0.071
Chuck P
  • If I want to split on two variables, can I do `split(.$cyl, .$vs)` – Cauder Jul 15 '20 at 12:51
  • In that case I would use the "new" `dplyr` `group_split` like this `mtcars %>% group_by(cyl, am) %>% group_split() %>% purrr::map(~ describe(.x$hp))` – Chuck P Jul 15 '20 at 14:01
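
For splitting on two variables, a base R alternative may also work: split accepts a list of grouping vectors. The sketch below is only one possible approach, not the method from the comment above, and the names hp_groups and by_cyl_am are illustrative. It assumes cyl and am as the two grouping variables.

```r
library(Hmisc)

# Split hp by every observed cyl/am combination; drop = TRUE discards
# combinations that do not occur in the data.
hp_groups <- split(mtcars$hp, list(mtcars$cyl, mtcars$am), drop = TRUE)

# Names take the form "<cyl>.<am>", for example "4.0" or "8.1".
by_cyl_am <- lapply(hp_groups, describe)
names(by_cyl_am)
by_cyl_am[["4.1"]]
```

Unlike group_split, this keeps a name on each list element, which can make it easier to see which group an output belongs to.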

You can group_by cyl and store each describe object in a list column:

library(dplyr)
library(Hmisc)

new_mtcars <- mtcars %>% group_by(cyl) %>% summarise(data = list(describe(hp)))
new_mtcars
# A tibble: 3 x 2
#    cyl data      
#  <dbl> <list>    
#1     4 <describe>
#2     6 <describe>
#3     8 <describe>

new_mtcars$data[[1]]
#hp 
#       n  missing distinct     Info     Mean      Gmd      .05      .10 
#      11        0       10    0.995    82.64    24.51     57.0     62.0 
#     .25      .50      .75      .90      .95 
#    65.5     91.0     96.0    109.0    111.0 

#lowest :  52  62  65  66  91, highest:  93  95  97 109 113
#
#Value         52    62    65    66    91    93    95    97   109   113
#Frequency      1     1     1     2     1     1     1     1     1     1
#Proportion 0.091 0.091 0.091 0.182 0.091 0.091 0.091 0.091 0.091 0.091
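
If looking results up by cyl value is more convenient than by position, the list column can be turned into a named list. This is a sketch under the same setup as above; by_cyl and described are illustrative names, not part of either package.

```r
library(dplyr)
library(Hmisc)

# Same grouped summary as above: one describe object per cyl value.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(data = list(describe(hp)))

# Name each list element after its cyl value (coerced to character).
described <- setNames(by_cyl$data, by_cyl$cyl)
described[["6"]]
```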
Ronak Shah