How do I calculate the summary statistics (mean, min/max, # of obs) for a continuous variable over the levels of a factor (categorical) variable?

For example, if GPA is the continuous variable and grade is the categorical variable taking levels 9th, 10th, 11th, and 12th, is there a command you would recommend?

zephryl
sili
  • Using dplyr, you would do `my_data %>% group_by(grade) %>% summarize(across(GPA, list(mean = mean, min = min, max = max)), n = n())`. – zephryl Nov 21 '22 at 02:24
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 21 '22 at 03:20
  • [Here](https://www.ruampimentel.me/post/descriptives_r/) are my favorite functions to run descriptives. Enjoy. – Ruam Pimentel Nov 21 '22 at 04:05

1 Answer

You can compute the descriptives yourself with dplyr, or use `describeBy()` from psych or base R's `tapply()` to print a summary table for each group.

library(tidyverse)
library(psych)
data(iris)

## dplyr, adapted from zephryl's comment
## n = n() is its own argument to summarize(), outside across()
iris %>% 
  group_by(Species) %>% 
  summarize(across(Sepal.Length, list(mean = mean, min = min, max = max)), n = n())
#> # A tibble: 3 × 5
#>   Species    Sepal.Length_mean Sepal.Length_min Sepal.Length_max     n
#>   <fct>                  <dbl>            <dbl>            <dbl> <int>
#> 1 setosa                  5.01              4.3              5.8    50
#> 2 versicolor              5.94              4.9              7      50
#> 3 virginica               6.59              4.9              7.9    50

## tapply
tapply(iris$Sepal.Length, iris$Species, summary)
#> $setosa
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   4.300   4.800   5.000   5.006   5.200   5.800 
#> 
#> $versicolor
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   4.900   5.600   5.900   5.936   6.300   7.000 
#> 
#> $virginica
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   4.900   6.225   6.500   6.588   6.900   7.900

## psych
describeBy(iris$Sepal.Length, iris$Species)
#> 
#>  Descriptive statistics by group 
#> group: setosa
#>    vars  n mean   sd median trimmed mad min max range skew kurtosis   se
#> X1    1 50 5.01 0.35      5       5 0.3 4.3 5.8   1.5 0.11    -0.45 0.05
#> ------------------------------------------------------------ 
#> group: versicolor
#>    vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 50 5.94 0.52    5.9    5.94 0.52 4.9   7   2.1  0.1    -0.69 0.07
#> ------------------------------------------------------------ 
#> group: virginica
#>    vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
#> X1    1 50 6.59 0.64    6.5    6.57 0.59 4.9 7.9     3 0.11     -0.2 0.09
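
If you want only the four statistics from the question (mean, min, max, number of observations) in a single table without extra packages, base R's `aggregate()` is one possible route. The following is a minimal sketch on the same `iris` data; the object name `stats` and the choice of `length()` for the count are my own assumptions, so adapt them to your data (for example `GPA ~ grade`).

```r
# Base R sketch: one row per group with mean, min, max, and n.
data(iris)

stats <- aggregate(
  Sepal.Length ~ Species,
  data = iris,
  FUN = function(x) c(mean = mean(x), min = min(x), max = max(x), n = length(x))
)

# aggregate() returns the four statistics as a single matrix column;
# do.call(data.frame, ...) flattens it into ordinary columns named
# Sepal.Length.mean, Sepal.Length.min, Sepal.Length.max, Sepal.Length.n.
stats <- do.call(data.frame, stats)
stats
```

Note that the formula interface of `aggregate()` drops rows with missing values by default, so `n` here counts non-missing observations only.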
jrcalabrese