
I want to create summary statistics for my dataset. I have searched but haven't found anything that matches what I want. I want the columns of the dataset listed vertically, one per row, with the statistical measures as headings. Here is how I want it to look:

Column     Mean            Standard deviation   25th perc.   Median   75th perc.
Column 1   Mean column 1   Std column 1         ...          ...      ...
Column 2   Mean column 2   ...                  ...          ...      ...
Etc        ...             ...                  ...          ...      ...

How do I do this? Thankful for any help I can get! :) If there is a specific function that also allows some formatting/styling, information about that would be appreciated too, but the main point is that the result should look as described. :)

Julian
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Have you tried to write any code at all to do this? Where exactly are you getting stuck? – MrFlick Mar 29 '21 at 19:21

3 Answers

You may want to check out the summarytools package. It has built-in support for both Markdown and HTML output.

library(summarytools)  
descr(iris, 
      stats = c("mean", "sd", "q1", "med", "q3"),
      transpose = TRUE)

## Non-numerical variable(s) ignored: Species
## Descriptive Statistics  
## iris  
## N: 150  
##
##                     Mean   Std.Dev     Q1   Median     Q3
## ----------------- ------ --------- ------ -------- ------
##      Petal.Length   3.76      1.77   1.60     4.35   5.10
##       Petal.Width   1.20      0.76   0.30     1.30   1.80
##      Sepal.Length   5.84      0.83   5.10     5.80   6.40
##       Sepal.Width   3.06      0.44   2.80     3.00   3.30
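
Since the question also asks about formatting, here is a minimal sketch of the Markdown output mentioned above. It assumes that the `style`, `plain.ascii` and `headings` arguments can be passed to `print()` on the object returned by `descr()`; check `?print.summarytools` for your installed version. The name `stats_table` is only illustrative.

```r
library(summarytools)

# Same table as above, stored so it can be printed in different styles
stats_table <- descr(iris,
                     stats = c("mean", "sd", "q1", "med", "q3"),
                     transpose = TRUE)

# Markdown-flavoured table, e.g. for an R Markdown chunk with results = "asis"
# (assumption: these formatting arguments are accepted by print())
print(stats_table, style = "rmarkdown", plain.ascii = FALSE, headings = FALSE)
```

In an R Markdown document the printed table should then render as a regular Markdown table rather than as plain console text.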
Dominic Comtois

Your question is missing some important features, but I think you want something like this:

Example with just the numerical variables of the iris dataset:

iris_numerical <- iris[, 1:4]

Calculate the statistics:

new_df <- sapply(iris_numerical, function(x) c(mean = mean(x), SD = sd(x), Q1 = quantile(x, 0.25), median = median(x), Q3 = quantile(x, 0.75)))

This gives you the summary statistics column-wise, one column per variable:

> new_df
       Sepal.Length Sepal.Width Petal.Length Petal.Width
mean      5.8433333   3.0573333     3.758000   1.1993333
SD        0.8280661   0.4358663     1.765298   0.7622377
Q1.25%    5.1000000   2.8000000     1.600000   0.3000000
median    5.8000000   3.0000000     4.350000   1.3000000
Q3.75%    6.4000000   3.3000000     5.100000   1.8000000

Then create the final data frame in the desired format by transposing, so that each original column name becomes a row:

new_df<-data.frame(column=colnames(new_df), apply(new_df, 1, function(x) x))
> new_df
                   column     mean        SD Q1.25. median Q3.75.
Sepal.Length Sepal.Length 5.843333 0.8280661    5.1   5.80    6.4
Sepal.Width   Sepal.Width 3.057333 0.4358663    2.8   3.00    3.3
Petal.Length Petal.Length 3.758000 1.7652982    1.6   4.35    5.1
Petal.Width   Petal.Width 1.199333 0.7622377    0.3   1.30    1.8
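
The two steps above can also be sketched with tidier statistic names. This is only a variation under a few assumptions, not a different method: `unname()` drops the "25%"/"75%" labels that `quantile()` attaches (the source of `Q1.25.` above), `t()` replaces the `apply()` call for transposing, and the names `numeric_cols`, `stat_matrix` and `summary_df` are made up for illustration.

```r
# Keep only the numeric columns (drops Species in iris)
numeric_cols <- iris[sapply(iris, is.numeric)]

# One named vector of statistics per column;
# unname() removes the "25%" / "75%" labels added by quantile()
stat_matrix <- sapply(numeric_cols, function(x) {
  c(mean   = mean(x),
    sd     = sd(x),
    q1     = unname(quantile(x, 0.25)),
    median = median(x),
    q3     = unname(quantile(x, 0.75)))
})

# t() flips the matrix so each original column becomes a row
summary_df <- data.frame(column = colnames(stat_matrix), t(stat_matrix))

# Drop the duplicated row names, since the names are already in `column`
rownames(summary_df) <- NULL

print(summary_df)
```

Selecting columns with `is.numeric` instead of `iris[, 1:4]` means the same code should work on other data frames that mix numeric and non-numeric columns.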
GuedesBF

We could use descr from the collapse package:

library(collapse)
descr(iris)
akrun