I am trying to summarize a dataset so that it lists the mean, median, SD, and 25th and 5th percentiles for all numeric columns, with NA values removed. I have the code below so far, but cannot get the result into the appropriate structure.

  mtcars %>% summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE))

I am looking for something like the below:

                          MPG      CYL       DISP      HP      DRAT      etc
  Mean                      #       #          #       #         #
  Median                    #
  90% Percentile(q90)       #
  SD                        #
  5% Percentile (q5)        #
  N                         #

and also something like the below:

                          Mean      Median       Q90      SD        Q5      N
  MPG                       #           #          #       #         #      #
  CYL                       #
  DISP                      #
  HP                        #
  DRAT                      #
  etc                       #

Is it possible to shape the data this way? Thanks for helping a novice in R.

Phil
Monklife
Does this answer your question? [dplyr summarise\_each with na.rm](https://stackoverflow.com/questions/25759891/dplyr-summarise-each-with-na-rm) – mfg3z0 Feb 24 '23 at 05:26

1 Answer

Here is one way of doing this:

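 library(dplyr)   # summarize(), across(), where()
 library(tidyr)   # pivot_longer(), separate(), pivot_wider()

 mtcars %>% summarize(across(where(is.numeric), list(mean = mean, sd = sd), na.rm = TRUE)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%  # columns such as mpg_mean become rows
  separate(var, c("var", "stat"), sep = "_") %>%                       # split "mpg_mean" into "mpg" and "mean"
  pivot_wider(names_from = "stat", values_from = "val")                # one column per statistic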

output:

# A tibble: 11 x 3
   var      mean      sd
   <chr>   <dbl>   <dbl>
 1 mpg    20.1     6.03 
 2 cyl     6.19    1.79 
 3 disp  231.    124.   
 4 hp    147.     68.6  
 5 drat    3.60    0.535
 6 wt      3.22    0.978
 7 qsec   17.8     1.79 
 8 vs      0.438   0.504
 9 am      0.406   0.499
10 gear    3.69    0.738
11 carb    2.81    1.62 

Or change names_from = "stat" to names_from = "var" to get one row per statistic and one column per variable. Output:

# A tibble: 2 x 12
  stat    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mean  20.1   6.19  231. 147.  3.60  3.22  17.8  0.438 0.406 3.69   2.81
2 sd     6.03  1.79  124.  68.6 0.535 0.978  1.79 0.504 0.499 0.738  1.62
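
The same pattern can probably be extended to the other statistics requested in the question (median, q90, q5, N). The following is a minimal sketch, not a definitive implementation. The names stats, long, by_var and by_stat are made up for illustration, and each function handles NA itself rather than relying on na.rm being passed through across(). Splitting on "_" assumes that no column name contains an underscore, which holds for mtcars.

```r
library(dplyr)
library(tidyr)

# One named function per statistic; each one drops NAs on its own.
# (Illustrative list; the names become the suffixes in mpg_mean, mpg_q90, ...)
stats <- list(
  mean   = function(x) mean(x, na.rm = TRUE),
  median = function(x) median(x, na.rm = TRUE),
  q90    = function(x) quantile(x, 0.90, na.rm = TRUE, names = FALSE),
  sd     = function(x) sd(x, na.rm = TRUE),
  q5     = function(x) quantile(x, 0.05, na.rm = TRUE, names = FALSE),
  n      = function(x) sum(!is.na(x))   # count of non-missing values
)

# Long form: one row per (variable, statistic) pair.
long <- mtcars %>%
  summarize(across(where(is.numeric), stats)) %>%
  pivot_longer(everything(),
               names_to  = c("var", "stat"),
               names_sep = "_",          # assumes no "_" in column names
               values_to = "val")

# Variables as rows, statistics as columns.
by_var <- long %>% pivot_wider(names_from = stat, values_from = val)

# Statistics as rows, variables as columns.
by_stat <- long %>% pivot_wider(names_from = var, values_from = val)
```

Here by_var corresponds to the second layout in the question and by_stat to the first. For data whose column names do contain underscores, one option is to set a different separator through the .names argument of across(), for example .names = "{.col}__{.fn}", and split on that instead.
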
asaei