
I am trying to add a confidence interval to the output produced by skimr:

library(skimr); library(Rmisc)

skim_with(numeric = list(CI = Rmisc::CI), append = FALSE)

skim(mtcars)

Skim summary statistics
 n obs: 32 
 n variables: 11 

── Variable type:numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────
 variable                    CI
       am   upp: 0.59, mea: 0  
     carb   upp: 3.39, mea: 2  
      cyl   upp: 6.83, mea: 6  
     disp upp: 275.41, mea: 230
     drat   upp: 3.79, mea: 3  
     gear   upp: 3.95, mea: 3  
       hp upp: 171.41, mea: 146
      mpg  upp: 22.26, mea: 20 
     qsec  upp: 18.49, mea: 17 
       vs   upp: 0.62, mea: 0  
       wt   upp: 3.57, mea: 3  

This hasn't quite worked, as the lower bound of the confidence interval is missing. How can I get both the lower and upper bounds of the confidence interval to show up in skimr?
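For reference, here is a base-R sketch of the calculation Rmisc::CI performs, assuming its usual t-based formula (mean ± t quantile × standard error). `ci_sketch` is a hypothetical name, not part of Rmisc. It returns three named numbers rather than one, which is presumably why all three got squeezed into the single CI column above.

```r
# Hypothetical re-implementation of the t-based interval computed by Rmisc::CI
ci_sketch <- function(x, ci = 0.95) {
  n   <- length(x)
  m   <- mean(x)
  # half-width: t quantile with n - 1 df times the standard error
  err <- qt(ci + (1 - ci) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(upper = m + err, mean = m, lower = m - err)  # three values, not one
}

ci_sketch(mtcars$mpg)
```

The result should agree with the mpg row in the second answer's long table.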

Elin
luciano

2 Answers


The only way I can think of is brute force: repeat the CI calculation once for each bound.

library(skimr)
skim_with(numeric = list(mean = mean,
                         lwr  = ~Rmisc::CI(.)["lower"],  # lower bound of the 95% CI
                         upr  = ~Rmisc::CI(.)["upper"]), # upper bound of the 95% CI
          append = FALSE)
skim(mtcars)

There may be some way to do it with quosures etc., but I don't want to risk having my brain explode by thinking about it on a Friday afternoon.
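If the goal is just the table rather than skimr's printout, the repeated calculation can be avoided in base R by computing each interval once per column. This is a sketch outside skimr, not a skimr feature; `ci_sketch` is a hypothetical helper that assumes the same t-based formula as Rmisc::CI, and the NA removal is an addition of mine.

```r
# Hypothetical helper mirroring the t-based formula of Rmisc::CI
ci_sketch <- function(x, ci = 0.95) {
  x   <- x[!is.na(x)]                       # added here: drop missing values
  n   <- length(x)
  m   <- mean(x)
  err <- qt(ci + (1 - ci) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = m - err, mean = m, upper = m + err)
}

# One CI computation per column; sapply gives a 3 x ncol matrix, t() flips it
ci_table <- as.data.frame(t(sapply(mtcars, ci_sketch)))
ci_table$variable <- rownames(ci_table)
ci_table[, c("variable", "lower", "mean", "upper")]
```

The trade-off is that you lose skimr's grouping and type handling in exchange for a single pass per column.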

Ben Bolker
  • Time for a beer, Ben? – IRTFM Jun 15 '18 at 21:49
  • Yes, this will work too. When I was working on the quantiles, I actually found it was faster to compute them separately than to return the vector and then have skimr figure out how to render it. – Elin Jul 14 '18 at 17:08

There are actually many more values in that skim result than its print.skim_df method displays. Look at dput(skim(mtcars)).

class(skim(mtcars))
[1] "skim_df"    "tbl_df"     "tbl"        "data.frame"
print.data.frame(skim(mtcars))
# gives a long result

   variable    type stat level       value   formatted
1       mpg numeric   CI upper  22.2635715  upp: 22.26
2       mpg numeric   CI  mean  20.0906250  mea: 20.09
3       mpg numeric   CI lower  17.9176785  low: 17.92
4       cyl numeric   CI upper   6.8313934   upp: 6.83
5       cyl numeric   CI  mean   6.1875000   mea: 6.19
6       cyl numeric   CI lower   5.5436066   low: 5.54
7      disp numeric   CI upper 275.4065392 upp: 275.41
#   snipped the rest.....

I basically stripped the skim_df class and reshaped the plain data.frame version with reshape2::dcast.

reshape2::dcast( as.data.frame(skim(mtcars))[c('variable','level','value')],
                 variable~level ,value.var='value')
   variable       lower       mean       upper
1        am   0.2263446   0.406250   0.5861554
2      carb   2.2301583   2.812500   3.3948417
3       cyl   5.5436066   6.187500   6.8313934
4      disp 186.0372108 230.721875 275.4065392
5      drat   3.4037903   3.596563   3.7893347
6      gear   3.4214933   3.687500   3.9535067
7        hp 121.9679499 146.687500 171.4070501
8       mpg  17.9176785  20.090625  22.2635715
9      qsec  17.2044883  17.848750  18.4930117
10       vs   0.2557828   0.437500   0.6192172
11       wt   2.8644785   3.217250   3.5700215

Then I checked whether the as.data.frame call was actually needed. It was not, so this is more compact:

reshape2::dcast( skim(mtcars),
                 variable~level , value.var='value')
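If reshape2 is not available, base R's reshape() can do the same long-to-wide pivot. The sketch below builds a long table with the variable/level/value layout shown above (it does not call skimr, and `ci_sketch` is a hypothetical stand-in for Rmisc::CI), then widens it. Unlike dcast, reshape() keeps rows in order of first appearance instead of sorting them alphabetically.

```r
# Hypothetical stand-in for Rmisc::CI (t-based interval), same value order
ci_sketch <- function(x, ci = 0.95) {
  n   <- length(x)
  m   <- mean(x)
  err <- qt(ci + (1 - ci) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(upper = m + err, mean = m, lower = m - err)
}

# Long layout like the printout above: one row per variable x level
long <- do.call(rbind, lapply(names(mtcars), function(v) {
  ci <- ci_sketch(mtcars[[v]])
  data.frame(variable = v, level = names(ci), value = unname(ci),
             stringsAsFactors = FALSE)
}))

# Base-R counterpart of reshape2::dcast(long, variable ~ level, value.var = "value")
wide <- reshape(long, idvar = "variable", timevar = "level", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))   # "value.lower" -> "lower"
wide <- wide[, c("variable", "lower", "mean", "upper")]
rownames(wide) <- NULL
wide
```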
IRTFM
  • Somehow 'level' didn't work for me. Instead, I used 'stat', and it worked well. Like, reshape2::dcast( as.data.frame(skim(mtcars))[c('variable','stat','value')], variable~stat ,value.var='value') – stok Aug 09 '18 at 06:30