
I'm using skimr, and I added two summary functions (iqr_na_rm and median_na_rm) to the list of summary functions used by skim(). However, by default these new summary functions (called skimmers in the skimr documentation) appear at the end of the table. Instead, I'd like median and iqr to appear after mean and sd.
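A skimr-free sketch of why the order comes out this way, assuming (not confirmed by the skimr docs quoted here) that skim_with() with its default append = TRUE places new skimmers after the existing ones. Plain concatenation puts new entries last, while base R's append() with its `after` argument splices them in after a chosen name. The `default_skimmers` list is a hypothetical, shortened stand-in for skimr's real defaults.

```r
# Hypothetical, shortened stand-in for skimr's default numeric skimmers
default_skimmers <- list(
  mean = function(x) mean(x, na.rm = TRUE),
  sd   = function(x) sd(x, na.rm = TRUE),
  p0   = function(x) min(x, na.rm = TRUE),
  p100 = function(x) max(x, na.rm = TRUE)
)
new_skimmers <- list(
  median = function(x) median(x, na.rm = TRUE),
  iqr    = function(x) IQR(x, na.rm = TRUE)
)

# Plain concatenation: the new statistics end up last
appended <- c(default_skimmers, new_skimmers)
print(names(appended))  # names in order: mean, sd, p0, p100, median, iqr

# Splicing with base::append(): the new statistics land right after sd
spliced <- append(default_skimmers, new_skimmers,
                  after = match("sd", names(default_skimmers)))
print(names(spliced))   # names in order: mean, sd, median, iqr, p0, p100

# Evaluate every statistic, in list order, on a toy vector with a missing value
x <- c(2, 4, NA, 6, 8)
stats <- vapply(spliced, function(f) f(x), numeric(1))
print(round(stats, 2))
```

A spliced list like this is the kind of ordered list one would hand to skim_with() together with append = FALSE, so that the defaults are replaced rather than extended.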

The final goal is to show the results in a .Rmd report like this:

````rmd
---
title: "Test"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, 
                      message = FALSE,
                      echo    = FALSE)
```

## Test

```{r test, results = 'asis'}
library(skimr)
library(dplyr)
library(ggplot2)

iqr_na_rm <- function(x) IQR(x, na.rm = TRUE)
median_na_rm <- function(x) median(x, na.rm = TRUE)

skim_with(numeric = list(p50 = NULL, median = median_na_rm, iqr = iqr_na_rm),
          integer = list(p50 = NULL, median = median_na_rm, iqr = iqr_na_rm))

msleep %>%
  group_by(vore) %>%
  skim(sleep_total) %>%
  kable()

```
````

Rendered HTML: [screenshot of the rendered table, not preserved]

As you can see, median and iqr are printed at the end of the table, after the sparkline histogram. I'd like them to be printed after sd and before p0. Is that possible?

DeltaIV
  • You probably can, but not without fiddling with source code. I think if you catch the ordering around [here](https://github.com/ropensci/skimr/blob/master/R/skim_print.R#L92) you could perhaps get things to work. – Roman Luštrik Mar 18 '19 at 15:35
  • Did you try using append=FALSE and listing everything in the order you want? – Elin Mar 21 '19 at 02:45
  • Ciao @Elin! No, I didn't try it because I didn't know that was an option, but I was actually hoping for you or Michael to chime in :-) May I suggest adding this option to a vignette? I think it could be useful. I might also try to make a PR. If you like the idea, I'll open an issue on the skimr repo. Of course, only if it's not there already: I could have missed it, or forgotten about it. – DeltaIV Mar 21 '19 at 07:48
  • That's a great idea about the vignette. We have to update all of them for V2, and we'll add that as an example. – Elin Mar 21 '19 at 08:59

2 Answers


skim_to_list() splits the skim() output into one tibble per variable type. If you want to control the numeric part, you can pull out that tibble and reorder its columns with select(), like this. This format is also easier to export to other formats.

```r
library(skimr)
library(dplyr)
library(ggplot2)  # provides the msleep data
# assumes the skim_with() call from the question has already been run

msleep %>%
  group_by(vore) %>%
  skim_to_list(sleep_total) %>%
  .[["numeric"]] %>%
  dplyr::select(vore, variable, missing, complete, n, mean, sd,
                median, iqr, p0, p25, p75, p100, hist)
```

```
# A tibble: 5 x 14
  vore    variable    missing complete n     mean    sd     median iqr     p0    p25    p75     p100   hist    
* <chr>   <chr>       <chr>   <chr>    <chr> <chr>   <chr>  <chr>  <chr>   <chr> <chr>  <chr>   <chr>  <chr>   
1 carni   sleep_total 0       19       19    10.38   4.67   10.4   " 6.75" 2.7   6.25   "13   " 19.4   ▃▇▂▇▆▃▂▃
2 herbi   sleep_total 0       32       32    " 9.51" 4.88   10.3   " 9.92" 1.9   "4.3 " 14.22   16.6   ▆▇▁▂▂▆▇▅
3 insecti sleep_total 0       5        5     14.94   5.92   18.1   "11.1 " 8.4   "8.6 " "19.7 " 19.9   ▇▁▁▁▁▁▃▇
4 omni    sleep_total 0       20       20    10.93   2.95   " 9.9" " 1.83" "8  " "9.1 " 10.93   "18  " ▆▇▂▁▁▁▁▂
5 NA      sleep_total 0       7        7     10.19   "3   " 10.6   " 3.5 " 5.4   8.65   12.15   13.7   ▃▃▁▁▃▇▁▇
```
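If you would rather not retype every column name, a small base-R helper can move just median and iqr so they sit right after sd and leave everything else in place. This is a sketch: `move_after()` is a hypothetical helper, not part of skimr or dplyr, and the toy data frame copies the carni row above (hist omitted), assuming, as in the question's rendered table, that median and iqr come last.

```r
# Hypothetical helper: move `cols` so they sit immediately after column `after`,
# keeping all other columns in their current relative order.
move_after <- function(df, cols, after) {
  rest <- setdiff(names(df), cols)   # columns that stay where they are
  pos  <- match(after, rest)         # position of the anchor column among them
  if (is.na(pos)) stop("column '", after, "' not found")
  df[c(rest[seq_len(pos)], cols, rest[-seq_len(pos)])]
}

# Toy stand-in for the numeric tibble from skim_to_list(), values from the carni row
skim_like <- data.frame(
  vore = "carni", variable = "sleep_total", missing = "0", complete = "19",
  n = "19", mean = "10.38", sd = "4.67", p0 = "2.7", p25 = "6.25",
  p75 = "13", p100 = "19.4", median = "10.4", iqr = "6.75",
  stringsAsFactors = FALSE
)

reordered <- move_after(skim_like, c("median", "iqr"), after = "sd")
print(names(reordered))
```

With dplyr, `select(vore, variable, missing, complete, n, mean, sd, median, iqr, everything())` should give the same order, since everything() picks up the remaining columns in their existing order.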

EDIT

Adding kable() as requested in comment.

```r
msleep %>%
  group_by(vore) %>%
  skim_to_list(sleep_total) %>%
  .[["numeric"]] %>%
  dplyr::select(vore, variable, missing, complete, n, mean, sd,
                median, iqr, p0, p25, p75, p100, hist) %>%
  kable()
```

|  vore   |  variable   | missing | complete | n  | mean  |  sd  | median | iqr  | p0  | p25  |  p75  | p100 |   hist   |
|---------|-------------|---------|----------|----|-------|------|--------|------|-----|------|-------|------|----------|
|  carni  | sleep_total |    0    |    19    | 19 | 10.38 | 4.67 |  10.4  | 6.75 | 2.7 | 6.25 |  13   | 19.4 | ▃▇▂▇▆▃▂▃ |
|  herbi  | sleep_total |    0    |    32    | 32 | 9.51  | 4.88 |  10.3  | 9.92 | 1.9 | 4.3  | 14.22 | 16.6 | ▆▇▁▂▂▆▇▅ |
| insecti | sleep_total |    0    |    5     | 5  | 14.94 | 5.92 |  18.1  | 11.1 | 8.4 | 8.6  | 19.7  | 19.9 | ▇▁▁▁▁▁▃▇ |
|  omni   | sleep_total |    0    |    20    | 20 | 10.93 | 2.95 |  9.9   | 1.83 |  8  | 9.1  | 10.93 |  18  | ▆▇▂▁▁▁▁▂ |
|   NA    | sleep_total |    0    |    7     | 7  | 10.19 |  3   |  10.6  | 3.5  | 5.4 | 8.65 | 12.15 | 13.7 | ▃▃▁▁▃▇▁▇ |

Pierre Lapointe
  • I still want to create a `kable` table though, because this is for an R Markdown report. I should have clarified that in the question, but `.Rmd` files and `reprex` don't play nicely together. I edited my question to clarify. – DeltaIV Mar 18 '19 at 16:58
  • wow, I didn't think that just by adding `kable()` at the end of the pipe, your code would still work exactly the same way! Thanks – DeltaIV Mar 18 '19 at 17:06

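Here's another option that uses the `append = FALSE` argument of `skim_with()`: the skimmers you pass replace the defaults instead of being added after them, so you can list them in the order you want.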

```r
library(skimr)
library(dplyr)
library(ggplot2)  # provides the msleep data

iqr_na_rm <- function(x) IQR(x, na.rm = TRUE)
median_na_rm <- function(x) median(x, na.rm = TRUE)

# With append = FALSE these replace the default skimmers, listed in the desired order
my_skimmers <- list(n = length, missing = n_missing, complete = n_complete,
                    mean = purrr::partial(mean, na.rm = TRUE),  # mean.default would return NA if x has NAs
                    sd = purrr::partial(sd, na.rm = TRUE),
                    median = median_na_rm, iqr = iqr_na_rm)

skim_with(numeric = my_skimmers,
          integer = my_skimmers, append = FALSE)

msleep %>%
  group_by(vore) %>%
  skim(sleep_total) %>%
  kable()
```

I didn't include all the statistics, but you can look in the functions.R and stats.R files of the skimr source to see how the various statistics are defined.
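As a starting point for the statistics left out above (the p0 to p100 percentiles), here is a base-R sketch of NA-safe versions listed in the order the question asks for. `quantile_at()` is a hypothetical helper; skimr's own definitions in the source files mentioned above may differ in detail, and the histogram column is not reproduced.

```r
# Hypothetical helper: returns an NA-safe function computing one quantile
quantile_at <- function(p) {
  force(p)  # capture p now rather than lazily when the closure runs
  function(x) unname(quantile(x, probs = p, na.rm = TRUE))
}

iqr_na_rm    <- function(x) IQR(x, na.rm = TRUE)
median_na_rm <- function(x) median(x, na.rm = TRUE)

# List order = intended column order: mean, sd, median, iqr, then percentiles
ordered_stats <- list(
  mean   = function(x) mean(x, na.rm = TRUE),
  sd     = function(x) sd(x, na.rm = TRUE),
  median = median_na_rm,
  iqr    = iqr_na_rm,
  p0     = quantile_at(0),
  p25    = quantile_at(0.25),
  p75    = quantile_at(0.75),
  p100   = quantile_at(1)
)

# Check the functions, in order, on a toy vector with a missing value
x <- c(2, 4, NA, 6, 8)
stats <- vapply(ordered_stats, function(f) f(x), numeric(1))
print(round(stats, 2))
```

To use these with skimr, one would presumably prepend n, missing and complete as in my_skimmers and pass the combined list to skim_with() with append = FALSE.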

Elin
  • This is nice! I still like the `skim_to_list()` option because it seems to me easier to combine with `select` in order to quickly change the list of skimmers in different parts of the code, but it's good to know that there's an official solution. – DeltaIV Mar 21 '19 at 07:51