6

I want to use rmarkdown to make a table where each cell has two values, for example 3.1 (0.05) or 78 ± 23.3. These kinds of tables are pretty common in scientific literature (like ones with bold values), where we want to compactly show mean and standard deviation, or a value plus-minus some error term. So it would be useful to have a simple way to produce them when using Rmarkdown. For example:

# my table
mtcars

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
[snipped]

# my other table, that I want to combine with the first
some_error_term_for_mtcars <- data.frame(sapply(1:ncol(mtcars), function(i) sample(x = (min(mtcars[, i])/10):max(mtcars[, i])/10, nrow(mtcars), replace = TRUE)))

some_error_term_for_mtcars
      X1   X2     X3    X4     X5      X6    X7  X8  X9  X10  X11
1  2.704 0.44 26.011  3.92 0.4276 0.21513 1.145 0.0 0.0 0.03 0.41
2  0.604 0.44  5.211  6.32 0.0276 0.01513 1.345 0.1 0.1 0.33 0.21
3  3.304 0.14 31.511 20.42 0.1276 0.51513 0.145 0.1 0.0 0.43 0.71
4  1.004 0.44 16.011 26.02 0.2276 0.11513 1.345 0.1 0.0 0.03 0.31
5  2.604 0.34  4.311 30.02 0.0276 0.31513 1.745 0.1 0.1 0.23 0.41
6  2.404 0.64  8.011 27.92 0.1276 0.21513 1.145 0.0 0.1 0.33 0.41
7  2.804 0.14  4.811 14.92 0.1276 0.01513 0.345 0.1 0.0 0.13 0.31
[snipped]

What is the simplest way to combine these two tables in rmarkdown to produce a single where a single cells can contain things like 21 (0.904) or 21 ± 0.904?

Community
  • 1
  • 1
Ben
  • 41,615
  • 18
  • 132
  • 227
  • 1
    This isn't necessary a Rmarkdown problem. The result can be used in markdown OK, but that can be said for almost all questions... – RHA Jan 14 '16 at 07:11
  • 1
    [Related](http://stackoverflow.com/questions/34643279/better-faster-way-to-concatenate-two-columns-after-removing-duplicate-values-in) not exact duplicate though! – RHA Jan 14 '16 at 07:17
  • @RHA yes, fair comment. I was wondering if there might be a simple method using pander at the point where the markdown is generated. – Ben Jan 14 '16 at 09:28
  • Oh, I do understand what you are aiming at, and what the relation with markdown is (been struggling lately with `knitr/markdown` myself and switched to `reporteRs`). I just think as it stand now, the relation with markdown is rather thin. – RHA Jan 14 '16 at 09:44
  • @Ben, I'm a bit confused with where the error terms data come from. Is this a merging/uniting problem? Or rendering problem? – Emman Jul 25 '21 at 18:06
  • 1
    @Emman The error terms are just small values I computed from the data (the code is shown in the question) to provide a realistic example. I guess I'm presenting it here as a merging/uniting problem, but I could imaging a solution could come from a table rendering approach. – Ben Jul 26 '21 at 23:06
  • @Ben, so if I understand you correctly, you want to mimic the functionality of `tidyr::unite()`, but over data from different data frames (but same number of rows and columns). – Emman Jul 27 '21 at 06:48

3 Answers3

8

We could do it like this, and then use knitr::kable to get the markdown:

two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " (", some_error_term_for_mtcars[ , i], ")"  ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
           mpg      cyl         disp          hp          drat              wt          qsec      vs
1   21 (2.704) 6 (0.44) 160 (26.011)  110 (3.92)  3.9 (0.4276)  2.62 (0.21513) 16.46 (1.145)   0 (0)
2   21 (0.604) 6 (0.44)  160 (5.211)  110 (6.32)  3.9 (0.0276) 2.875 (0.01513) 17.02 (1.345) 0 (0.1)
3 22.8 (3.304) 4 (0.14) 108 (31.511)  93 (20.42) 3.85 (0.1276)  2.32 (0.51513) 18.61 (0.145) 1 (0.1)
4 21.4 (1.004) 6 (0.44) 258 (16.011) 110 (26.02) 3.08 (0.2276) 3.215 (0.11513) 19.44 (1.345) 1 (0.1)
5 18.7 (2.604) 8 (0.34)  360 (4.311) 175 (30.02) 3.15 (0.0276)  3.44 (0.31513) 17.02 (1.745) 0 (0.1)
6 18.1 (2.404) 6 (0.64)  225 (8.011) 105 (27.92) 2.76 (0.1276)  3.46 (0.21513) 20.22 (1.145)   1 (0)
       am     gear     carb
1   1 (0) 4 (0.03) 4 (0.41)
2 1 (0.1) 4 (0.33) 4 (0.21)
3   1 (0) 4 (0.43) 1 (0.71)
4   0 (0) 3 (0.03) 1 (0.31)
5 0 (0.1) 3 (0.23) 2 (0.41)
6 0 (0.1) 3 (0.33) 1 (0.41)

knitr::kable(head(two_tables_into_one))

enter image description here

or for a plus-minus separator:

two_tables_into_one <- as.data.frame(do.call(cbind, lapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " ± ", some_error_term_for_mtcars[ , i]  ) )))
names(two_tables_into_one) <- names(mtcars)
head(two_tables_into_one)
           mpg      cyl         disp          hp
1   21 ± 2.704 6 ± 0.44 160 ± 26.011  110 ± 3.92
2   21 ± 0.604 6 ± 0.44  160 ± 5.211  110 ± 6.32
3 22.8 ± 3.304 4 ± 0.14 108 ± 31.511  93 ± 20.42
4 21.4 ± 1.004 6 ± 0.44 258 ± 16.011 110 ± 26.02
5 18.7 ± 2.604 8 ± 0.34  360 ± 4.311 175 ± 30.02
6 18.1 ± 2.404 6 ± 0.64  225 ± 8.011 105 ± 27.92
           drat              wt          qsec
1  3.9 ± 0.4276  2.62 ± 0.21513 16.46 ± 1.145
2  3.9 ± 0.0276 2.875 ± 0.01513 17.02 ± 1.345
3 3.85 ± 0.1276  2.32 ± 0.51513 18.61 ± 0.145
4 3.08 ± 0.2276 3.215 ± 0.11513 19.44 ± 1.345
5 3.15 ± 0.0276  3.44 ± 0.31513 17.02 ± 1.745
6 2.76 ± 0.1276  3.46 ± 0.21513 20.22 ± 1.145
       vs      am     gear     carb
1   0 ± 0   1 ± 0 4 ± 0.03 4 ± 0.41
2 0 ± 0.1 1 ± 0.1 4 ± 0.33 4 ± 0.21
3 1 ± 0.1   1 ± 0 4 ± 0.43 1 ± 0.71
4 1 ± 0.1   0 ± 0 3 ± 0.03 1 ± 0.31
5 0 ± 0.1 0 ± 0.1 3 ± 0.23 2 ± 0.41
6   1 ± 0 0 ± 0.1 3 ± 0.33 1 ± 0.41

knitr::kable(head(two_tables_into_one))

enter image description here

But this as.data.frame(do.call(cbind, lapply... seems a bit awkward. Is there a neater way?

Ben
  • 41,615
  • 18
  • 132
  • 227
  • 2
    `tidyr::unite`, but you'd still have to `paste0` on the closing parentheses. Better, though? – alistaire Jan 14 '16 at 04:52
  • 1
    Are you asking a new question in an answer? –  Jan 14 '16 at 05:03
  • @Pascal I'm inviting better answers. @alistaire, would you mind to elaborate a little? It's not obvious to me how `unite` can work on two tables. – Ben Jan 14 '16 at 05:05
  • 1
    two_tables_into_one <- as.data.frame(sapply(1:ncol(mtcars), function(i) paste0(mtcars[ , i], " (", some_error_term_for_mtcars[ , i], ")" ) )) also works – torpzorrr Jul 28 '21 at 20:07
3

I used the following technique in my summarytools package (you can look at the source code for descr() and print.summarytools() to get all the details).

> install.packages("devtools")
> library(devtools)
> install_github('dcomtois/summarytools')
> library(summarytools)
> obs <- descr(iris)$observ
> obs
      Sepal.Length Sepal.Width  Petal.Length Petal.Width 
Valid "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"
<NA>  "0 (0%)"     "0 (0%)"     "0 (0%)"     "0 (0%)"    
Total "150 (100%)" "150 (100%)" "150 (100%)" "150 (100%)"

The $observ dataframe has been constructed this way - it's part of a bigger loop, hence the i iterator. Note that the dataframe is transposed later on in the code.

output$observ[i,] <- c(paste0(n.valid, " (", p.valid, "%)"),
                       paste0(n.NA, " (", p.NA, "%)"),
                       paste(n.valid + n.NA, "(100%)"))

Then for generating an rmarkdown table using pander, we can simply do this:

> library(pander)
> pander(x = obs, style="rmarkdown")    


|   &nbsp;    |  Sepal.Length  |  Sepal.Width  |  Petal.Length  |
|:-----------:|:--------------:|:-------------:|:--------------:|
|  **Valid**  |   150 (100%)   |  150 (100%)   |   150 (100%)   |
|  **<NA>**   |     0 (0%)     |    0 (0%)     |     0 (0%)     |
|  **Total**  |   150 (100%)   |  150 (100%)   |   150 (100%)   |

Table: Table continues below


|   &nbsp;    |  Petal.Width  |
|:-----------:|:-------------:|
|  **Valid**  |  150 (100%)   |
|  **<NA>**   |    0 (0%)     |
|  **Total**  |  150 (100%)   |

Here's the full output for the descr() function:

> descr(iris, style = "rmarkdown", plain.ascii = FALSE)
Non-numerical variable(s) ignored: Species

Descriptive Statistics

Dataframe: iris

|            &nbsp; |   Sepal.Length |   Sepal.Width |   Petal.Length |   Petal.Width |
|------------------:|---------------:|--------------:|---------------:|--------------:|
|          **Mean** |           5.84 |          3.06 |           3.76 |           1.2 |
|       **Std.Dev** |           0.83 |          0.44 |           1.77 |          0.76 |
|           **Min** |            4.3 |             2 |              1 |           0.1 |
|           **Max** |            7.9 |           4.4 |            6.9 |           2.5 |
|        **Median** |            5.8 |             3 |           4.35 |           1.3 |
|           **mad** |           1.04 |          0.44 |           1.85 |          1.04 |
|           **IQR** |            1.3 |           0.5 |            3.5 |           1.5 |
|            **CV** |           7.06 |          7.01 |           2.13 |          1.57 |
|      **Skewness** |           0.31 |          0.31 |          -0.27 |          -0.1 |
|   **SE.Skewness** |            0.2 |           0.2 |            0.2 |           0.2 |
|      **Kurtosis** |          -0.61 |          0.14 |          -1.42 |         -1.36 |

Observations

|      &nbsp; |   Sepal.Length |   Sepal.Width |   Petal.Length |   Petal.Width |
|------------:|---------------:|--------------:|---------------:|--------------:|
|   **Valid** |     150 (100%) |    150 (100%) |     150 (100%) |    150 (100%) |
|    **<NA>** |         0 (0%) |        0 (0%) |         0 (0%) |        0 (0%) |
|   **Total** |     150 (100%) |    150 (100%) |     150 (100%) |    150 (100%) |

Now for combining data from 2 distinct datasets, a good old for loop can very well do the job:

names(some_error_term_for_mtcars) <- names(mtcars)
new.df <- mtcars
for (n in names(mtcars)) {
  new.df[,n] <- paste(mtcars[,n], "±",round(some_error_term_for_mtcars[,n],2))
}
pander(new.df, style="rmarkdown")

Partial output:

|          &nbsp;           |    mpg     |   cyl    |     disp      |
|:-------------------------:|:----------:|:--------:|:-------------:|
|       **Mazda RX4**       |   21 ± 2   | 6 ± 0.04 |  160 ± 33.61  |
|     **Mazda RX4 Wag**     |  21 ± 0.8  | 6 ± 0.14 |  160 ± 26.11  |
|      **Datsun 710**       | 22.8 ± 0.1 | 4 ± 0.64 |  108 ± 45.81  |
|    **Hornet 4 Drive**     | 21.4 ± 1.7 | 6 ± 0.04 |  258 ± 33.81  |
|   **Hornet Sportabout**   | 18.7 ± 2.7 | 8 ± 0.54 |  360 ± 37.81  |
|        **Valiant**        | 18.1 ± 3.3 | 6 ± 0.14 |  225 ± 36.31  |
|      **Duster 360**       | 14.3 ± 0.1 | 8 ± 0.24 |  360 ± 2.01   |
|       **Merc 240D**       | 24.4 ± 2.3 | 4 ± 0.14 | 146.7 ± 8.81  |
|       **Merc 230**        | 22.8 ± 1.7 | 4 ± 0.04 | 140.8 ± 43.91 |
|       **Merc 280**        | 19.2 ± 1.5 | 6 ± 0.24 | 167.6 ± 6.91  |
|       **Merc 280C**       |  17.8 ± 3  | 6 ± 0.14 | 167.6 ± 27.11 |
|      **Merc 450SE**       |  16.4 ± 3  | 8 ± 0.34 | 275.8 ± 11.21 |
|      **Merc 450SL**       | 17.3 ± 2.8 | 8 ± 0.14 | 275.8 ± 32.21 |
|      **Merc 450SLC**      | 15.2 ± 0.3 | 8 ± 0.44 | 275.8 ± 11.61 |
Dominic Comtois
  • 10,230
  • 1
  • 39
  • 61
  • Can you show the full details of how your loop method can be used to combine two tables? – Ben Jan 16 '16 at 11:09
  • ok, I see you use a loop where I use lapply. Quite a reasonable alternative, thanks. – Ben Jan 18 '16 at 11:10
3

The following solution is based on a spin-off question I asked in order to answer this one.

Essentially, the problem needs to be broken down to 2 parts: first, how to combine two tables, and second, how to render the result to HTML etc.

Let's demonstrate the solution with two dataframes:

  • my_mtcars
  • df_random_vals

library(tibble)
library(dplyr, warn.conflicts = FALSE)


## part 1 -- create `my_mtcars`
##############################
my_mtcars <- 
  mtcars %>%
  rownames_to_column("cars") %>%
  as_tibble()


## part 2 -- create `df_random_vals` based on `my_mtcars` dimensions
####################################################################
dim_my_mtcars <- dim(my_mtcars)
target_nrows  <- dim_my_mtcars[1]
target_ncols  <- dim_my_mtcars[2]

set.seed(2021)

my_mat <-
  matrix(data = rnorm(target_nrows*target_ncols), 
         ncol = target_ncols, 
         nrow = target_nrows) 

df_random_vals <-
  my_mat %>%
  as.data.frame() %>%
  as_tibble() %>%
  mutate(across(everything(), round, 3)) ## just so we have shorter decimals

## part 3 -- test `my_mtcars` and `df_random_vals` are of the same dimensions as intended
#########################################################################################
identical(
  dim(df_random_vals), 
  dim(my_mtcars)
)
#> [1] TRUE

## part 4 -- set a general custom function for how to paste values together
#################################################################
my_paste <- function(x, y) {
  paste0(x, " ± ", y)
}

my_paste(1, 2)
#> [1] "1 ± 2"

## part 5 -- join the datasets
##############################
library(purrr)
output <- map2_dfr(my_mtcars, df_random_vals, my_paste) # https://stackoverflow.com/a/68541960/6105259

output
#> # A tibble: 32 x 12
#>    cars    mpg    cyl    disp   hp     drat  wt    qsec  vs    am    gear  carb 
#>    <chr>   <chr>  <chr>  <chr>  <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 Mazda ~ 21 ± ~ 6 ± 0~ 160 ±~ 110 ±~ 3.9 ~ 2.62~ 16.4~ 0 ± ~ 1 ± ~ 4 ± ~ 4 ± ~
#>  2 Mazda ~ 21 ± ~ 6 ± -~ 160 ±~ 110 ±~ 3.9 ~ 2.87~ 17.0~ 0 ± ~ 1 ± ~ 4 ± ~ 4 ± ~
#>  3 Datsun~ 22.8 ~ 4 ± 0~ 108 ±~ 93 ± ~ 3.85~ 2.32~ 18.6~ 1 ± ~ 1 ± ~ 4 ± ~ 1 ± ~
#>  4 Hornet~ 21.4 ~ 6 ± 1~ 258 ±~ 110 ±~ 3.08~ 3.21~ 19.4~ 1 ± ~ 0 ± ~ 3 ± ~ 1 ± ~
#>  5 Hornet~ 18.7 ~ 8 ± -~ 360 ±~ 175 ±~ 3.15~ 3.44~ 17.0~ 0 ± ~ 0 ± ~ 3 ± ~ 2 ± ~
#>  6 Valian~ 18.1 ~ 6 ± -~ 225 ±~ 105 ±~ 2.76~ 3.46~ 20.2~ 1 ± ~ 0 ± ~ 3 ± ~ 1 ± ~
#>  7 Duster~ 14.3 ~ 8 ± -~ 360 ±~ 245 ±~ 3.21~ 3.57~ 15.8~ 0 ± ~ 0 ± ~ 3 ± 0 4 ± ~
#>  8 Merc 2~ 24.4 ~ 4 ± -~ 146.7~ 62 ± ~ 3.69~ 3.19~ 20 ±~ 1 ± ~ 0 ± ~ 4 ± ~ 2 ± ~
#>  9 Merc 2~ 22.8 ~ 4 ± -~ 140.8~ 95 ± ~ 3.92~ 3.15~ 22.9~ 1 ± ~ 0 ± ~ 4 ± ~ 2 ± ~
#> 10 Merc 2~ 19.2 ~ 6 ± 1~ 167.6~ 123 ±~ 3.92~ 3.44~ 18.3~ 1 ± ~ 0 ± ~ 4 ± ~ 4 ± ~
#> # ... with 22 more rows

Created on 2021-07-27 by the reprex package (v2.0.0)


And finally, we render:

library(kableExtra)
## based on this one: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html#Bootstrap_theme
output %>%
  kbl() %>%
  kable_styling()

In sum

The scenario
We have two data frames that correspond to each other. That is, they have the same dimensions, and values in cells of the same location are inherently related between the tables. One example is one table with means, and another table with standard error of each mean.

The desired operation
We want to have one table instead of two, which brings together (AKA "concatenates") respective cells into one cell.

The desired output
We want to render the single output table as HTML.

The necessary code

  • step 1: define pasting function
      my_paste <- function(x, y) {
          paste0(x, " ± ", y)
      }
    
  • step 2: "merge" the tables to one based on the pasting function
      output <- map2_dfr(df1, df2, my_paste)
    
  • step 3: render as HTML
    output %>%
       kbl() %>%
       kable_styling()
    
Emman
  • 3,695
  • 2
  • 20
  • 44