
I am trying to create a table with factor and numeric variables using modelsummary. The way I am doing this is by converting factor variables to numeric so that only 1 line appears for each factor variable and all variables appear in the same column. Then, I will manually calculate the number of units for each level of each previously factor/now numeric variable and assign this as text to each variable in my dataset. I am trying to do this as per the function called N_alt in the example below:

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}


# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt, data = tmp)

which gives me: (image of the resulting table, not reproduced here)

My N_alt function does not work properly. class is correct, but region is not. I am not getting any warning messages.

I have also tried:

N_alt = function(x) {
  if (x[1] %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x[1] %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

but I obtained the same output. I have written similar functions with these vectors and they worked fine, but for some reason this one does not.

Additionally, I also tried:

N_alt <- c('[32 (100)]','[14 (43.8); 18 (56.3)]','[14 (43.8); 6 (18.8); 12 (37.5)]','[32 (100)]')

and

N_alt <- c(rep('[32 (100)]',32),rep('[14 (43.8); 18 (56.3)]',32),rep('[14 (43.8); 6 (18.8); 12 (37.5)]',32),rep('[32 (100)]',32))

but I get:

Error in datasummary(mpg + (`class [0,1]` = class) + (`region [A,B,C]` = region) +  : 
  Argument 'N_alt' is not length 32

Does anyone know what I am missing here?

Edit:

It does seem possible to write a function such as Mean_alt below that performs two different actions depending on the variable: it drops the decimal places for certain numeric variables (simply converting them with as.integer did not work for me), and it shows nothing in the Mean column for variables that were factors and are now numeric:

library(modelsummary)
library(kableExtra)

tmp <- mtcars[, c("mpg", "hp")]

tmp$class <- 0
tmp$class[15:32] <- 1
tmp$class <- as.factor(tmp$class)

tmp$region <- 1
tmp$region[15:20] <- 2
tmp$region[21:32] <- 3
tmp$region <- as.factor(tmp$region)

tmp$class <- 0
tmp$region <- 0

N_alt = function(x) {
  if (x %in% c(tmp$class)) {
    paste0('[14 (43.8); 18 (56.3)]') 
  } else if (x %in% c(tmp$region)) {
    paste0('[14 (43.8); 6 (18.8); 12 (37.5)]')  
  } else {
    paste0('[32 (100)]')
  }
}

Mean_alt = function(x) {
  if (x %in% c(tmp$mpg)) {
    as.character(floor(mean(x)), length=5)
  } else if (x %in% c(tmp$class, tmp$region)) {
    paste0("")
  } else {
    mean(x)
  }
}

# create a table with `datasummary`
emptycol = function(x) " "
datasummary(mpg + (`class [0,1]`= class) + (`region [A,B,C]`= region) + hp ~ Heading("N (%)") * N_alt + Heading("Mean") * Mean_alt, data = tmp)

output: (image of the resulting table, not reproduced here)

Vincent
  • Next time, could you please supply a [truly MINIMAL working example?](https://reprex.tidyverse.org/articles/reprex-dos-and-donts.html) For example, are the `kableExtra` boxplots relevant to this question? I'm not sure if they are, and there is no explanation, so it's hard for me to know what to focus on in this large chunk of code. – Vincent Nov 07 '21 at 13:07
  • Sorry! Thanks Vincent for looking into this. I've just edited my question. – Daniela Rodrigues Nov 07 '21 at 14:27

1 Answer


You are running up against three limitations.

The first limitation is in Base R:

  1. As explained in the R manual, the condition of an if/else statement must evaluate to a single TRUE or FALSE. Internally, datasummary applies N_alt to each variable in turn, and each time N_alt receives a new vector of length 32, so a test such as x %in% c(tmp$class) yields 32 values rather than one. Frankly, I don’t think it makes much sense to check only the first element of that vector either; I don’t see how this can get us where we want to go.
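
The point about the condition can be checked in base R without datasummary. This is a minimal sketch, assuming the data are as they stand at the end of the question's setup code, where both class and region have been overwritten with the constant 0; the object names cond, branch and first_only are only illustrative.

```r
# Both columns as they stand at the end of the question's setup code:
# each was overwritten with the constant 0.
class  <- rep(0, 32)
region <- rep(0, 32)

# %in% is vectorised, so the condition has one element per row, not one in total.
cond <- region %in% class
length(cond)  # 32

# if () needs exactly one TRUE/FALSE, so the vector must be collapsed explicitly.
branch <- if (all(cond)) "class branch" else "another branch"
branch  # "class branch"

# Testing only x[1] gives a length-one condition, but region's first value (0)
# is also found in class, so the first branch is still the one that is taken.
first_only <- region[1] %in% class
first_only  # TRUE
```

Since R 4.2.0 a condition of length greater than one in if () is an error; earlier versions used only the first element and issued a warning. Either way, because the two columns hold identical values, a test on the values alone cannot tell region apart from class, which is consistent with region showing the class text in the question's table.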

The two other limitations have to do with the fundamental design of the tables package, on which modelsummary::datasummary is based:

  2. Factors will always generate one row per factor level.
  3. I don’t think there is a good way to tell datasummary that a function should behave differently when applied to different numeric variables. This is because each function only sees the raw numeric vector, and not other meta-information.

I think the easiest workaround is to create two tables, one for your factors and one for your numeric. Then, these tables can easily be combined:

library(modelsummary)

N_factor <- function(x) {
  count <- table(x)
  pct <- prop.table(count)
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

N_numeric <- function(x) {
  sprintf("%s (100)", length(x))
}

tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor, 
                       output = "data.frame",
                       data = mtcars)

datasummary(mpg + hp ~ Heading("N") * N_numeric, 
            add_rows = tab_fac,
            data = mtcars)
       N
mpg    32 (100)
hp     32 (100)
cyl    [11 (0.3); 7 (0.2); 14 (0.4)]
gear   [15 (0.5); 12 (0.4); 5 (0.2)]
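
The values in brackets are shares between 0 and 1 because that is what prop.table() returns. A minimal sketch of one way to print percentages instead, assuming all that is wanted is to scale the shares by 100 before formatting; N_factor_pct is a hypothetical name used here for illustration, not a modelsummary function:

```r
# Hypothetical variant of N_factor: same logic, shares scaled to percentages.
N_factor_pct <- function(x) {
  count <- table(x)
  pct <- 100 * prop.table(count)  # prop.table() returns shares in [0, 1]
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

# Base R only: inspect the string for one column before using it in a table.
N_factor_pct(mtcars$cyl)
```

It should be usable in the formula in the same way as N_factor, for example cyl + gear ~ Heading("N") * N_factor_pct. The last printed digit of a value that falls exactly between two roundings can differ across platforms, since sprintf relies on the system's C library.
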
Vincent
  • Thanks Vincent. I've tried to restrict it to x[1] but I get no error messages and the output is just the same. I've created similar "if" cycles for Mean, Min, etc and they worked fine. For some reason, this doesn't. Do you have any other ideas of what might be going on? – Daniela Rodrigues Nov 07 '21 at 14:33
  • I don't understand the logic of what you are trying to do, and your post includes no explanation of that logic. – Vincent Nov 07 '21 at 15:43
  • Apologies if it is not clear. I've edited my question to include in the first paragraph the description of what I am trying to do. Does it make sense? – Daniela Rodrigues Nov 07 '21 at 15:53
  • Thanks so much Vincent. Your solution works, my problem is that I would like to position the factor and numeric variables according to some grouping, not all numeric first and then all factors at the end. Re limitation nr 3, I seem to be able to run Mean_alt (I've edited my question) that allows different actions for different numeric variables, hence why I am so surprised N_alt does not work. – Daniela Rodrigues Nov 07 '21 at 18:08
  • I still don't understand the logic of your `if/else` call in `N_alt`, so I can't help with that. But choosing the location of the `add_rows` can be done easily by setting a `position` attribute: https://vincentarelbundock.github.io/modelsummary/articles/modelsummary.html#add-rows – Vincent Nov 07 '21 at 19:18
  • Thanks a lot Vincent. It works for this example, so hopefully it will work for my own dataset. Last q, how to bring back the % within brackets as [11 (34.4); 7 (21.9); 14 (43.8)] instead of how it is currently being displayed in your solution [11 (0.3); 7 (0.2); 14 (0.4)]? – Daniela Rodrigues Nov 07 '21 at 20:39
  • 1
    great! Change the `sprintf` call in my function. – Vincent Nov 07 '21 at 21:18
  • I've tried many options without success, is it something easy to tell? – Daniela Rodrigues Nov 07 '21 at 21:34
  • 1
    https://stackoverflow.com/a/64146458/342331 – Vincent Nov 07 '21 at 21:43
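
The `position` attribute mentioned in the comments can be sketched as follows. This assumes the behaviour described in the linked modelsummary documentation, where a `position` attribute on the `add_rows` data frame gives the row numbers that the extra rows should occupy in the final table. The values `c(2, 4)` are only illustrative, and the `requireNamespace()` guard is there so the snippet does not fail when modelsummary is not installed.

```r
# Helper functions as in the answer (base R only).
N_factor <- function(x) {
  count <- table(x)
  pct <- prop.table(count)
  out <- paste(sprintf("%.0f (%.1f)", count, pct), collapse = "; ")
  sprintf("[%s]", out)
}

N_numeric <- function(x) {
  sprintf("%s (100)", length(x))
}

if (requireNamespace("modelsummary", quietly = TRUE)) {
  library(modelsummary)

  # Table for the categorical variables, returned as a data frame.
  tab_fac <- datasummary(cyl + gear ~ Heading("N") * N_factor,
                         output = "data.frame",
                         data = mtcars)

  # Assumption (per the linked docs): target row numbers for the added rows,
  # here interleaving them with the numeric variables.
  attr(tab_fac, "position") <- c(2, 4)

  datasummary(mpg + hp ~ Heading("N") * N_numeric,
              add_rows = tab_fac,
              data = mtcars)
}
```

With more variables, the position vector would need one entry per row of the added data frame.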