
Although this post showed that backticks (`) can be used to select columns whose names contain spaces, I couldn't get that to work in the following code:

library(tidyverse)
library(survival)

df <- colon[colon$etype==2, c("time", "age")]
num_summary <- do.call(cbind, lapply(df, summary)) %>% 
  t() %>% mutate(Interquatrile = `3rd Qu.` - `1st Qu.`)

This results in the following error:

Error in UseMethod("mutate") : 
  no applicable method for 'mutate' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"

Could you please explain what I did wrong and how to solve the problem without renaming the column names?

Nemo

3 Answers


As mentioned in the other answer, this has nothing to do with your column names and everything to do with the fact that cbind (as well as t) creates a matrix by default, not a data.frame.
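A quick way to confirm this diagnosis is to inspect the class at each step of the pipeline. The sketch below uses base R only and, as an assumption, swaps the question's survival::colon subset for the built-in mtcars data so that it runs without extra packages:

```r
# Assumption: mtcars stands in for the question's colon subset.
df <- mtcars[, c("mpg", "hp")]

m <- do.call(cbind, lapply(df, summary))
class(m)              # cbind() of plain vectors gives a matrix
class(t(m))           # t() keeps it a matrix
is.data.frame(t(m))   # FALSE, so dplyr's mutate() has no method for it

# The spaced names exist; they are just dimnames of a matrix,
# which mutate() cannot work with
rownames(m)
```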

To create a table, you don't need to take the detour via a matrix (and as_tibble) at all. Instead, use bind_cols or, in your case, bind_rows, which makes the t() unnecessary:

num_summary <- lapply(df, summary) %>%
  bind_rows() %>%
  mutate(Interquartile = `3rd Qu.` - `1st Qu.`)

The code above preserves the class and attributes of the summary table in each column. This isn't harmful, but if you want to drop the extraneous attributes and keep only the bare numeric values, apply as.vector (or c) to all columns:

num_summary <- lapply(df, summary) %>%
  bind_rows() %>%
  mutate(across(everything(), as.vector)) %>%
  mutate(Interquartile = `3rd Qu.` - `1st Qu.`)
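The difference between c and as.vector can be seen on a single summary object. In this minimal base-R sketch (the input vector is arbitrary), c drops every attribute except names, while as.vector on an atomic vector drops the names as well:

```r
# Arbitrary input; any numeric vector without NAs behaves the same way
s <- summary(c(1, 2, 3, 10))

class(s)                  # "summaryDefault" "table"
class(c(s))               # plain "numeric"
names(c(s))               # c() keeps the names ("Min.", "1st Qu.", ...)
attributes(as.vector(s))  # NULL: as.vector() strips the names too
```

Either function removes the summary class, so for stripping the columns before the subtraction both work.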
Konrad Rudolph
  • Awesome solutions. Could you please explain why we need `mutate(across(everything(), c))` before `mutate(Interquartile`? Also what did `c` in `mutate(across(everything(), c))` mean? Thanks. – Nemo Apr 11 '23 at 12:49
  • @Nemo Like I said, we don't *need* it, but if you wrap an object in a `c()` call, [it drops all attributes](https://stat.ethz.ch/R-manual/R-devel/library/base/html/c.html#:~:text=c%20is%20sometimes%20used%20for%20its%20side%20effect%20of%20removing%20attributes%20except%20names), so the columns become raw `numeric`s instead of summary table types. Actually, using `as.vector` would have been more explicit; I'll amend the answer. – Konrad Rudolph Apr 11 '23 at 13:02

It has nothing to do with your column names but with your data structure: mutate needs a data frame, and you are passing it a matrix.

Instead, use as_tibble() to convert the matrix to a tibble (a data frame) so that you can use mutate:

do.call(cbind, lapply(df, summary)) %>% 
  t() %>% 
  as_tibble() %>% 
  mutate(Interquatrile = `3rd Qu.` - `1st Qu.`)

Output:

# A tibble: 2 × 7
   Min. `1st Qu.` Median   Mean `3rd Qu.`  Max. Interquatrile
  <dbl>     <dbl>  <dbl>  <dbl>     <dbl> <dbl>         <dbl>
1    23       806   1976 1670.       2364  3329          1558
2    18        53     61   59.8        69    85            16
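To see that the backticks were never the problem, the same conversion can be done with base R's as.data.frame, after which the spaced names can be referenced with backticks via `$`. This sketch again assumes mtcars in place of the colon subset, and cross-checks the result against stats::IQR, which uses the same default quantile type (7) as summary:

```r
# Assumption: mtcars stands in for the survival::colon subset in the question.
df <- mtcars[, c("mpg", "hp")]

tab <- as.data.frame(t(do.call(cbind, lapply(df, summary))))
# as.data.frame() on a matrix keeps "1st Qu." and "3rd Qu." as column names
tab$Interquartile <- tab$`3rd Qu.` - tab$`1st Qu.`

# Cross-check against IQR(), computed directly on each column
all.equal(unname(tab$Interquartile), unname(vapply(df, IQR, numeric(1))))
```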
Julian

You are calling mutate with a matrix, but mutate needs a data.frame.

You could use rbind instead of cbind and t, and convert the matrix to a data.frame so that mutate can work with it.

do.call(rbind, lapply(df, summary)) %>% 
  as.data.frame %>% mutate(Interquatrile = `3rd Qu.` - `1st Qu.`)
#     Min. 1st Qu. Median       Mean 3rd Qu. Max. Interquatrile
#time   23     806   1976 1669.95587    2364 3329          1558
#age    18      53     61   59.75457      69   85            16
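The reason rbind removes the need for t(): stacking the per-column summaries as rows gives the same matrix as binding them as columns and then transposing. A small base-R check, again assuming mtcars stands in for the question's data:

```r
# Assumption: mtcars replaces the question's colon subset.
df <- mtcars[, c("mpg", "hp")]
sums <- lapply(df, summary)

by_row   <- do.call(rbind, sums)     # one row per variable
by_col_t <- t(do.call(cbind, sums))  # one column per variable, then transposed

all.equal(by_row, by_col_t)
dimnames(by_row)  # rows: mpg, hp; columns: Min., 1st Qu., ..., Max.
```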

Or keep your code and additionally use as.data.frame.

do.call(cbind, lapply(df, summary)) %>% 
  t() %>% as.data.frame %>% mutate(Interquatrile = `3rd Qu.` - `1st Qu.`)
#     Min. 1st Qu. Median       Mean 3rd Qu. Max. Interquatrile
#time   23     806   1976 1669.95587    2364 3329          1558
#age    18      53     61   59.75457      69   85            16

Or skip the conversion to a data.frame and cbind the Interquatrile column to the matrix directly.

do.call(rbind, lapply(df, summary)) %>%
  cbind(., Interquatrile = .[,"3rd Qu."] - .[,"1st Qu."])
#     Min. 1st Qu. Median       Mean 3rd Qu. Max. Interquatrile
#time   23     806   1976 1669.95587    2364 3329          1558
#age    18      53     61   59.75457      69   85            16

Or compute it directly in the function called by lapply; this also returns a matrix, which can be converted to a data.frame if needed.

do.call(rbind,
        lapply(df, function(x) {
          y <- summary(x)
          c(y, Interquatrile = y[["3rd Qu."]] - y[["1st Qu."]])} ))
#     Min. 1st Qu. Median       Mean 3rd Qu. Max. Interquatrile
#time   23     806   1976 1669.95587    2364 3329          1558
#age    18      53     61   59.75457      69   85            16
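A variant of the last approach swaps lapply plus do.call(rbind, ...) for vapply, which checks that every call returns exactly seven numbers. This is a sketch under two assumptions: mtcars replaces the colon subset, and the columns contain no NA values (otherwise summary adds an "NA's" entry and vapply stops with an error). The helper name add_iqr is hypothetical.

```r
# Assumption: mtcars replaces the question's colon subset (no NAs).
df <- mtcars[, c("mpg", "hp")]

# Hypothetical helper: the summary plus an interquartile-range entry
add_iqr <- function(x) {
  y <- summary(x)
  c(y, Interquartile = y[["3rd Qu."]] - y[["1st Qu."]])
}

# vapply() returns one column per variable; t() puts variables in rows
res <- t(vapply(df, add_iqr, FUN.VALUE = numeric(7)))
res
```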
GKi
  • Thank you for your comprehensive answer. Could you please explain what `\(x)` and `;` mean in the part `y <- summary(x);`? – Nemo Apr 11 '23 at 12:45
  • `\(x)` is a short notation for `function(x)`, and `;` was needed when I wrote this on one line. I have cleaned this up now. – GKi Apr 11 '23 at 12:48
  • It's difficult to choose between your solution and @Konrad Rudolph's as the accepted answer. I went for his as it's easier to understand. Thank you! – Nemo Apr 11 '23 at 13:02
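For readers on older R versions: the `\(x)` syntax mentioned in the comments is the anonymous-function shorthand introduced in R 4.1.0, so the two definitions below are equivalent. A tiny sketch (the function bodies are arbitrary):

```r
square_long  <- function(x) x^2
square_short <- \(x) x^2    # shorthand; requires R >= 4.1.0

sapply(1:3, square_short)   # 1 4 9
identical(sapply(1:3, square_long), sapply(1:3, square_short))
```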