I am using the function aggregate to get several statistics at the same time:

temp<-aggregate(AUClast~RIC+STUD, nca_sim[!is.na(nca_sim$RIC),], 
      FUN= function(x) 
        c(N=length(x), 
          Mean=mean(x), 
          SD=sd(x),
          Median=median(x),
          Min=min(x),
          Max=max(x)
          ))

When I print the output, I see 8 columns (two ID columns and 6 statistics). But the aggregate output actually contains only 3 columns (the two ID columns and a single column holding all the statistics). So, when I try to assign a name to each column, I get this error message:

colnames(temp) <-c("Renal Function", "Study", "N", "Mean", "SD", "Median", "Min", "Max")
Error in names(x) <- value : 
  'names' attribute [8] must be the same length as the vector [3]

So, how do I assign the 8 names to the 8 columns? For clarity, I need this so that I can pass the result to "kable" to produce a formatted table with my own column names.

Rui Barradas
CodeMaster

1 Answer

Here is a solution. The main point is that the last column of the aggregate result is a single matrix column, not several vectors, one per statistic computed in aggregate.

Note that the way to extract the results matrix is general for a formula with a single response variable like this one: the matrix column is always the last one, and therefore its column number is always ncol(temp).
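To make the matrix column visible, here is a minimal sketch. It uses mtcars as a stand-in for the question's nca_sim (an assumption for illustration only) and computes just two statistics to keep the output short.

```r
# Toy data standing in for the question's nca_sim (illustrative assumption)
df1 <- setNames(mtcars[c("mpg", "cyl", "am")], c("AUClast", "RIC", "STUD"))

temp <- aggregate(AUClast ~ RIC + STUD, df1,
                  FUN = function(x) c(N = length(x), Mean = mean(x)))

ncol(temp)                   # 3: RIC, STUD and one matrix column
last <- temp[[ncol(temp)]]   # the results matrix is the last column
is.matrix(last)              # TRUE
dim(last)                    # 6 2 (six groups, two statistics)
colnames(last)               # "N" "Mean"
```

The print method spreads the matrix over several display columns, which is why the printed table looks wider than ncol(temp) reports.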

The following post is relevant: Difference between [ and [[
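As a quick illustration of why that difference matters here, the following sketch builds a small, hypothetical data frame with a matrix column (the names df, id and stats are made up for the example) and compares the two extraction operators.

```r
# Hypothetical data frame with a matrix column, for illustration only
df <- data.frame(id = 1:2)
df$stats <- matrix(1:4, nrow = 2, dimnames = list(NULL, c("Min", "Max")))

is.data.frame(df["stats"])   # TRUE: `[` returns a one-column data.frame
is.matrix(df[["stats"]])     # TRUE: `[[` returns the matrix itself

# Binding the data.frame keeps the matrix nested in a single column,
# binding the matrix spreads it into one column per statistic.
nested <- cbind(df["id"], df["stats"])
flat   <- cbind(df["id"], df[["stats"]])
ncol(nested)   # 2
names(flat)    # "id" "Min" "Max"
```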

# make up a test data set
df1 <- mtcars[c("mpg", "cyl", "am")]
# it's not strictly needed to coerce the 
# grouping columns to factor
df1$cyl <- factor(df1$cyl)
df1$am <- factor(df1$am)
names(df1) <- c("AUClast", "RIC", "STUD")

# exact same code as the question's
temp <- aggregate(AUClast ~ RIC + STUD, df1, 
                  FUN = function(x) 
                    c(N = length(x), 
                      Mean = mean(x), 
                      SD = sd(x),
                      Median = median(x),
                      Min = min(x),
                      Max = max(x)
                    ))
# see the result
temp
#>   RIC STUD  AUClast.N AUClast.Mean AUClast.SD AUClast.Median AUClast.Min
#> 1   4    0  3.0000000   22.9000000  1.4525839     22.8000000  21.5000000
#> 2   6    0  4.0000000   19.1250000  1.6317169     18.6500000  17.8000000
#> 3   8    0 12.0000000   15.0500000  2.7743959     15.2000000  10.4000000
#> 4   4    1  8.0000000   28.0750000  4.4838599     28.8500000  21.4000000
#> 5   6    1  3.0000000   20.5666667  0.7505553     21.0000000  19.7000000
#> 6   8    1  2.0000000   15.4000000  0.5656854     15.4000000  15.0000000
#>   AUClast.Max
#> 1  24.4000000
#> 2  21.4000000
#> 3  19.2000000
#> 4  33.9000000
#> 5  21.0000000
#> 6  15.8000000

# the error
colnames(temp) <-c("Renal Function", "Study", "N", "Mean", "SD", "Median", "Min", "Max")
#> Error in names(x) <- value: 'names' attribute [8] must be the same length as the vector [3]

# only three columns, the last one is a matrix
# this matrix is the output of the anonymous function
# aggregate applies to the data. 
str(temp)
#> 'data.frame':    6 obs. of  3 variables:
#>  $ RIC    : Factor w/ 3 levels "4","6","8": 1 2 3 1 2 3
#>  $ STUD   : Factor w/ 2 levels "0","1": 1 1 1 2 2 2
#>  $ AUClast: num [1:6, 1:6] 3 4 12 8 3 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : NULL
#>   .. ..$ : chr [1:6] "N" "Mean" "SD" "Median" ...

# three columns
(nc <- ncol(temp))
#> [1] 3

# extract the 3rd column temp[[nc]], not the data.frame temp[nc]
# and bind it with the other columns. It is also important 
# to notice that the method called is cbind.data.frame,
# since temp[-nc] extracts a data.frame. See the link above.
temp <- cbind(temp[-nc], temp[[nc]])

# not needed anymore!
# colnames(temp) <-c("Renal Function", "Study", "N", "Mean", "SD", "Median", "Min", "Max")

temp
#>   RIC STUD  N     Mean        SD Median  Min  Max
#> 1   4    0  3 22.90000 1.4525839  22.80 21.5 24.4
#> 2   6    0  4 19.12500 1.6317169  18.65 17.8 21.4
#> 3   8    0 12 15.05000 2.7743959  15.20 10.4 19.2
#> 4   4    1  8 28.07500 4.4838599  28.85 21.4 33.9
#> 5   6    1  3 20.56667 0.7505553  21.00 19.7 21.0
#> 6   8    1  2 15.40000 0.5656854  15.40 15.0 15.8

Created on 2023-05-06 with reprex v2.0.2

The assignment of new colnames is only needed to make the first two columns' names more descriptive, not to get rid of the AUClast. prefix.
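A minimal sketch of that last step, assuming the result has already been flattened as above. The small data frame below is a hypothetical stand-in for it, and the knitr call is guarded because knitr may not be installed.

```r
# Hypothetical stand-in for the flattened result (values for illustration only)
temp <- data.frame(RIC = c("4", "6"), STUD = c("0", "1"),
                   N = c(3, 4), Mean = c(22.9, 19.125))

# Rename only the two grouping columns; the statistics keep their names
names(temp)[1:2] <- c("Renal Function", "Study")

# Pass the result to kable for a formatted table, if knitr is available
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(temp, digits = 2))
}
```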

Rui Barradas
  • Yes, many thanks. Your code is better than the similar solution I had just implemented: `cuttemp1 <- temp$AUClast; newtemp1 <- cbind(temp, cuttemp1); newtemp2 <- select(newtemp1, -3); colnames(newtemp2) <- c("Renal Function", "Study", "N", "Mean", "SD", "Median", "Min", "Max")` – CodeMaster May 06 '23 at 04:48