I am trying to summarise the mean, sd, etc. for a number of different columns (variables) in my dataset. I have written my own summary function to return exactly what I need and am using sapply to apply it to all the variables at once. It works fine; however, the data frame that is returned has no column names, and I cannot even rename the columns by column number, i.e. they seem impossible to use in any way.

My code is below. As I am just computing summary statistics, I would like to keep the same column (variable) names, with 4 rows (mean, sd, min, max). Is there any way at all to do this (even a slow way where I manually change the names of the columns)?

 #GENERATING DESCRIPTIVE STATISTICS
sfsum= function(x){
  mean=mean(x)
  sd=sd(x)
  min=min(x)
  max=max(x)
  
  return(c(mean,sd,min,max))
}

#
c= list(sfbalanced$age_child, sfbalanced$earnings_child, 
sfbalanced$logchildinc ,sfbalanced$p_inc84, sfbalanced$login84, 
sfbalanced$p_inc85, sfbalanced$login85, sfbalanced$p_inc86, 
sfbalanced$login86, sfbalanced$p_inc87, sfbalanced$login87, 
sfbalanced$p_inc88, sfbalanced$login88)

summ=sapply(c,sfsum)

names(summ)
 NULL
starball
  • set the names of `c` with the column names. Also, `c` is a function name, so it is better not to use that as an object name. I think `sapply(sfbalanced, sfsum)` should get the output – akrun May 22 '18 at 15:49
  • `return(c(mean=mean,sd=sd,min=min,max=max))` – Jilber Urbina May 22 '18 at 15:50

2 Answers

If the function returns a named vector, those names become the row names of the result. If the list you pass to sapply is itself named, sapply uses those names as the column names (USE.NAMES = TRUE is the default, so it does not have to be set explicitly).

An example using the mtcars data gives the following output.

Code

sfsum= function(x){
    mean=mean(x)
    sd=sd(x)
    min=min(x)
    max=max(x)

    return(c("mean"=mean,"sd"=sd,"min" = min,"max" =max)) #For rownames
}

#
x= list("mpg" = mtcars$mpg, "disp" = mtcars$disp, "drat" = mtcars$drat)
#For column names

summ=sapply(x,sfsum, USE.NAMES = TRUE) #USE.NAMES = TRUE is the default; column names come from the names of x

Output:

> summ
           mpg     disp      drat
mean 20.090625 230.7219 3.5965625
sd    6.026948 123.9387 0.5346787
min  10.400000  71.1000 2.7600000
max  33.900000 472.0000 4.9300000
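
As a minimal sketch of the same idea, assuming the variables of interest all live in one data frame, the data frame can be subset by column name and passed to sapply directly, because a data frame is itself a named list of columns. The helper name `summarise_col` and the vector `vars` are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: returns a named vector, so the result gets row names
summarise_col <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

# Hypothetical selection of columns; a data frame is a named list of columns,
# so the column names carry over to the result
vars <- c("mpg", "disp", "drat")
summ <- sapply(mtcars[, vars], summarise_col)

rownames(summ)  # taken from the names of the vector returned by summarise_col()
colnames(summ)  # taken from the column names of the data frame
names(summ)     # NULL: summ is a matrix, so use rownames()/colnames() instead
```

Note that the result is a matrix rather than a data frame, which is why `names(summ)` returns NULL even when both sets of names are present.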
PKumar
  • `USE.NAMES = TRUE` is a default for `sapply`. The key is to return a named vector from the function. – zx8754 May 22 '18 at 20:45
  • And dataframe is already a list `x <- mtcars[, c("mpg", "disp", "drat")]` – zx8754 May 22 '18 at 20:46
  • @zx8754, I know, I just wanted to follow the same approach as the OP to let him/her know where it's coming from. Anyway, thanks for the valuable input. – PKumar May 23 '18 at 00:05

If we need the column names as well, apply the function directly to the data frame (assuming it should be applied to every column), so that the data frame's column names carry over to the result. The row names can then be set by hand:

out <- sapply(df2, sfsum)
row.names(out) <- c('mean', 'sd', 'min', 'max')

data

set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4*4), 4, 4))
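
Since the question asks for a data frame, here is a hedged sketch of one way to take this approach a step further and convert the resulting matrix. It reuses the `df2` data above and a compact version of the question's unnamed `sfsum`; the name `out_df` is hypothetical.

```r
set.seed(24)
df2 <- as.data.frame(matrix(rnorm(4 * 4), 4, 4))

# Unnamed return value, as in the question's original function
sfsum <- function(x) c(mean(x), sd(x), min(x), max(x))

out <- sapply(df2, sfsum)   # columns are named after df2's columns, rows are unnamed
rownames(out) <- c("mean", "sd", "min", "max")

# as.data.frame() keeps the matrix dimnames as row and column names
out_df <- as.data.frame(out)
names(out_df)      # column names of df2
rownames(out_df)   # "mean" "sd" "min" "max"
```

After the conversion, `names(out_df)` works as the question expected, because the object is now a data frame rather than a matrix.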
akrun