
I am trying to write a function that loops through the columns of a data frame and performs different operations depending on each column's data type. It never enters the if statements, because is.numeric(data[,i]) returns FALSE even when the corresponding is.numeric(data$variable) returns TRUE. I am not sure of the best way to approach this problem. Please let me know if you can help. Thanks!

Here is the function:

get_summary_stats <- function(data) {
  results <- list()
  for (i in names(data)) {
    var.name <- names(data[,i])
    if (is.numeric(data[,i])) {
      med.est <- median(data[,i])
      min.est <- min(data[,i])
      max.est <- max(data[,i])
      mean.est <- mean(data[,i])
      SD <- sd(data[,i])
      num.na <- sum(is.na(data[,i]))
      
      results[[i]] <- c(var.name, num.na, mean.est, SD, med.est, min.est, max.est)

    }
    if (is.factor(data[,i])){
      var.lables <- levels
      counts <-  as.data.frame(table(data[,i]))
      total <- sum(counts$Freq)
      num.na <- c("NA", nrow(data) - total)
      counts <- rbind(counts, num.na)
      counts$Percent <- (counts$Freq / total) * 100
      
      results[[i]] <- counts
    }
  }
  
  return(results)
}

Here is an example of the issue:

> is.numeric(full_data[,"Patient Age [70: Age]"])
[1] FALSE
> is.numeric(full_data$`Patient Age [70: Age]`)
[1] TRUE
  • To summarize the duplicate as it applies here, `data[, column]` *can* return a data frame, not a column vector. The equivalent to `$` when using a string column name is `data[[column]]`, which will always return a column vector, not a data frame. – Gregor Thomas Jun 10 '21 at 16:30
  • data[[column]] worked thank you! – Daniel Rodriguez Jun 10 '21 at 17:40
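
For anyone landing here later, below is a rough sketch of the function with `data[[col]]` swapped in, following the comment above. It is not the asker's final code, and several things in it are assumptions: `toy` and its columns are made-up data, `drop = FALSE` on a base data frame stands in for what a tibble-like `full_data` presumably does, the `na.rm = TRUE` arguments are added so missing values do not turn every statistic into `NA`, and the numeric branch returns a one-row data frame instead of `c(...)` because `c()` would coerce the numbers to character once the variable name is included.

```r
# Sketch only: same idea as the function in the question, with `[[` extraction.
get_summary_stats <- function(data) {
  results <- list()
  for (col in names(data)) {
    x <- data[[col]]  # `[[` returns the column itself as a vector

    if (is.numeric(x)) {
      # One-row data frame keeps the statistics numeric.
      results[[col]] <- data.frame(
        variable  = col,
        n_missing = sum(is.na(x)),
        mean      = mean(x, na.rm = TRUE),    # na.rm = TRUE is an assumption
        sd        = sd(x, na.rm = TRUE),
        median    = median(x, na.rm = TRUE),
        min       = min(x, na.rm = TRUE),
        max       = max(x, na.rm = TRUE)
      )
    } else if (is.factor(x)) {
      # useNA = "always" adds the NA row directly, so no rbind() is needed.
      counts <- as.data.frame(table(x, useNA = "always"),
                              stringsAsFactors = FALSE)
      names(counts) <- c("level", "Freq")
      # As in the question, percentages are relative to the non-missing total.
      counts$Percent <- counts$Freq / sum(!is.na(x)) * 100
      results[[col]] <- counts
    }
  }
  results
}

# Hypothetical toy data, not taken from the question.
toy <- data.frame(
  age = c(34, 51, NA, 70),
  sex = factor(c("F", "M", "F", NA))
)

# `[` with drop = FALSE gives back a data frame, so is.numeric() is FALSE;
# `[[` gives back the numeric vector.
is.numeric(toy[, "age", drop = FALSE])
is.numeric(toy[["age"]])

stats <- get_summary_stats(toy)
```

With a plain base data frame, `toy[, "age"]` would drop to a vector by default, which is likely why the problem only shows up with some data sources. `[[` avoids depending on that default either way.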
