1

For this particular dataframe I want to get the mean and sd of all the numeric columns. So I am using the following block of code:

for (col in colnames(cereal.data)) {
  if (is.numeric(cereal.data$col)) {
    mean(cereal.data$col)
  }
}

But this does not seem to work, so I tried it like this:

columns <- colnames(cereal.data)
for (col in 1:length(columns)) {
  if (is.numeric(cereal.data[[col]])) {
    mean(cereal.data[[col]])
  }
}

How can I make this work?

I expect to get the mean and sd of each numeric column of the dataframe. I tried using a for loop to iterate over the column names and check whether each column is numeric.

Amit Sur
  • Here are a couple of posts that can help you: https://stackoverflow.com/questions/26082889/calculate-summary-statistics-e-g-mean-on-all-numeric-columns-using-data-table and https://stackoverflow.com/questions/40596947/r-for-loops-column-means – Ronak Shah Jul 30 '23 at 05:18

2 Answers

1

There are a few issues with your code:

In the first loop, col holds a column name as a character string, but cereal.data$col does not substitute that string. The $ operator looks for a column literally named "col", which does not exist, so it returns NULL and is.numeric(NULL) is FALSE. To look a column up by the value of a variable, use cereal.data[[col]].

In the second loop, cereal.data[[col]] is correct: double square brackets [[ ]] extract a single column as a vector, by name or by position. Single square brackets [ ] would instead return a one-column data frame, which mean() cannot handle. The remaining problem is that expressions inside a for loop are not auto-printed, so mean(cereal.data[[col]]) is computed and then discarded. You need to print or store the result.
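The difference between the two loops in the question can be checked directly. This is only a small sketch: it uses the built-in iris data as a stand-in for cereal.data (an assumption, since the original data is not shown), and the variable names are illustrative.

```r
# iris stands in for cereal.data here (an assumption; any data frame works)
df <- iris
col <- "Sepal.Length"

# `$` does not evaluate `col`; it looks for a column literally named "col"
is.null(df$col)          # TRUE: there is no such column

# `[[` evaluates `col`, so both a name and a position work
is.numeric(df[[col]])    # TRUE
is.numeric(df[[1]])      # TRUE

# Inside a for loop nothing is auto-printed, so print explicitly
for (col in colnames(df)) {
  if (is.numeric(df[[col]])) {
    print(mean(df[[col]]))
  }
}
```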

Try the following:

calculate_mean_and_sd <- function(df) {
  numeric_cols <- colnames(df)[sapply(df, is.numeric)]
  
  for (col in numeric_cols) {
    column_mean <- mean(df[[col]])
    column_sd <- sd(df[[col]])
    
    cat("Column:", col, "\n")
    cat("Mean:", column_mean, "\n")
    cat("Standard Deviation:", column_sd, "\n\n")
  }
}
  • Your function returns nothing. It shows the mean and sd for each column, but this information cannot be stored in an object. You could run `res <- calculate_mean_and_sd(iris)`. When you call `res`, it shows `NULL`. – Darren Tsai Jul 30 '23 at 08:05
  • This function does not have an explicit return statement, so it does not return any value in the traditional sense. Instead, it uses the cat function to print the results directly to the console as it calculates the mean and standard deviation for each numeric column. If you need to store the results for further processing or analysis, you can modify the function to return the results as a list or a data frame. – Beulah Evanjalin Jul 30 '23 at 08:45
  • Use `result[[col]] <- list(mean = column_mean, sd = column_sd)` to store the calculated mean and standard deviation for each numeric column in the result list and return the same. – Beulah Evanjalin Jul 30 '23 at 08:49
  • 2
    Yes, I know this! You could move these comments into your answer. In general, we need not only to print this information in the console but also to use the values for further analysis, so a function that returns its results is preferable here. – Darren Tsai Jul 30 '23 at 08:58
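The suggestion in the comments above, returning the results instead of only printing them, could look roughly like the sketch below. The function name mean_sd_table and its na.rm argument are hypothetical and not part of the original answer; iris is used only as test data.

```r
# Hypothetical variant of calculate_mean_and_sd() that returns its results
# as a data frame with one row per numeric column.
mean_sd_table <- function(df, na.rm = FALSE) {
  numeric_cols <- colnames(df)[sapply(df, is.numeric)]
  data.frame(
    column = numeric_cols,
    # vapply guarantees exactly one numeric value per column
    mean = vapply(numeric_cols, function(col) mean(df[[col]], na.rm = na.rm), numeric(1)),
    sd = vapply(numeric_cols, function(col) sd(df[[col]], na.rm = na.rm), numeric(1)),
    row.names = NULL
  )
}

res <- mean_sd_table(iris)
res                                        # one row per numeric column
res$mean[res$column == "Sepal.Width"]      # values can be reused later
```

Because the result is an ordinary data frame, it can be filtered, merged, or written to a file like any other table.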
1

Here is a solution without for loops.

cereal.data <- iris   # test data set

i_cols <- sapply(cereal.data, is.numeric)

colMeans(cereal.data[i_cols])
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>     5.843333     3.057333     3.758000     1.199333
apply(cereal.data[i_cols], 2, sd)
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>    0.8280661    0.4358663    1.7652982    0.7622377

Created on 2023-07-30 with reprex v2.0.2


Edit

Here is the code above as a function that returns a list with members colmeans and colsds. Its na.rm argument defaults to FALSE.

mean_sd <- function(x, na.rm = FALSE) {
  i_cols <- sapply(x, is.numeric)
  colmeans <- colMeans(x[i_cols], na.rm = na.rm)
  colsds <- apply(x[i_cols], 2, sd, na.rm = na.rm)
  list(colmeans = colmeans, colsds = colsds)
}

mean_sd(cereal.data)
#> $colmeans
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>     5.843333     3.057333     3.758000     1.199333 
#> 
#> $colsds
#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#>    0.8280661    0.4358663    1.7652982    0.7622377

Created on 2023-07-30 with reprex v2.0.2
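To see what the na.rm argument changes, here is a small sketch on a made-up data frame with one missing value. The data frame and its column names are invented for illustration only.

```r
# Small made-up data frame with one missing value (illustrative only)
df <- data.frame(a = c(1, 2, NA), b = c(4, 5, 6), g = c("x", "y", "z"))
i_cols <- sapply(df, is.numeric)

colMeans(df[i_cols])                     # a is NA because of the missing value
colMeans(df[i_cols], na.rm = TRUE)       # a is 1.5, b is 5
apply(df[i_cols], 2, sd, na.rm = TRUE)   # sd computed on the non-missing values
```

With the default na.rm = FALSE, a single NA makes the whole column's mean and sd NA, which is often a useful warning that the data has gaps.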

Rui Barradas