
I am trying to calculate the mean (average) of every column in my dataframe, grouped by label. I have written this code snippet:

#Average overallDataset by label
overallDatasetLabels <- c("label","index","nr_pix","rows_with_2","cols_with_2","rows_with_3p","cols_with_3p","height","width","left2tile","right2tile","verticalness","top2tile","bottom2tile","horizontalness","nodiagnols")
library(dplyr)
avgOverallDataset <- summarise(group_by(overallDataset,label),nr_pix_avg=mean(nr_pix))
for (val in overallDatasetLabels){
  if (val %in% c("label","index","nr_pix")){
    next
  }
  avgOverallDataset<-cbind(avgOverallDataset,summarise(group_by(overallDataset,label),val=mean(val)))
}

When I run this code, I get this warning:

50: In mean.default(val) : argument is not numeric or logical: returning NA

And the resulting dataframe looks like this:

[screenshot of the resulting dataframe, not reproduced here]

The reason for this is that the val variable is being treated as a string, but I need it to be treated as "code", i.e. as the column whose name it holds. For example, this would be valid:

avgOverallDataset<- cbind(avgOverallDataset,summarise(group_by(overallDataset,label),avgrows_with_2=mean(rows_with_2)))

How do I get R to treat the string stored in val as a column name "in the code"?

Note: the multiple label columns can be removed using: How to remove duplicated column names in R?

Lyra Orwell
  • I think you are looking for `mean(!!sym(val))`. See https://adv-r.hadley.nz/metaprogramming.html – Bas Apr 13 '21 at 14:07
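
A minimal sketch of the approach suggested in the comment above, assuming a small made-up data frame (`toy`, `avgToy`, `colAvg` and the values are hypothetical, not the asker's data). Inside `summarise()`, `mean(val)` takes the mean of the string itself, which is why it returns NA. `!!sym(val)` turns the string into a column reference, and `:=` with a glue-style name lets the output column be named from the same string.

```r
library(dplyr)
library(rlang)  # provides sym()

# Hypothetical toy data standing in for overallDataset
toy <- data.frame(
  label  = c("a", "a", "b", "b"),
  height = c(1, 3, 5, 7),
  width  = c(2, 4, 6, 8)
)

# Start with one row per label, then join on one averaged column per loop pass
avgToy <- distinct(toy, label)
for (val in c("height", "width")) {
  colAvg <- toy %>%
    group_by(label) %>%
    # !!sym(val) refers to the column whose name is stored in val
    summarise("{val}_avg" := mean(!!sym(val)), .groups = "drop")
  avgToy <- left_join(avgToy, colAvg, by = "label")
}

print(avgToy)
```

Joining by `label` instead of using `cbind()` avoids the duplicated label columns mentioned in the question. `mean(.data[[val]])` is an equivalent spelling that uses the `.data` pronoun instead of `!!sym()`.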

1 Answer


Try across(), which applies mean to all the selected columns in a single summarise() call, so you don't need a loop.

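library(dplyr)

# Columns to average; nr_pix is excluded as in the question's loop (remove it from this list to average it too)
cols <- setdiff(overallDatasetLabels, c("label","index","nr_pix"))

avgOverallDataset <- overallDataset %>%
                      group_by(label) %>%
                      # dplyr 1.1.0+ deprecates passing extra arguments through across(), so use a lambda
                      summarise(across(all_of(cols), ~ mean(.x, na.rm = TRUE)))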
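
To see the shape of the result, here is a small self-contained sketch on made-up data (`toy` and `avgToy` are hypothetical names, and the values are invented). The lambda form `~ mean(.x, na.rm = TRUE)` is used because dplyr 1.1.0 and later deprecate passing extra arguments through `across()`. The `.names` argument is optional and is only there to add an `_avg` suffix like the one in the question.

```r
library(dplyr)

# Hypothetical toy data standing in for overallDataset
toy <- data.frame(
  label  = c("a", "a", "b", "b"),
  index  = 1:4,
  height = c(1, NA, 5, 7),
  width  = c(2, 4, 6, 8)
)

# Every column except the grouping column and the row index
cols <- setdiff(names(toy), c("label", "index"))

avgToy <- toy %>%
  group_by(label) %>%
  # One mean per selected column and per label, ignoring NA values
  summarise(across(all_of(cols), ~ mean(.x, na.rm = TRUE), .names = "{.col}_avg"))

print(avgToy)
```

The result has one row per label and one `_avg` column per entry in `cols`, with no duplicated label columns to clean up afterwards.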
Ronak Shah