I am working on a project in R that asks me to write a function that computes the mean of a given column in a data frame. To do this, I need to tell the function which column to use, and I want to pass the column name as an argument. My function currently looks like this:
## The three arguments are id, directory, and pollutant which will contain the column name
pollutantmean <- function(directory = './specdata/', pollutant = "nitrate", id = 1:332) {
  mylist <- list.files(directory)
  f = data.frame()
  f <- do.call(rbind, lapply(paste(directory, sprintf("%03d", id), '.csv', sep = ""), read.csv))
  ## Right now I am using two separate if statements, one for each possible pollutant, to get the desired result
  if (pollutant == "nitrate") {
    ans <- mean(f$nitrate[!is.na(f$nitrate)])
  } else if (pollutant == "sulfate") {
    ans <- mean(f$sulfate[!is.na(f$sulfate)])
  }
  print(ans)
}
Right now I am using if statements to get my desired result, and it seems to be working fine. However, I am concerned that this would not scale. It works because there are only two pollutants, but what if there were two thousand? I couldn't write an if statement for each one. I would like the code to be more elegant, so I tried writing the mean calculation like this:
ans <- mean(f$pollutant[!is.na(f$pollutant)])
hoping that the value of the pollutant argument would be used as the column name in the subset. Instead, I get these warning messages:
Warning messages:
1: In is.na(f$pollutant) :
is.na() applied to non-(list or vector) of type 'NULL'
2: In mean.default(f$pollutant[!is.na(f$pollutant)]) :
argument is not numeric or logical: returning NA
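The same behaviour seems to be reproducible on a small made-up data frame, without reading any files. In the sketch below the values are invented for illustration and only the column names match my data. It suggests that `$` looks for a column literally named "pollutant" instead of using the string stored in the variable:

```r
# Hypothetical toy data standing in for the combined CSV files
f <- data.frame(nitrate = c(1.5, NA, 2.5), sulfate = c(NA, 4, 6))
pollutant <- "nitrate"

# `$` takes the name written after it literally, so this looks up a
# column called "pollutant", which does not exist in f
is.null(f$pollutant)

# The columns that actually exist
names(f)
```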
I am wondering if there is a way that I can get rid of the two if statements that I have and use just a single command to get the desired result. Any help is much appreciated, thank you in advance!
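One approach that may work here is to index the data frame with `[[`, which accepts a character string held in a variable, and to let `mean()` drop the missing values through its `na.rm` argument. The following is a minimal sketch on made-up data, not a drop-in replacement: `column_mean` is a hypothetical helper name, and the toy data frame only borrows the column names from the question.

```r
# Sketch: mean of a column chosen by name, ignoring missing values.
# `column_mean` is a hypothetical helper, not part of the original function.
column_mean <- function(f, pollutant) {
  # Fail early with a readable message if the column does not exist
  if (!pollutant %in% names(f)) {
    stop("no column named '", pollutant, "' in the data frame")
  }
  # `[[` looks up the column whose name is the value of `pollutant`
  mean(f[[pollutant]], na.rm = TRUE)
}

# Invented values for illustration only
f <- data.frame(nitrate = c(1.5, NA, 2.5), sulfate = c(NA, 4, 6))
column_mean(f, "nitrate")  # 2
column_mean(f, "sulfate")  # 5
```

Inside pollutantmean, the same idea would presumably reduce the if/else block to a single line, `ans <- mean(f[[pollutant]], na.rm = TRUE)`, which should work for any number of pollutant columns.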