
I am working on an R project that asks me to write a function that computes the mean of a given column in a data frame. The function has to know which column to average, and I want to tell it by passing the column name as an argument. My function currently looks like this:

## The three arguments are directory, pollutant (the name of the column to average), and id
pollutantmean <- function(directory = './specdata/', pollutant = "nitrate", id = 1:332){

  ## Read every requested monitor file (001.csv, 002.csv, ...) and stack them into one data frame
  f <- do.call(rbind, lapply(paste(directory, sprintf("%03d", id), '.csv', sep = ""), read.csv))

  ## Right now I use an if / else if chain, with one branch per possible pollutant, to get the desired result
  if(pollutant == "nitrate"){
    ans <- mean(f$nitrate[!is.na(f$nitrate)])
  } else if(pollutant == "sulfate"){
    ans <- mean(f$sulfate[!is.na(f$sulfate)])
  } else {
    ## Without this branch, an unknown pollutant would leave ans undefined
    stop("unknown pollutant: ", pollutant)
  }
  print(ans)
}
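The function above drops missing values with x[!is.na(x)] before calling mean(). As a side note, mean() can skip NAs itself through its na.rm argument. A small check on made-up values (not the real specdata files):

```r
# Toy vector with missing values (made-up data for illustration only)
x <- c(1.5, NA, 2.5, NA, 3.5)

# The approach used in the function above: drop NAs by logical indexing
manual <- mean(x[!is.na(x)])

# The built-in alternative: ask mean() to remove NAs itself
builtin <- mean(x, na.rm = TRUE)

print(c(manual = manual, builtin = builtin))
```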

Right now I am using if statements to get the desired result, and it seems to work fine. However, I am concerned that this approach does not scale. It works because there are only two pollutants, but what if there were two thousand? I couldn't write an if statement for each one. I would like the code to be a little more elegant, so I tried writing the mean calculation like this:

ans <- mean(f$pollutant[!is.na(f$pollutant)])

hoping that the value of the pollutant argument would be used as the column name in f$pollutant. Instead I get these warning messages:

Warning messages:
1: In is.na(f$pollutant) :
  is.na() applied to non-(list or vector) of type 'NULL'
2: In mean.default(f$pollutant[!is.na(f$pollutant)]) :
  argument is not numeric or logical: returning NA
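A likely explanation for these warnings, shown on a tiny made-up data frame that stands in for f: the $ operator does not evaluate the name written after it, so f$pollutant looks for a column literally called "pollutant", finds none, and returns NULL, which is exactly what is.na() complains about. The [[ operator does evaluate its argument, so f[[pollutant]] uses the string stored in the variable pollutant. The values below are illustrative only.

```r
# A tiny stand-in for the combined data frame `f` (made-up values)
f <- data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, NA))
pollutant <- "nitrate"

# `$` does not evaluate its right-hand side: this looks for a column
# literally named "pollutant", finds none, and returns NULL
print(is.null(f$pollutant))   # TRUE

# `[[` evaluates its argument, so the string stored in `pollutant` is used
print(f[[pollutant]])         # 2 4 NA

# The mean of the selected column, ignoring NAs
print(mean(f[[pollutant]], na.rm = TRUE))   # 3
```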

I am wondering whether there is a way to get rid of the two if statements and use a single command to get the desired result. Any help is much appreciated, thank you in advance!
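A minimal sketch of a single-command version, assuming the files are named 001.csv, 002.csv, ... as in the original code and that the column names match the pollutant strings exactly. The if / else if chain is replaced by f[[pollutant]] plus a check that the requested column exists. The demo directory, file contents, and the helper names demo_dir, dir_arg, and all_means are made up for illustration.

```r
# Same signature as the original pollutantmean(), but the column is
# selected with f[[pollutant]] instead of one if-branch per pollutant
pollutantmean <- function(directory = "./specdata/", pollutant = "nitrate", id = 1:332) {
  # Build the file paths (directory is expected to end with "/")
  files <- paste0(directory, sprintf("%03d", id), ".csv")
  # Read each file and stack them into one data frame
  f <- do.call(rbind, lapply(files, read.csv))

  # Fail loudly if the requested column does not exist
  if (!pollutant %in% names(f)) {
    stop("no column named '", pollutant, "' in the data")
  }

  # [[ ]] evaluates `pollutant`, so any column name works here
  mean(f[[pollutant]], na.rm = TRUE)
}

# Demo with two small made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA), nitrate = c(2, 4)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 5), nitrate = c(NA, 6)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# Trailing "/" because paste0() joins the strings directly
dir_arg <- paste0(demo_dir, "/")
print(pollutantmean(dir_arg, "sulfate", 1:2))  # mean of 1, 3, 5
print(pollutantmean(dir_arg, "nitrate", 1:2))  # mean of 2, 4, 6

# Scaling to many pollutants: one call per column name
all_means <- sapply(c("sulfate", "nitrate"),
                    function(p) pollutantmean(dir_arg, p, 1:2))
print(all_means)
```

The sapply() call rereads the files once per pollutant, which is fine for a handful of columns; with thousands of pollutants it would likely be better to read the data once and then apply mean() over the selected columns.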

embryo3699
