
I have a data frame named `directory` with four columns: a, b, c and d. I need to find the mean of either column b or column c, depending on the input.

Both columns b and c contain NAs as well as numeric values.

TotalMean<- function(directory, pollutant = "b", id = 1:10)
{
    mean(subset(directory, ID=  id, select = directory[[pollutant]]), na.rm = TRUE)
}

TotalMean<- function(directory, pollutant = "b", id = 1:10)
{
    mean(subset(directory, ID=  id, select = directory$pollutant), na.rm = TRUE)
}

TotalMean<- function(directory, pollutant = "b", id = 1:10)
{
    mean(subset(directory, ID=  id, select = directory[,pollutant]), na.rm = TRUE)
}

I've tried all of the functions above. However, it gives me the following error:

Error in `[.data.frame`(directory, , directory$b) : undefined columns selected

Since I'm new to R programming, I'm not sure why this is happening. Any help would be appreciated.

Thanks in advance

sreelekha
  • Sorry, I forgot to mention the error. Here it is: Error in `[.data.frame`(directory, , directory$b) : undefined columns selected – sreelekha Nov 13 '19 at 07:35
  • Is `colMeans` not appropriate for your question? Can you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of your dataset? – dc37 Nov 13 '19 at 07:40
  • I get the same error using `colMeans` as well. – sreelekha Nov 13 '19 at 07:42
  • Can you provide the code you're using to calculate `colMeans`? (And the reproducible example ;) ) – dc37 Nov 13 '19 at 07:43
  • You are using `subset` wrongly. You simply cannot use it like that (i.e., programmatically passing variables to its `subset` and `select` parameters). Read the warning in `help("subset")`. And then use subsetting as detailed in `help("[")`. – Roland Nov 13 '19 at 07:44
  • Also, inform yourself about the difference between `=` and `==`. – Roland Nov 13 '19 at 07:46
  • I do not understand. Could you please elaborate? – sreelekha Nov 13 '19 at 07:49
  • Also could you please let me know the alternative in that case? – sreelekha Nov 13 '19 at 07:50
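To elaborate on the comments above, here is a minimal sketch under stated assumptions. The data frame is made up, and the `ID` column is an assumption inferred from the `ID = id` argument in the question. Inside `subset()`, the expression given to `select` is evaluated, so `directory$b` yields the column's *values*, which are then used as column positions; and `ID = id` is not a comparison at all but an extra named argument that `subset()` silently ignores. The last part shows a `subset()`-free alternative using `[[` and `%in%`.

```r
# Hypothetical data: columns a-d as in the question plus an assumed ID column.
directory <- data.frame(
  ID = c(1, 2, 3, 11),
  a  = c(10, 20, 30, 40),
  b  = c(15.5, NA, 17.5, 100),
  c  = c(NA, 4, NA, 8),
  d  = c(0, 0, 0, 0)
)

# 1) Why the question's calls fail: `directory$b` evaluates to the values
#    15.5, NA, 17.5, 100, which are used as column positions. There are only
#    5 columns, so those positions (and NA) do not exist.
msg <- tryCatch(
  subset(directory, select = directory$b),
  error = function(e) conditionMessage(e)
)
print(msg)  # [1] "undefined columns selected"

# 2) `ID = 1:3` is treated as an unused named argument, not a row filter,
#    so all 4 rows are returned.
print(nrow(subset(directory, ID = 1:3)))  # [1] 4

# 3) Alternative without subset(): pick rows with %in%, the column with [[.
TotalMean <- function(directory, pollutant = "b", id = 1:10) {
  # TRUE for rows whose ID is any of the requested ids
  # (`==` would instead compare element-wise, recycling `id`)
  rows <- directory$ID %in% id
  # `[[` looks the column up by the name stored in `pollutant`
  mean(directory[[pollutant]][rows], na.rm = TRUE)
}

print(TotalMean(directory, "b"))  # [1] 16.5  (IDs 1-3, NA dropped)
print(TotalMean(directory, "c"))  # [1] 4
```

If the real data has no `ID` column and `id` means row positions, `directory[id, pollutant]` does the same job.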

1 Answer


You do not need `subset`; you can index the data frame directly:

TotalMean <- function(directory, pollutant = "b", id = 1:10) {
    # Rows `id` of the column named in `pollutant`; na.rm drops NA and NaN
    mean(directory[id, pollutant], na.rm = TRUE)
}

directory <- data.frame("a" = c(1, NA, 2), "b" = c(NaN, 2, 3))
print(TotalMean(directory, "a"))  # [1] 1.5
print(TotalMean(directory, "b"))  # [1] 2.5
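In this answer `id` selects rows by position, not by an `ID` column, and the default `1:10` reaches past the three rows of the example data. The small check below, using the answer's own example data, suggests why that still works: positions that do not exist come back as `NA`, which `na.rm = TRUE` then discards. The variable name `x` is only for illustration.

```r
directory <- data.frame("a" = c(1, NA, 2), "b" = c(NaN, 2, 3))

# Rows 4-10 do not exist, so they are returned as NA
x <- directory[1:10, "b"]
print(length(x))              # [1] 10
print(sum(is.na(x)))          # [1] 8  (is.na() is also TRUE for the NaN in row 1)
print(mean(x, na.rm = TRUE))  # [1] 2.5
```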
T.C. Helsloot
  • Even with na.rm=TRUE? What data enters the mean function? – T.C. Helsloot Nov 13 '19 at 08:58
  • Yes. The data is numeric or NAs. Mostly floating point numbers. – sreelekha Nov 14 '19 at 07:40
  • @sreelekha I edited my answer to show that it works with NaNs and NAs. The only reason I can think of why it returns NaN is if you calculate the mean of only NAs. Unless you provide a minimal working example showing exactly what doesn't work, I can't help you further. Good luck! – T.C. Helsloot Nov 14 '19 at 09:15
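The explanation in the last comment can be checked directly: when every selected value is missing, `na.rm = TRUE` leaves an empty vector, and the mean of an empty vector is `NaN`. A minimal sketch follows; `safe_mean` is a hypothetical helper (not part of base R) that returns `NA` instead in that case.

```r
# All values missing: nothing is left after na.rm, and mean() of nothing is NaN
print(mean(c(NA, NA, NaN), na.rm = TRUE))  # [1] NaN

# Hypothetical helper: report NA instead of NaN when no value remains
safe_mean <- function(x) {
  x <- x[!is.na(x)]  # is.na() is TRUE for both NA and NaN
  if (length(x) == 0) NA_real_ else mean(x)
}

print(safe_mean(c(NA, NA, NaN)))  # [1] NA
print(safe_mean(c(NA, 2, 3)))     # [1] 2.5
```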