3

I feel this should be easy. I have looked across the internet, but I keep getting error messages. I have done plenty of analytics in the past but am new to R and programming.

I have a pretty basic function to calculate means across columns of data:

columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i])
  }
  means
}

I'm in RStudio and testing it using the included 'airquality' dataset. When I load the AQ dataset and run my function:

data("airquality")
columnmean(airquality)

I get back:

NA NA 9.957516 77.882353 6.993464 15.803922

Because the first two variables in AQ have NAs in them. K, cool. I want to suppress the NAs such that R will ignore them and run the function anyway.

I am reading that I can specify this with na.rm=TRUE, like:

columnmean(airquality, na.rm = TRUE)

But when I do this, I get an error message saying:

"Error in columnmean(airquality, na.rm = TRUE) : unused argument (na.rm = TRUE)"

I'm reading all over the place that I simply need to include na.rm = TRUE and the function will run and ignore the NA values...but I keep getting this error. I have also tried use = "complete" and anything else I can find.

Two Caveats:

I know I can create a vector with is.na and then subset the data, but I don't want that extra step, I just want it to run the function and ignore the missing data.

I also know I can specify IN the function whether to ignore them or not, but I'd like a way to choose to ignore/not ignore on the fly, on a call-by-call basis, rather than having it be part of the function itself.

Help is appreciated. Thank you, everyone.

Adam_S

3 Answers

3

We can pass na.rm = TRUE to mean inside the function:

columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i], na.rm = TRUE)
  }
  means
}

If na.rm needs to be FALSE in some calls and TRUE in others, add `...` to the arguments of 'columnmean' and pass it on to mean:

columnmean <- function(y, ...) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i], ...)
  }
  means
}

columnmean(df1, na.rm = TRUE)
#[1] 1.5000000 0.3333333
columnmean(df1, na.rm = FALSE)
#[1] 1.5  NA

data

 df1 <- structure(list(num = c(1L, 1L, 2L, 2L), x1 = c(1L, NA, 0L, 0L
 )), .Names = c("num", "x1"), row.names = c(NA, -4L), class = "data.frame")
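
One trade-off of forwarding `...` is worth a quick sketch. Because `mean` itself accepts `...`, a misspelled argument name is silently ignored rather than reported. The snippet below is only an illustration: `narm` is a deliberate typo, and `df1` is rebuilt with `data.frame` using the same values as above.

```r
# Same idea as above: forward any extra arguments straight to mean().
columnmean <- function(y, ...) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in seq_len(nc)) {  # seq_len() also copes with zero columns
    means[i] <- mean(y[, i], ...)
  }
  means
}

# Same values as the df1 used in this answer.
df1 <- data.frame(num = c(1L, 1L, 2L, 2L), x1 = c(1L, NA, 0L, 0L))

columnmean(df1, na.rm = TRUE)
#[1] 1.5000000 0.3333333

# Deliberate typo: `narm` is not an argument of mean(), so it is
# swallowed by mean's own `...` and the NA is NOT removed.
columnmean(df1, narm = TRUE)
#[1] 1.5  NA
```

If that silent behaviour is a concern, declaring an explicit `na.rm` argument in `columnmean` makes a typo fail with an "unused argument" error instead.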
akrun
  • 1
    Thank you for your answer. I read about that option, but ideally I want to be able to specify within the call whether to ignore or not...sometimes I want to ignore the NAs, other times I do not. This is not an option? – Adam_S Apr 05 '17 at 17:15
  • @Adam_S Then you could add three dots `...` to the function's arguments and pass them on to `mean` as well – akrun Apr 05 '17 at 17:18
  • 1
    Yes, perfect. Thank you so much for taking time to answer a pretty basic question! – Adam_S Apr 05 '17 at 17:27
2

You should be using that parameter in the mean function call:

columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i], na.rm = TRUE)
  }
  means
}

columnmean is a custom function and does not have that parameter.
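
To see this for yourself, here is a small sketch that inspects the arguments of the original function from the question and reproduces the error in a controlled way. The variable name `res` is just a placeholder for this illustration.

```r
# The original definition from the question: its only argument is `y`.
columnmean <- function(y) {
  nc <- ncol(y)
  means <- numeric(nc)
  for (i in 1:nc) {
    means[i] <- mean(y[, i])
  }
  means
}

# List the arguments the function declares.
names(formals(columnmean))
#[1] "y"

# Calling it with an undeclared argument raises the error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(
  columnmean(airquality, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
res
```

R only accepts arguments that the function declares (or `...`), so `na.rm` has to appear in the definition of `columnmean` before it can be passed in a call.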

Vince
  • 1
    Thank you for your answer. Another way to say what you are saying...because it's a function I wrote, I cannot specify ignore NAs Y/N when I call it, I have to specify that when I write the function? Sometimes I want to ignore the NAs, other times I do not. This is not an option? – Adam_S Apr 05 '17 at 17:16
0

You can pass the parameter na.rm to your function:

columnmean <- function(y, na.rm = FALSE){
  nc <- ncol(y)
  means <- numeric(nc)
  for(i in 1:nc) {
    means[i] <- mean(y[,i], na.rm = na.rm)
  }
  means 
}

data("airquality")
columnmean(airquality, na.rm = TRUE)
#[1] 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922

columnmean(airquality)
#[1]        NA        NA  9.957516 77.882353  6.993464 15.803922

But my recommendation is to look for an alternative to loops:

column_mean <- function(y, na.rm = FALSE) {
  sapply(y, function(x) mean(x, na.rm = na.rm))
}

column_mean(airquality, na.rm = TRUE)
#     Ozone    Solar.R       Wind       Temp      Month        Day 
# 42.129310 185.931507   9.957516  77.882353   6.993464  15.803922
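
As a further sketch along the same lines (not part of the code above): base R already ships `colMeans`, which takes `na.rm` directly, and `vapply` is a stricter variant of `sapply` that insists on one numeric value per column. The names `col_means_builtin` and `column_mean_v` are made up for this example.

```r
# Built-in column means; works here because every airquality column is numeric.
col_means_builtin <- colMeans(airquality, na.rm = TRUE)

# vapply() variant: numeric(1) declares that each column must yield
# exactly one number, and na.rm is forwarded to mean().
column_mean_v <- function(y, na.rm = FALSE) {
  vapply(y, mean, numeric(1), na.rm = na.rm)
}

col_means_builtin
column_mean_v(airquality, na.rm = TRUE)
```

Both calls should return the same named vector as `column_mean(airquality, na.rm = TRUE)`. Note that `colMeans` expects all columns to be numeric, so it may not suit data frames with mixed column types.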
Enrique Pérez Herrero