mean() fails when using [ form of extract operator

Question

I have the following function defined in R 4.0.2:

pollutantmean<- function(pollutant, id=001:332){
    library(stringr)
    newid<-str_pad(id, 3, pad = "0")
    data<-read.csv(paste(newid, ".csv", sep=""))
    if(pollutant == "sulfate"){
    pollnum <- 2
    }
    if(pollutant == "nitrate"){
    pollnum <- 3
    }
    mean(data[pollnum], na.rm = TRUE)
}

If, in my last line, I just call data[pollnum], I get the desired printout of the column I'm looking for. I have found this question, but I am duplicating that syntax exactly, and still getting a result of

Warning message:
In mean.default(data[pollnum], is.na = TRUE) :
  argument is not numeric or logical: returning NA

What am I doing wrong?

Beautiful, thank you! Post as answer, if you want. I'll accept it. — The Count, Jul 27 '20 at 17:25
Since this relates to the JHU Data Science Specialization, see also [Forms of the Extract Operator](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-extractOperator.md). — Len Greski, Jul 27 '20 at 17:39
BTW, I posted another answer that shows how to use the `pollutant` argument directly in the extract operator. — Len Greski, Jul 27 '20 at 18:12
Also, to solve the assignment you'll need to add code that reads multiple sensor files and combines them into a single data frame. I discuss this in [Breaking Down pollutantmean()](https://github.com/lgreski/datasciencectacontent/blob/master/markdown/rprog-discussPollutantmean.md). — Len Greski, Jul 27 '20 at 18:24

score 3 · Accepted Answer · answered Jul 27 '20 at 17:25

mean() requires a vector as input, as documented in ?mean:

x - An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.

and data[pollnum] is a data.frame with a single column, not a vector. So, we can extract the column as a vector with [[ instead:

...
  mean(data[[pollnum]], na.rm = TRUE)
...
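
To make the difference concrete, here is a minimal sketch on a toy data frame (the name `df` and its values are made up for illustration and are not the assignment data). It shows that `[` keeps the data.frame class while `[[` returns the underlying numeric vector, which is what mean() needs.

```r
# Toy data frame; names and values are illustrative assumptions
df <- data.frame(sulfate = c(7.21, 5.99, NA),
                 nitrate = c(0.651, 0.428, 1.04))

single <- df["sulfate"]     # `[` returns a one-column data frame
double <- df[["sulfate"]]   # `[[` returns the column as a numeric vector

class(single)   # "data.frame"
class(double)   # "numeric"

# mean() on the data frame returns NA and emits the warning from the question;
# suppressWarnings() is used here only to keep the demo quiet
bad  <- suppressWarnings(mean(single, na.rm = TRUE))
good <- mean(double, na.rm = TRUE)   # mean of 7.21 and 5.99
```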

akrun


Thanks very much. I am a new learner, as you can probably tell. – The Count Jul 27 '20 at 17:28

Len Greski · Answer 2 · 2020-07-27T18:25:43.457

In addition to @akrun's answer, one can also pass the pollutant argument of the pollutantmean() function directly to the extract operator. This removes the need for the conditional logic in the original question that maps the pollutant name to a column number.

We'll use the first 20 non-missing observations from sensor 001 in the pollutantmean() assignment data to illustrate multiple forms of the extract operator.

data <- structure(list(
  Date = c("2003-10-06", "2003-10-12", "2003-10-18", "2003-10-24",
           "2003-10-30", "2003-11-11", "2003-11-17", "2003-11-23",
           "2003-11-29", "2003-12-05", "2003-12-11", "2003-12-23",
           "2003-12-29", "2004-01-04", "2004-01-10", "2004-01-22",
           "2004-01-28", "2004-02-03", "2004-02-09", "2004-02-21"),
  sulfate = c(7.21, 5.99, 4.68, 3.47, 2.42, 1.43, 2.76, 3.41, 1.3, 3.15,
              2.87, 2.27, 2.33, 1.84, 7.13, 2.05, 2.05, 2.58, 3.26, 3.54),
  nitrate = c(0.651, 0.428, 1.04, 0.363, 0.507, 0.474, 0.425, 0.964,
              0.491, 0.669, 0.4, 0.715, 0.554, 0.803, 0.518, 1.4, 0.979,
              0.632, 0.506, 0.671),
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
         1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
  row.names = c(279L, 285L, 291L, 297L, 303L, 315L, 321L, 327L, 333L,
                339L, 345L, 357L, 363L, 369L, 375L, 387L, 393L, 399L,
                405L, 417L),
  class = "data.frame")

mean(data[["sulfate"]],na.rm=TRUE)
mean(data[,"nitrate"],na.rm=TRUE)

...and the output:

> mean(data[["sulfate"]],na.rm=TRUE)
[1] 3.287
> mean(data[,"nitrate"],na.rm=TRUE)
[1] 0.6595
>
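
One caveat when the column name lives in a variable, as it does inside pollutantmean(): the `$` form does not evaluate the variable. The sketch below uses a made-up two-row data frame `df` (not the sensor data) to show that `[[` and `[ , ]` look up the value of `pollutant`, while `$` looks for a column literally named "pollutant".

```r
# Toy data frame; names and values are illustrative assumptions
df <- data.frame(sulfate = c(3.47, 2.42),
                 nitrate = c(0.363, 0.507))
pollutant <- "nitrate"   # column name held in a variable

by_double <- df[[pollutant]]   # numeric vector: uses the value of `pollutant`
by_matrix <- df[, pollutant]   # same vector via the row/column form
by_dollar <- df$pollutant      # NULL: there is no column named "pollutant"

mean(by_double, na.rm = TRUE)  # average of 0.363 and 0.507
```

This is why the `[[` or `[ , ]` forms are the ones to reach for when a function receives the column name as an argument.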

Applying this approach to the extract operator within the pollutantmean() function, the code would look like this:

pollutantmean <- function(directory,pollutant, id=001:332){
   # read the files, given sensor IDs
   data <- # code goes here
      
   mean(data[[pollutant]],na.rm = TRUE)
 
}
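
For the file-reading placeholder, one possible approach is sketched below. It is not necessarily what the linked article does. It assumes the sensor files are named 001.csv through 332.csv inside `directory`; the function name `pollutantmean_sketch` and the two tiny demo files are invented for illustration.

```r
pollutantmean_sketch <- function(directory, pollutant, id = 1:332) {
  # Assumption: files are named with zero-padded IDs, e.g. 001.csv
  files <- file.path(directory, sprintf("%03d.csv", id))
  # Read each file, then stack the resulting data frames row-wise
  data <- do.call(rbind, lapply(files, read.csv))
  # Extract the requested column as a vector and average it
  mean(data[[pollutant]], na.rm = TRUE)
}

# Demo with two made-up sensor files in a temporary directory
demo_dir <- tempfile("specdata")
dir.create(demo_dir)
write.csv(data.frame(Date = "2003-10-06", sulfate = 2, nitrate = 0.5, ID = 1L),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = "2003-10-06", sulfate = NA, nitrate = 1.5, ID = 2L),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

nitrate_mean <- pollutantmean_sketch(demo_dir, "nitrate", id = 1:2)
sulfate_mean <- pollutantmean_sketch(demo_dir, "sulfate", id = 1:2)
```

Building the file names with sprintf() also avoids the stringr dependency used in the question, and lapply() handles the fact that read.csv() reads only one file per call.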
