I have a set of CSV files containing air pollution data. Each file has a unique ID, plus columns such as "date", "sulfate", and "nitrate".
I need to write a function, pollutantmean, that takes three arguments: "directory", "pollutant", and "id", and returns the mean of the chosen pollutant across the files selected by "id".
This is the data format, using 001.csv as an example:
Date Sulfate Nitrate ID
2013-02-04 2.27 NA 1
2013-02-05 NA 1.15 1
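To check how R sees one of these files, here is a minimal sketch that writes the two sample rows above to a temporary CSV and reads them back with read.csv(). It assumes the real header is lowercase (Date,sulfate,nitrate,ID), as the call with "nitrate" implies; column names are case-sensitive, so if the header really reads "Nitrate", the pollutant argument has to match that spelling exactly.

```r
# Write the sample rows from 001.csv to a temporary file so the example is
# self-contained. The lowercase header is an assumption; check names() on
# the real files.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,sulfate,nitrate,ID",
  "2013-02-04,2.27,NA,1",
  "2013-02-05,NA,1.15,1"
), path)

readings <- read.csv(path)      # "NA" cells are read as missing values
print(names(readings))          # the exact, case-sensitive column names

# Select the column by name and drop NA values before averaging
nitrate_mean <- mean(readings[, "nitrate"], na.rm = TRUE)
print(nitrate_mean)             # 1.15
```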
This is my function so far:
    pollutantmean <- function(directory, pollutant, id = 1:332) {
      files_full <- list.files(directory, full.names = TRUE)
      dat <- data.frame()
      for (i in id) {
        dat <- rbind(dat, files_full[i])
      }
      datasub <- dat[, pollutant]
    }
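One likely culprit is that rbind(dat, files_full[i]) appends the file's path, a character string, rather than the file's contents; the function also never computes or returns a mean. A minimal sketch of a fix that keeps the original loop reads each file with read.csv() first and returns the mean explicitly. It assumes list.files() returns the files in ID order (001.csv, 002.csv, ...) so that position i matches ID i, and that the pollutant name matches the header's spelling; whether it gives exactly 1.706 depends on the real files.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # Full paths of every file in the directory, e.g. "specdata/001.csv";
  # assumes they sort so that position i corresponds to ID i
  files_full <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  for (i in id) {
    # read.csv() loads the file's rows; rbind() appends them to dat.
    # The original code appended the path string files_full[i] instead.
    dat <- rbind(dat, read.csv(files_full[i]))
  }
  # Return the mean explicitly, skipping NA measurements
  mean(dat[, pollutant], na.rm = TRUE)
}
```

Growing a data frame with rbind() inside a loop gets slow for many files; collecting the per-file data frames with lapply() and combining them once with do.call(rbind, ...) is a common alternative.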
When a user calls pollutantmean("specdata", "nitrate", 70:72), the desired output is:
1.706
Instead, I get:
Error in `[.data.frame`(dat, , pollutant) : undefined columns selected
In addition: Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "specdata/071.csv") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "specdata/072.csv") :
invalid factor level, NA generated
What do these errors mean?
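Both messages appear to come from the rbind() line. rbind(dat, files_full[i]) adds a one-row data frame whose only column holds the path string, so dat never gets a "nitrate" column and dat[, pollutant] stops with "undefined columns selected". The warnings show up in R versions before 4.0.0, where that string column is stored as a factor: its only level is the first path ("specdata/070.csv"), and a later path such as "specdata/071.csv" is not a valid level, so R stores NA and warns. This minimal sketch reproduces both conditions in isolation; the path strings are simply the ones from the messages above.

```r
# 1) "undefined columns selected": appending a path string creates a data
#    frame whose only column holds that string, not a "nitrate" column.
dat <- rbind(data.frame(), "specdata/070.csv")
print(names(dat))                 # an auto-generated name, not "nitrate"
has_nitrate <- "nitrate" %in% names(dat)
err <- tryCatch(dat[, "nitrate"], error = function(e) conditionMessage(e))
print(err)                        # "undefined columns selected"

# 2) "invalid factor level, NA generated": a factor only accepts values
#    that are already among its levels.
f <- factor("specdata/070.csv")
f_warn <- NULL
f <- withCallingHandlers(
  {
    f[2] <- "specdata/071.csv"    # not a level of f, so NA is stored
    f
  },
  warning = function(w) {
    f_warn <<- conditionMessage(w)  # record the warning text
    invokeRestart("muffleWarning")
  }
)
print(f)                          # the second element is NA
```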