I'm pretty close to finishing this R program, but the result keeps coming back as NaN. It's supposed to find the mean of nitrate or sulfate across a set of CSV files. Would anyone know where the code might be going wrong? Below is the program description, followed by my code. It seems pretty self-explanatory, but I'm somewhat stumped. If you need any more details, please let me know. Thanks.

pollutantmean <- function(directory, pollutant, id = 1:332) {
        ## 'directory' is a character vector of length 1 indicating
        ## the location of the CSV files

        ## 'pollutant' is a character vector of length 1 indicating
        ## the name of the pollutant for which we will calculate the
        ## mean; either "sulfate" or "nitrate".

        ## 'id' is an integer vector indicating the monitor ID numbers
        ## to be used

        ## Return the mean of the pollutant across all monitors list
        ## in the 'id' vector (ignoring NA values)
}

pollutantmean = function(directory, pollutant, id = 1:332) {
            files_polm = list.files(directory, full.names = TRUE)
            dat_3 = numeric()
            for (x in id) {
                    dat_3 = rbind(dat_3, read.csv(files_polm[x]))
            }
            if (pollutant == "sulfate") {
                    sub_pol = dat_3[which(dat_3[, "sulfate"] == "sulfate"), ]
                    mean(sub_pol[, "sulfate"], na.rm = TRUE)
            }
            else if (pollutant == "nitrate") {
                    sub_pol = dat_3[which(dat_3[, "nitrate"] == "nitrate"), ]
                    mean(sub_pol[, "nitrate"], na.rm = TRUE)
            }
            else {
                    print("Try Again")
            }
    }
  • Could you please make this a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250)? That would make it easier for us to try it out and find out what is wrong with it. – Christopher Bottoms Mar 12 '15 at 19:33
  • it would be easier to debug if you got each step working and _then_ put it into a function – rawr Mar 12 '15 at 19:46
  • Describing in words what you want would also help. Do you really need to look in a column called "sulfate" to find rows where the value is "sulfate"? Impossible to say without seeing your data. – Gregor Thomas Mar 12 '15 at 19:53
  • I edited the original post above with the explanations. My apologies for not giving more detail. Thanks. – Alvin van der Kuech Mar 12 '15 at 21:06
  • you have an empty group somewhere. `mean(numeric(0))` is `NaN` ... – Ben Bolker Mar 12 '15 at 21:10
  • [Similar question](http://stackoverflow.com/questions/23640594/reading-multiple-files-and-calculating-mean-based-on-user-input) is worth a read – tospig Mar 12 '15 at 21:41
  • What do you think? @ChristopherBottoms I would greatly appreciate your further input. Thanks again. – Alvin van der Kuech Mar 12 '15 at 22:04
  • Like @rawr said above, please take this apart and get each piece working. For example, calculate the average sulfate value for one file. Once you get that working, then do it for sulfate and nitrate. Then once that is working, then see if you can get a list of all the files in a directory. ... etc. Once you get one step working, be sure to save a copy of your code so that you can start over from that point if needed. – Christopher Bottoms Mar 13 '15 at 16:07
  • @ChristopherBottoms Thanks I'm into it. – Alvin van der Kuech Mar 13 '15 at 18:20
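
The empty-group point raised in the comments can be reproduced on a small scale. This is only a sketch: the data frame `dat` below is made-up toy data standing in for one monitor file, and it assumes the pollutant columns are numeric, which the question does not confirm.

```r
# Hypothetical toy data standing in for the combined monitor files
dat <- data.frame(sulfate = c(1.5, NA, 3.5), nitrate = c(0.2, 0.4, NA))

# A numeric value never equals the string "sulfate", so which() selects no rows
rows <- which(dat[, "sulfate"] == "sulfate")
sub_pol <- dat[rows, ]

# The subset is empty, so the mean of its column is mean(numeric(0)), i.e. NaN
empty_mean <- mean(sub_pol[, "sulfate"], na.rm = TRUE)

# Taking the mean of the column directly skips the NA and gives a real number
direct_mean <- mean(dat[, "sulfate"], na.rm = TRUE)
```

If the real columns are numeric, the `which(... == "sulfate")` filter in the question would behave the same way and leave nothing to average.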

1 Answer

I edited your code, assuming that within each .csv file the "nitrate" or "sulfate" column holds numeric or integer data, i.e. the amount or concentration of each substance.

I also modified the for loop to be more consistent with the structure of your .csv files. Here is the code; I hope it works. If not, please edit your question to include the output of the str() function for one of your .csv files.

pollutantmean = function(directory, pollutant, id = 1:332) {
 # Full paths of every file in 'directory', in sorted order
 files_polm = list.files(directory, full.names = TRUE)
 dat_3 = numeric()
 for (x in id) {
   if (x==id[1]) {
     # First file: start the combined data frame
     dat_3 = read.csv(files_polm[x])
   } else{
     # Later files: append their rows
     dat_3 = rbind(dat_3, read.csv(files_polm[x]))
   }

 }
 # Average the requested column directly; no row filtering is needed
 if (pollutant == "sulfate") {
    mean(dat_3[, "sulfate"], na.rm = TRUE)
 } else if (pollutant == "nitrate") {
    mean(dat_3[, "nitrate"], na.rm = TRUE)
 } else {
    print("Try Again")
 }
}
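
As a rough check of the same idea without the explicit loop, here is a self-contained sketch. The function name `pollutantmean2`, the temporary directory, and the two small CSV files are all made up for illustration, and it assumes, as the code above does, that the sorted file list lines up with the monitor IDs.

```r
# Hypothetical variant: read the selected files, stack them, average one column
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  stopifnot(pollutant %in% c("sulfate", "nitrate"))
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  dat <- do.call(rbind, lapply(files[id], read.csv))
  mean(dat[[pollutant]], na.rm = TRUE)
}

# Toy directory with two made-up monitor files (not the real data set)
tmp <- file.path(tempdir(), "pollutant_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA), nitrate = c(2, 4), ID = 1),
          file.path(tmp, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(3, 5), nitrate = c(NA, 6), ID = 2),
          file.path(tmp, "002.csv"), row.names = FALSE)

# Sulfate over both files: the NA is dropped, leaving 1, 3 and 5
demo_sulfate <- pollutantmean2(tmp, "sulfate", id = 1:2)
# Nitrate for the second file only: the NA is dropped, leaving 6
demo_nitrate <- pollutantmean2(tmp, "nitrate", id = 2)
```

Running it on a couple of tiny files like this is also a way to get each step working before pointing the function at all 332 monitors.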
dof1985
  • Thanks much. I'm analyzing the code against what I did. – Alvin van der Kuech Mar 13 '15 at 11:53
  • Still getting the NaNs. From the code that you gave me I pretty much just defined 'sub_pol' again, so that I can get a subset of 'sulfate' or 'nitrate' – Alvin van der Kuech Mar 13 '15 at 12:04
  • @AlvinvanderKuech, so you managed to fix it? If not, some sample data, or the output of the `str()` function on your dat_3 variable would help to modify your code, or understand what goes wrong – dof1985 Mar 13 '15 at 13:47
  • Thanks, I'm going to go through it piece by piece. It's the best way to see what's not working. That's something I need to learn, so might as well jump in. – Alvin van der Kuech Mar 13 '15 at 18:20