
I'm writing a function in R that reads multiple CSV files from a directory called "specdata" and then returns the mean of a particular column that those files have in common. The function is shown below. Its arguments are the directory where the files are located, the column whose mean should be calculated, and id, a sequence that selects which files to read by subsetting the file list with [].

I asked about this function before and that issue was solved: the function now runs and returns a result. However, the result is incorrect. It always gives NA or NaN when it should give a number.

    pollutantmean <- function(directory, pollutant, id) {
      for (i in id) {
        archivo <- list.files(directory, full.names = TRUE)
        datapollution <- rbind(read.csv(archivo[i], header = TRUE))
        datamatrix <- data.matrix(datapollution)
        resultmean <- mean(datamatrix[pollutant], na.rm = TRUE)
      }
      print(resultmean)
    }

Why is it not working? My theory is that I'm applying rbind incorrectly.

brandata
1 Answer


It's difficult to provide more specific help due to the lack of sample data/code, but I see a couple of issues with your code.

  1. There is no need to call list.files repeatedly inside the for loop.
  2. In fact, there is no need for a for loop here, and it will be faster to do something like

    archive <- list.files(directory, full.names = TRUE)
    datapollution <- do.call(rbind, lapply(archive, read.csv))
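
To connect this with the `id` argument from the question, here is a minimal sketch of how the whole function might look. It is not tested against the real "specdata" files: it assumes every CSV in the directory contains a column named by `pollutant`, and that `id` refers to positions in the alphabetically sorted file list. The `pattern` filter and the temporary demo directory are illustrative additions, and the sketch returns the mean rather than printing it.

```r
# Hypothetical rewrite of pollutantmean(); assumes `id` holds positions in the
# sorted file list and that each file has a column named by `pollutant`.
pollutantmean <- function(directory, pollutant, id) {
  # List the CSV files once, then keep only the requested ones
  archive <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)[id]
  # Read each selected file and stack the rows into one data frame
  datapollution <- do.call(rbind, lapply(archive, read.csv))
  # Select the column by name and average it, ignoring missing values
  mean(datapollution[[pollutant]], na.rm = TRUE)
}

# Fake data in a temporary directory, standing in for "specdata"
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, 6)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(5, 7, NA), nitrate = c(1, 1, 1)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result_all <- pollutantmean(demo_dir, "sulfate", 1:2)  # pools 1, 3, 5, 7
result_first <- pollutantmean(demo_dir, "sulfate", 1)  # pools 1, 3
print(result_all)
print(result_first)
```

Because the rows are pooled before averaging, the result is the mean over all observations in the selected files, not a mean of per-file means.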
    

PS. For maximum help here on SO, it's always best to provide a minimal & reproducible example including sample data.

Maurits Evers
  • OK, so here is an example call: pollutantmean("specdata","sulfate",1:10) – brandata Jun 01 '18 at 07:27
  • How can I add sample data? I'm new to Stack Overflow, so I don't know how to insert a file here :( – brandata Jun 01 '18 at 07:29
  • Now I'd like to ask about do.call(rbind, lapply(archive, read.csv)). Basically, with lapply you are telling R to apply read.csv to each element of archive, the list of files in the directory. With rbind you bind the rows of the resulting data frames, and do.call applies rbind to all of them at once. Is this correct? Did I understand it correctly? – brandata Jun 01 '18 at 07:34
  • About the loop: I understand why you think it is not necessary. I explained it poorly and forgot to add that the idea is to calculate the mean over only the files we choose, by subsetting with the id argument. That way, if the "specdata" directory contains 332 files, I can pass 1:10 as id, for example, and calculate the mean over only the first through tenth files, with each one being the i in the for loop. That's why the loop is needed: to control which files inside the "specdata" directory are read. – brandata Jun 01 '18 at 07:38
  • @brandata (1) Read the link I gave on reprex's. There is plenty of advice given on how to share data (use e.g. `dput`). Also, when you signed up on SO, you would've been advised to [take the tour](https://stackoverflow.com/tour). I recommend you do so; the SO community has certain expectations regarding the quality of posts. (2) Concerning `do.call(rbind, ...)`: Yes, your interpretation is correct. [...] – Maurits Evers Jun 01 '18 at 07:54
  • [continued] (3) I don't understand your last comment. Again, please provide representative example code with data, where you show what your expected output should be. Create some fake data if necessary. It will be much easier to help with a specific example. – Maurits Evers Jun 01 '18 at 07:54
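
As a side note not raised in the thread itself, one likely reason the original function returns NA or NaN is the expression `datamatrix[pollutant]`. Indexing a matrix with a single character string looks for an element name, not a column name, so it yields NA, and the mean of that with `na.rm = TRUE` is NaN. The sketch below uses made-up data to illustrate the difference. The question's loop also overwrites `resultmean` on every iteration, so even with correct indexing only the last file would count.

```r
# Made-up data standing in for one file from the question
df <- data.frame(sulfate = c(1, NA, 3), nitrate = c(2, 4, 6))
m <- data.matrix(df)

# Vector-style indexing: a matrix has dimnames but no element names, so this is NA
by_name_only <- m["sulfate"]
# Matrix-style indexing with a comma selects the whole column
by_column <- m[, "sulfate"]

mean_wrong <- mean(by_name_only, na.rm = TRUE)  # NaN: nothing left after dropping NA
mean_right <- mean(by_column, na.rm = TRUE)     # 2: the mean of 1 and 3
print(mean_wrong)
print(mean_right)
```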