
I am writing a function in R that loops over multiple CSV files in a directory called "specdata" and then reports the mean of a particular column that those files have in common. The function is shown below. Its arguments are the directory where the files are located, the column whose mean should be calculated, and id, a sequence that selects which files to read by their position, using subsetting with [].

HERE IS THE FUNCTION:

pollutantmean <- function(directory,pollutant,id) {
     for (i in id) {archivo <- list.files(directory)[i]
    file(archivo[i])
    datapollution <- read.csv(archivo[i],header = TRUE)
    datamatrix <- data.matrix(datapollution)
    mean(datamatrix[pollutant],na.rm = TRUE)}}

the problem is that when the function is called:

pollutantmean("specdata",sulfurate,1:15)

it gives the following error message:

 Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :

The interesting part is that the error does not occur when you run the failing part on its own, outside the function, like this:

file(list.files("specdata")[2])

In this case it gives the desired connection, and when you then apply read.csv("specdata")[2] it also works perfectly.

So here is my question: what am I missing? It should connect to and read all the files the same way it does when subsetting with [2], just with the number 2 replaced by the respective i as the loop runs through the function, and making me happy. Why does it give an error here but not when subsetting with 2?
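One detail worth checking, sketched here with a made-up vector of file names (the names are hypothetical, not the real contents of specdata): `list.files(directory)[i]` already picks a single element, so indexing that result with `[i]` a second time returns `NA` for any `i` greater than 1.

```r
# Hypothetical file names standing in for the result of list.files("specdata")
nombres <- c("001.csv", "002.csv", "003.csv")

i <- 2
archivo <- nombres[i]   # already a single element
segundo <- archivo[i]   # position 2 of a length-1 vector does not exist, so this is NA

print(archivo)
print(segundo)
```

This only illustrates the subsetting behaviour; it does not read any files.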

I read somewhere that I have to use rbind, but either way that would come after opening the connection and reading the listed files. I need to solve this first error before that (and I am not sure how I would do the rbind step afterwards...).
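For reference, the rbind step mentioned above could look roughly like the following sketch. It builds its own throwaway directory with invented data, and the column name `valor` and the directory name `rbind_demo` are assumptions made up for the example, not part of the assignment.

```r
# Build a throwaway directory with two small CSV files (made-up data)
dir_demo <- file.path(tempdir(), "rbind_demo")
dir.create(dir_demo, showWarnings = FALSE)
write.csv(data.frame(valor = c(1, NA, 3)), file.path(dir_demo, "001.csv"), row.names = FALSE)
write.csv(data.frame(valor = c(5, 7)), file.path(dir_demo, "002.csv"), row.names = FALSE)

# Read every CSV file into a list of data frames
rutas <- list.files(dir_demo, pattern = "\\.csv$", full.names = TRUE)
tablas <- lapply(rutas, read.csv)

# Stack the data frames by rows, then take one mean over the combined column
todo <- do.call(rbind, tablas)
media <- mean(todo[["valor"]], na.rm = TRUE)
print(media)
```

Stacking first and averaging once gives the mean over all rows of the selected files, which is not the same as averaging the per-file means when the files have different numbers of rows.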

Yep, I'm from Coursera, sorry to be a cliché, but I'm a really nice guy. PLEASE HELP :)

brandata
  • Possible duplicate of [Importing multiple .csv files into R](https://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r) – patL May 30 '18 at 09:45

2 Answers

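pollutantmean <- function(directory, pollutant, id) {
  # full.names = TRUE returns paths that include the directory, so read.csv can find the files
  # be sure your working directory contains the directory with this data
  files <- list.files(directory, full.names = TRUE, pattern = "\\.csv$")
  values <- numeric(0)
  for (i in id) {
    datapollution <- read.csv(files[i], header = TRUE, stringsAsFactors = FALSE)
    # collect the requested column from each file; pollutant is a column name given as a string
    values <- c(values, datapollution[[pollutant]])
  }
  # one mean over all selected files, ignoring missing values
  mean(values, na.rm = TRUE)
}

pollutantmean("specdata", "sulfurate", 1:15)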
Mislav
  • It's a vector, not a list. It's faster, nothing else. I also added the full.names = TRUE option. Does it work? – Mislav May 30 '18 at 10:43
  • Yes it did! Thanks man. I would like to understand though: what does the full.names = TRUE argument of the list.files function actually do? Why is no file() function needed? Is the connection generated automatically by list.files()? – brandata May 30 '18 at 11:07
  • `full.names = TRUE` gives the full (not partial) path. I don't know why you opened an empty file. `read.csv` needs only the path of the CSV file. Can you accept the answer? – Mislav May 30 '18 at 12:33
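The effect of `full.names = TRUE` can be seen in a small sketch like the one below. It uses a temporary directory created just for the example (the name `fullnames_demo` is made up), so it does not depend on specdata being present.

```r
# Throwaway directory with a single CSV file (made-up data)
dir_demo <- file.path(tempdir(), "fullnames_demo")
dir.create(dir_demo, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(dir_demo, "001.csv"), row.names = FALSE)

cortos <- list.files(dir_demo)                     # bare file name only
largos <- list.files(dir_demo, full.names = TRUE)  # directory path prepended to the name

print(cortos)
print(largos)

# The bare name is looked up relative to the working directory,
# so it is only found if the working directory happens to be dir_demo
print(file.exists(cortos))
print(file.exists(largos))
```

Without `full.names = TRUE`, `read.csv` receives only the bare name and looks for it in the current working directory rather than inside the listed directory.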

So it worked. Adding full.names = TRUE, removing the file() call, and dropping the [i] from the subsetting of list.files solved that problem.

  pollutantmean <- function(directory, pollutant, id) {
    archivo <- list.files(directory, full.names = TRUE)
    valores <- c()
    for (i in id) {
      datapollution <- read.csv(archivo[i], header = TRUE)
      # keep the requested column from each file (pollutant is a column name string)
      valores <- c(valores, datapollution[[pollutant]])
    }
    resultmean <- mean(valores, na.rm = TRUE)
    print(resultmean)
  }

I would like to understand though:

What does the full.names = TRUE argument of the list.files function actually do?

Why is no file() function needed? Is the connection generated automatically by list.files()?
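A small sketch that may help with the second question, using temporary files invented for the example: `read.csv` accepts a path and handles the connection itself, while `file()` with its default `open = ""` only creates a connection object without opening anything. That would also explain why calling `file()` on its own raised no error.

```r
# Write a tiny CSV file to a temporary path (made-up data)
ruta <- tempfile(fileext = ".csv")
write.csv(data.frame(x = c(2, 4)), ruta, row.names = FALSE)

# read.csv takes the path directly and manages the connection internally
datos <- read.csv(ruta)
print(mean(datos$x))

# file() on a path that does not exist still returns a connection object,
# because with the default open = "" nothing is opened yet
con <- file(file.path(tempdir(), "no_such_file.csv"))
abierta <- isOpen(con)
print(abierta)
close(con)
```

So `list.files()` only returns names or paths as character strings; the connection is created later, by `read.csv`, when it is given one of those paths.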

brandata