
I am having trouble figuring out for loops in R after working in Python for a while. What I want to do is pull out the $nitrate or $sulfate column from the list of CSV file contents this code returns:

getpollutant <- function(id=1:332, directory, pollutant) {
        data<-c()
        for (i in id) {
                data[i]<- c(paste(directory, "/",formatC(i, width=3, flag=0),".csv",sep=""))
        }
        df<-c()
        for (d in 1:length(data)) {
                df[[d]]<-c(read.csv(data[d]))
        }
        df
}

I haven't included the for loop for pollutant yet. I've tried many different approaches but can't get it to work quite right. With the code above I can call getpollutant(1:10, "specdata") and it gives me all the CSV files from the specdata directory labeled 001 through 010. It prints each CSV file as a separate chunk, with headers of the form [[i]]$columnname and the contents of the column listed below. What I want to do is pull out a specific column name (pollutant) and return the contents of that column from every CSV file. I have read through the help pages and just can't seem to get my formatting right...

@RomanLuštrik I don't know if this is what you're looking for but here's a sample output if I put in

getpollutant(1, "specdata"):             
[[1]]                                                                    
[[1]]$Date                                                             
[1] 2003-01-01 2003-01-02 2003-01-03                                     
[[1]]$sulfate                                                          
[1] NA NA NA NA NA NA 7.210 NA NA NA 1.300                           
[[1]]$nitrate                                                          
[1] NA NA NA .474 NA NA NA .964 NA NA NA         

Obviously this is a very small version of the output, but basically it takes the CSV files in the specified id range and prints them out like this...

  • Is this for the Coursera course "R programming"? – Jaap May 19 '14 at 18:09
  • Can you give a small, reproducible example? – Roman Luštrik May 19 '14 at 18:19
  • @Jaap Yeah, I realize I'm behind from last week. Just trying to understand before I move on.... – user3653647 May 19 '14 at 18:19
  • Welcome to StackOverflow. Please read the info about how to [ask a question](http://stackoverflow.com/help/how-to-ask) and how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). You might also want to read the [open letter to students with homework problems](http://meta.programmers.stackexchange.com/questions/6166/open-letter-to-students-with-homework-problems) – Jaap May 19 '14 at 18:38
  • You can also search on this site for "[r] pollutant" to see all the other questions people have asked/answered for this homework assignment. – MrFlick May 19 '14 at 18:49

1 Answer


Do you only want to read in a certain column from the files? And do you know which column it is by number (e.g. the 3rd column)? In that case you can use the colClasses argument to read.table/read.csv to read in only the given column.
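As a rough illustration of the colClasses approach, here is a minimal sketch. It assumes a three-column file laid out as Date, sulfate, nitrate (taken from the sample output in the question); the temporary file and its values are made up for the demo. The character string "NULL" in a position tells read.csv to skip that column.

```r
# Build a tiny CSV in a temporary file so the sketch is self-contained.
# The Date, sulfate, nitrate layout is an assumption based on the
# sample output in the question; the values are made up.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,sulfate,nitrate",
             "2003-01-01,NA,0.474",
             "2003-01-02,7.21,NA",
             "2003-01-03,1.3,0.964"), tmp)

# One entry per column in the file; the string "NULL" skips that column.
# Here only the third column (nitrate) is read, as numeric.
nitrate_only <- read.csv(tmp, colClasses = c("NULL", "NULL", "numeric"))

names(nitrate_only)    # only the nitrate column remains
nitrate_only$nitrate   # its values, with missing entries as NA
```

This only works when the column position is the same in every file, which is why the next paragraph covers the case where it is not known ahead of time.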

If you don't know which column it is ahead of time then you may need to read in the entire file, then only return the given column. In that case you probably want to use [[]] subsetting instead of $ subsetting.
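The difference between the two kinds of subsetting matters when the column name is stored in a variable, as it is with the pollutant argument. A small sketch, using a made-up data frame in place of one of the CSV files:

```r
# A small data frame standing in for one CSV file (made-up values).
monitor <- data.frame(sulfate = c(NA, 7.21, 1.3),
                      nitrate = c(0.474, NA, 0.964))

pollutant <- "nitrate"   # column name held in a variable

# `$` takes the name literally, so it looks for a column called "pollutant".
monitor$pollutant        # NULL, because no such column exists

# `[[` evaluates the variable first, so it looks for the column "nitrate".
monitor[[pollutant]]     # the nitrate values
```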

You can also make your code more compact and possibly more efficient by using sprintf and lapply or sapply.

Consider this code:

lapply(1:332, function(id) {
  read.csv( sprintf("%s/%03d.csv", directory, id) )
})

or

sapply( list.files(directory, pattern='\\.csv$',full.names=TRUE), 
  function(nm) read.csv(nm)[[pollutant]] )
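For readers who prefer an explicit for loop, the same idea can be sketched as below. This is not code from the answer above: getpollutant_sketch is a hypothetical name, and the demo directory, file contents, and values are invented so the example can run on its own. It combines sprintf for the file names with [[ ]] subsetting for the column.

```r
# Hypothetical helper written as a plain for loop.
getpollutant_sketch <- function(directory, pollutant, id = 1:332) {
  values <- c()                                   # collects the column from every file
  for (i in id) {
    path <- sprintf("%s/%03d.csv", directory, i)  # e.g. "specdata/001.csv"
    monitor <- read.csv(path)                     # read the whole file
    values <- c(values, monitor[[pollutant]])     # keep only the requested column
  }
  values
}

# Demo on two tiny files in a temporary directory (made-up values).
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Date,sulfate,nitrate",
             "2003-01-01,NA,0.474",
             "2003-01-02,7.21,NA"),
           file.path(demo_dir, "001.csv"))
writeLines(c("Date,sulfate,nitrate",
             "2003-01-01,1.3,0.964"),
           file.path(demo_dir, "002.csv"))

result <- getpollutant_sketch(demo_dir, "nitrate", id = 1:2)
result   # nitrate values from both files, in file order
```

Growing a vector with c() inside a loop is fine for a few hundred small files; the lapply and sapply versions avoid that repeated copying.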
Greg Snow
  • I appreciate your answer, but it's beyond my scope in R. I literally just started writing functions and I am trying to understand how for loops work. This language is a lot more cryptic than Python, so I really want to get to know the basics before I just throw in code that I don't understand. I do know which column I want to read in, so I'll go back and look into the colClasses argument and see if I can wrap my head around it – user3653647 May 19 '14 at 18:39
  • @user3653647, ok, put this code on the back burner until you are ready. With `colClasses` you give a character vector with the same number of elements as you have columns in the data; any element of your vector that is the string `"NULL"` means skip the corresponding column in the file. – Greg Snow May 19 '14 at 19:12