I have 100 CSV files, and I intend to pick out and sum the data in the sulfate/nitrate columns, as described below.
The CSV format is:
Date sulfate nitrate ID
1/1/2003 NA NA 1
1/2/2003 NA NA 1
1/3/2003 NA NA 1
1/4/2003 NA NA 1
1/5/2003 NA NA 1
1/6/2003 NA NA 1
1/7/2003 NA NA 1
1/8/2003 NA NA 1
1/9/2003 NA NA 1
1/10/2003 NA NA 1
1/11/2003 NA NA 1
1/12/2003 NA NA 1
1/13/2003 NA NA 1
1/14/2003 NA NA 1
1/15/2003 NA NA 1
1/16/2003 NA NA 1
1/17/2003 NA NA 1
1/18/2003 NA NA 1
1/19/2003 NA NA 1
All 100 files are in one folder and are named 001.csv, 002.csv ... 100.csv
The ID column corresponds to the name of the CSV file (for example, ID 1 is in 001.csv). All 100 files have the format shown above.
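The naming scheme above can be sketched as follows. This is a minimal illustration, not part of my function; `ids` and `directory` are hypothetical example values, and `sprintf` with `%03d` is used to zero-pad each ID to three digits.

```r
# Build zero-padded file names such as "001.csv" from numeric IDs.
# `ids` and `directory` are hypothetical example values.
ids <- c(1, 2, 100)
directory <- "specdata"

# "%03d" pads each integer with leading zeros to a width of three.
file_names <- sprintf("%03d.csv", as.integer(ids))

# file.path joins the directory and file name with "/".
file_paths <- file.path(directory, file_names)
print(file_paths)
```

Building the name from the ID value itself (rather than from a loop counter) keeps the mapping correct even when the requested IDs do not start at 1.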
Here is the code that I have written so far:
pollutantmean <- function(directory, pollutant, id = 1:332)
{
    test <- c('sulfate', 'nitrate')
    for (i in seq_along(id))
    {
        j <- formatC(i, width = 3, flag = "0")
        temp <- "C:/Users/Himanshu/Downloads/rprog-data-specdata/"
        temp1 <- paste(temp, directory, sep = "")
        filepath <- file.path(temp1, paste(j, ".csv", sep = ""))
        if (test[1] == pollutant)
        {
            data <- read.csv(filepath, header = TRUE, sep = "\t", colClasses = c(NA, "sulfate", NA, NA))
            sum(x = data, na.rm = FALSE)
        }
        else if (test[2] == pollutant)
        {
            data <- read.csv(filepath, header = TRUE, sep = "\t", colClasses = c(NA, NA, "nitrate", NA))
            sum(x = data, na.rm = FALSE)
        }
        data
    }
}
I got the error below when executing this statement in the RStudio console:
data<-read.csv(filepath,header = TRUE, sep = "\t")[,c('nitrate')]
The error was:
Error in `[.data.frame`(read.csv(filepath, header = TRUE, sep = "\t"), :
undefined columns selected
Another approach I tried was:
data<-read.csv(filepath,header = TRUE, sep = "\t",colClasses=c(NA,"sulfate",NA,NA))
This produced a warning:
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 4
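Both messages are consistent with each line being parsed as a single column when sep = "\t" is used, which would happen if the files are actually comma-separated. That is an assumption, since I have only confirmed the .csv extension. Note also that colClasses in read.csv expects class names such as "numeric", or "NULL" to skip a column, not column names. A minimal sketch under those assumptions, using a small made-up file:

```r
# Write a small comma-separated sample file (made-up data, assumed layout).
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Date,sulfate,nitrate,ID",
  "1/1/2003,NA,NA,1",
  "1/2/2003,2.5,0.4,1",
  "1/3/2003,3.5,NA,1"
), sample_file)

# colClasses takes class names; "NULL" drops a column while reading.
sulfate_only <- read.csv(sample_file,
                         colClasses = c("NULL", "numeric", "NULL", "NULL"))

# Alternative: read every column, then select one by name.
all_columns <- read.csv(sample_file)
nitrate <- all_columns[["nitrate"]]

# na.rm = TRUE skips the NA entries instead of returning NA.
sulfate_sum <- sum(sulfate_only$sulfate, na.rm = TRUE)
nitrate_sum <- sum(nitrate, na.rm = TRUE)
```

Selecting by name after reading is the simpler of the two; the colClasses form only matters if skipping unused columns is worth the extra care.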
This is how the user will call the function from the R console:
pollutantmean("specdata", "nitrate", 1:72)
Here the first argument is the directory name, the second is the column name, and the third is the vector of CSV file IDs to read.
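For reference, the intended behavior of that call can be sketched as below. This is only a sketch under stated assumptions: `pollutant_sum` is a hypothetical name, the files are assumed to be comma-separated with a header row, and the demo data is made up and written to a temporary directory.

```r
# Hypothetical helper: sum one pollutant column across the selected files.
pollutant_sum <- function(directory, pollutant, id = 1:100) {
  stopifnot(pollutant %in% c("sulfate", "nitrate"))
  # Build paths such as "<directory>/001.csv" from the ID values themselves.
  paths <- file.path(directory, sprintf("%03d.csv", as.integer(id)))
  total <- 0
  for (path in paths) {
    # Assumes comma-separated files with a header row.
    monitor <- read.csv(path)
    # Accumulate across files, ignoring missing values.
    total <- total + sum(monitor[[pollutant]], na.rm = TRUE)
  }
  total
}

# Demo on two small made-up files in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Date,sulfate,nitrate,ID",
             "1/1/2003,NA,1.5,1",
             "1/2/2003,2,0.5,1"),
           file.path(demo_dir, "001.csv"))
writeLines(c("Date,sulfate,nitrate,ID",
             "1/1/2003,4,NA,2",
             "1/2/2003,NA,1,2"),
           file.path(demo_dir, "002.csv"))

demo_total <- pollutant_sum(demo_dir, "nitrate", 1:2)
print(demo_total)
```

The running total is kept outside the loop and returned at the end, so every selected file contributes to the result rather than only the last one read.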