I am dealing with a dataset that consists of over 90,000 CSV files. Each file contains sampling data for a specific chemical measured at a specific sampling site. The files look like this:
#csv1
chemical_ID samplingsite A result year month
1 1 1 0.5 2008 7
1 1 1 0.5 2008 5
1 1 1 0.5 2008 1
1 1 1 0.3 2008 11
1 1 1 0.5 2010 6
1 1 1 0.4 2010 10
1 1 1 0.5 2010 2
1 1 1 0.5 2010 4
1 1 1 0.4 2013 3
1 1 0 0.2 2013 5
1 1 0 0.1 2013 7
1 1 1 0.5 2013 9
1 1 1 0.4 2014 3
1 1 0 0.2 2014 5
1 1 0 0.1 2014 7
1 1 1 0.5 2014 9
#csv2
chemical_ID samplingsite A result year month
2 1 1 0.8 2008 6
2 1 1 0.7 2008 9
2 1 1 0.9 2008 11
2 1 1 0.6 2008 12
2 1 1 0.5 2010 2
2 1 1 0.4 2010 5
2 1 1 0.8 2010 6
2 1 1 0.9 2010 8
#csv3
chemical_ID samplingsite A result year month
100 2 1 1.5 2001 1
100 2 1 1.2 2001 6
100 2 1 1.7 2002 1
100 2 1 0.9 2002 6
100 2 1 1.8 2003 1
100 2 0 1.4 2003 6
100 2 1 1.5 2004 1
100 2 0 1.2 2004 6
To reduce the number of files, I would like to select only the files that meet specific criteria and save them in a new folder. The criteria for each chemical are:
Number of sampled years > 4
Number of samplings per year >= 4
Number of rows with the value “1” in column “A” per year >= 4
I’ve tried but can’t find a solution for this task, and searching Google didn’t help. This is what I’ve got so far:
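{
# full.names = TRUE so that read.csv() finds the files regardless of the working directory
mycsv <- list.files(path = "D:/…/in", pattern = "allyears", full.names = TRUE)
n <- length(mycsv)
mylist <- vector("list", n)
for (i in seq_len(n))
  mylist[[i]] <- read.csv(mycsv[i], header = TRUE)
# number of sampled years per file, stored separately so the data in mylist are not overwritten
nyears <- sapply(mylist, function(x) length(unique(x$year)))
#???
for (i in seq_len(n))
  write.csv(mylist[[i]], file = file.path("D:/…/out", basename(mycsv[i])),
            row.names = FALSE)
}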
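For reference, the three criteria listed above could be checked per file roughly like this. It is only a minimal sketch, not a tested solution for the full dataset. The function names `passes_criteria` and `select_files` and the arguments `in_dir` and `out_dir` are hypothetical. It also assumes that "per year" means every sampled year in a file has to meet the criterion, and that the files have the columns `A` and `year` as shown in the examples.

```r
# Hypothetical helper: TRUE if one file's data meet all three criteria.
# Assumption: every sampled year must satisfy the per-year criteria.
passes_criteria <- function(df) {
  n_per_year    <- tapply(df$A, df$year, length)                  # samplings per year
  ones_per_year <- tapply(df$A == 1, df$year, sum, na.rm = TRUE)  # rows with A == 1 per year
  length(n_per_year) > 4 &&       # number of sampled years > 4
    all(n_per_year >= 4) &&       # samplings per year >= 4
    all(ones_per_year >= 4)       # rows with A == 1 per year >= 4
}

# Hypothetical wrapper: copy every matching file from in_dir to out_dir.
# Each file is read, checked and discarded again, so the 90,000 files
# are never held in memory at the same time.
select_files <- function(in_dir, out_dir, pattern = "allyears") {
  files <- list.files(in_dir, pattern = pattern, full.names = TRUE)
  keep  <- vapply(files,
                  function(f) passes_criteria(read.csv(f, header = TRUE)),
                  logical(1), USE.NAMES = FALSE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  file.copy(files[keep], out_dir)
  basename(files[keep])           # names of the files that were kept
}
```

Calling `select_files()` with the input and output folders would then copy only the qualifying files and leave the originals untouched. If the per-year criteria only need to hold for some of the years, the `all()` calls would have to be changed accordingly.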
Thanks in advance
Nis