Combining files based on their names in R into dataframes

Question

I currently have a character vector of file paths such as:

files <- c("C:/Users/Me/Desktop/cc/canada/2016/Ontario.BRU", 
           "C:/Users/Me/Desktop/cc/canada/2017/Ontario.BRU", 
           "C:/Users/Me/Desktop/cc/canada/2018/Ottawa.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ontario.BRU")

I would like to combine the files whose names end with the same city into a single dataframe, appended one after another. Even if a city occurs only once, I would still like to save its dataframe as a csv file at the end. Here is the code I have started:

cad<-NULL
for(b in 1:length(files)){ 
  country<-sub(".*/ *(.*?) */[[:digit:]].*", "\\1", files[b]) 

  if(country=="canada"){ 
    cad<-c(cad, files[b])
  }
    cad_cities <- unique((sub(".*/ *(.*?) *.BRU.*", "\\1", cad)))
    for(c in 1:length(cad_cities)){
      city<-sub(".*/ *(.*?) *.BRU.*", "\\1", cad)
    }
}

I am stuck after this part. Thank you.

Edit: example of datafiles

2018,1,0,9999,-20.70,-23.00,-22.10,81.00,0.00,000,-991,-991,-991,-2.41,-991,-991,8.90,353,97.36,-991,-991,19.00,-991
2018,1,100,9999,-21.40,-22.70,-22.00,80.00,0.00,100,-991,-991,-991,-2.42,-991,-991,7.80,264,97.36,-991,-991,18.00,-991
2018,1,200,9999,-21.40,-22.50,-21.90,79.00,0.00,200,-991,-991,-991,-2.42,-991,-991,10.30,270,97.34,-991,-991,19.00,-991
2018,1,300,9999,-20.80,-21.90,-21.40,78.00,0.00,300,-991,-991,-991,-2.43,-991,-991,10.70,263,97.32,-991,-991,18.00,-991

Read them all into a list, see [this post](https://stackoverflow.com/questions/11433432). Then [combine them with ID column](https://stackoverflow.com/q/15162197). Then split the dataframe based on values in ID column. — zx8754, Aug 20 '18 at 21:05
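The steps in the comment above could be sketched roughly as follows. This assumes the files are comma-separated with no header row, as in the sample data; the helper `read_with_city`, the ID column name `city`, and the small temporary stand-in files are made up for illustration.

```r
# Stand-in files: two years of "Ontario" and one of "Ottawa" in a temp folder
root  <- file.path(tempdir(), "cc", "canada")
files <- file.path(root, c("2016", "2017", "2018"),
                   c("Ontario.BRU", "Ontario.BRU", "Ottawa.BRU"))
for (f in files) {
  dir.create(dirname(f), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("2018,1,0,-20.70", "2018,1,100,-21.40"), f)
}

# 1. Read every file, tagging each row with its city (the ID column)
read_with_city <- function(path) {   # hypothetical helper
  d <- read.csv(path, header = FALSE)
  d$city <- sub("\\.BRU$", "", basename(path))
  d
}

# 2. Combine everything into one data frame
all_data <- do.call(rbind, lapply(files, read_with_city))

# 3. Split the combined data frame into one data frame per city
by_city <- split(all_data, all_data$city)
```

`by_city` is then a named list, so `by_city$Ontario` holds the rows of every Ontario file.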

Rui Barradas · Answer 1 · 2018-08-20T21:21:17.247


Maybe something like the following. (First, run the code in the question.)
Untested, since there are no data files.

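for(city in cad_cities){
    # compare against the file name only, so the loop does not overwrite 'cad'
    # and a folder that happens to contain the city name is not matched
    tmp <- cad[sub("\\.BRU$", "", basename(cad)) == city]
    tmp <- lapply(tmp, read.table, sep = ",")    # the files have no header row
    tmp <- do.call(rbind, tmp)
    write.csv(tmp, file = paste0(city, ".csv"), row.names = FALSE)
}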

rm(tmp)    # tidy up

edited Aug 20 '18 at 21:21

answered Aug 20 '18 at 21:15


lebatsnok · Answer 2 · 2018-08-20T21:32:12.683

First, to extract the city from file name:

cities <- sub("\\.BRU", "", basename(files))
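For the paths in the question this yields one city name per file. A small check, with `tools::file_path_sans_ext()` (from the `tools` package that ships with R) shown as an alternative that avoids writing the regex by hand; `cities_alt` is just an illustrative name:

```r
files <- c("C:/Users/Me/Desktop/cc/canada/2016/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2017/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ottawa.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ontario.BRU")

cities <- sub("\\.BRU", "", basename(files))
cities
# [1] "Ontario" "Ontario" "Ottawa"  "Ontario"

# Alternative: strip whatever extension is present
cities_alt <- tools::file_path_sans_ext(basename(files))
identical(cities, cities_alt)
# [1] TRUE
```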

Now read in all files:

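dataz <- lapply(files, read.csv, header=FALSE, as.is=TRUE)
# it is usually a good idea to add as.is
# header=FALSE because the example files have no header row; with the default
# header=TRUE the column names would differ between files and rbind would fail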

And then rbind the data from the same cities:

lapply(split(dataz, cities), function(x) do.call(rbind,x))

This strategy should work but may need some slight modifications as it is untested.

[edit]

A test case with random data:

dataz <- lapply(1:4, function(iii) as.data.frame(replicate(3, rnorm(5))))
lapply(split(dataz, cities), function(x) do.call(rbind,x))
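To finish the task in the question (one csv per city), the list returned by `lapply()` can be stored and written out. A sketch reusing the random test data; the name `combined` and the output folder `out_dir` are assumptions, not part of the original answer.

```r
files <- c("C:/Users/Me/Desktop/cc/canada/2016/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2017/Ontario.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ottawa.BRU",
           "C:/Users/Me/Desktop/cc/canada/2018/Ontario.BRU")
cities <- sub("\\.BRU$", "", basename(files))

# Random stand-in data, one data frame per file (as in the test case above)
dataz <- lapply(seq_along(files), function(i) as.data.frame(replicate(3, rnorm(5))))

# Keep the result this time: a named list with one data frame per city
combined <- lapply(split(dataz, cities), function(x) do.call(rbind, x))

# Write each element to "<city>.csv"; out_dir is an assumed output folder
out_dir <- tempdir()
for (city in names(combined)) {
  write.csv(combined[[city]], file.path(out_dir, paste0(city, ".csv")),
            row.names = FALSE)
}
```

A city that occurs only once (Ottawa here) goes through the same path, since `do.call(rbind, x)` on a one-element list simply returns that data frame.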
