
I discovered R a couple of years ago, and it has been very handy for cleaning up data frames, preparing data, and handling other basic tasks.

Now I would like to use R to apply the same basic processing to many files stored in different folders at once.

Here is the script I would like to turn into a single function that loops through my folders "dataset_2006" and "dataset_2007" and does all the work:

library(dplyr)
library(readr)
library(sf)
library(purrr)

setwd("C:/Users/Downloads/global_data/dataset_2006")

# import all 2006 shapefiles and stack them into one sf object
shp2006 <- list.files(pattern = 'data_2006.*\\.shp$', full.names = TRUE)
listOfShp <- lapply(shp2006, st_read)
combinedShp <- do.call(rbind, listOfShp)

# import the CSV files and bind their rows into one data frame
folderfiles <- list.files(pattern = 'csv_2006_.*\\.csv$', full.names = TRUE)

csv_data <- folderfiles %>% 
  set_names() %>% 
  map_dfr(.f = read_delim,
          delim = ";",
          .id = "file_name")

new_shp_2006 <- merge(combinedShp, csv_data, by = "ID") %>%
  filter(label %in% c("AR45T", "GK879"))

# st_write() has no `overwrite` argument; append = FALSE replaces an existing layer
st_write(new_shp_2006, "new_shp_2006.shp", append = FALSE)




setwd("C:/Users/Downloads/global_data/dataset_2007")

# import all 2007 shapefiles and stack them into one sf object
shp2007 <- list.files(pattern = 'data_2007.*\\.shp$', full.names = TRUE)
listOfShp <- lapply(shp2007, st_read)
combinedShp <- do.call(rbind, listOfShp)

# import the CSV files and bind their rows into one data frame
folderfiles <- list.files(pattern = 'csv_2007_.*\\.csv$', full.names = TRUE)

csv_data <- folderfiles %>% 
  set_names() %>% 
  map_dfr(.f = read_delim,
          delim = ";",
          .id = "file_name")

new_shp_2007 <- merge(combinedShp, csv_data, by = "ID") %>%
  filter(label %in% c("AR45T", "GK879"))

# st_write() has no `overwrite` argument; append = FALSE replaces an existing layer
st_write(new_shp_2007, "new_shp_2007.shp", append = FALSE)
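The 2006 and 2007 blocks differ only in the year, so a first step toward a single function is to build the folder path and the file patterns from the year instead of hard-coding them and calling setwd(). Below is a minimal base-R sketch, assuming the layout shown above (`<base_dir>/dataset_<year>/` containing `data_<year>*.shp` and `csv_<year>_*.csv`); `year_inputs()` and `base_dir` are hypothetical names, and the usage part only creates empty placeholder files under tempdir() to show which files get picked up.

```r
# Hypothetical helper (not from the original post): list the input files for
# one year without calling setwd(), assuming <base_dir>/dataset_<year>/ holds
# data_<year>*.shp and csv_<year>_*.csv files.
year_inputs <- function(base_dir, year) {
  dir <- file.path(base_dir, paste0("dataset_", year))
  list(
    # full.names = TRUE returns paths that work from any working directory
    shp = list.files(dir, pattern = paste0("data_", year, ".*\\.shp$"), full.names = TRUE),
    csv = list.files(dir, pattern = paste0("csv_", year, "_.*\\.csv$"), full.names = TRUE)
  )
}

# Usage: build a throwaway copy of the folder layout under tempdir()
base <- file.path(tempdir(), "global_data")
for (year in c(2006, 2007)) {
  d <- file.path(base, paste0("dataset_", year))
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(d, c(paste0("data_", year, "_a.shp"),
                             paste0("csv_", year, "_1.csv"))))
}

# one list of input files per year
inputs <- lapply(c(2006, 2007), year_inputs, base_dir = base)
basename(inputs[[2]]$shp)
```

Because every path is absolute, the per-year lists can be fed into the same reading, merging, and writing steps as the script above without ever changing the working directory.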
MrFlick
zakros
  • So what exactly is your question here? Maybe something like this is a good starting point: https://stackoverflow.com/questions/14958516/read-all-files-in-directory-and-apply-multiple-functions-to-each-data-frame. It's helpful if you ask a more specific programming question. Show what you tried and describe where you are getting stuck. – MrFlick Jun 21 '22 at 15:14
  • I would like to automate running my script on the subfolder "dataset_2006" and then on "dataset_2007", without having to run it twice and change my working directory manually, as I do in my post :/ – zakros Jun 21 '22 at 15:32

1 Answer


This is easy to achieve with a for loop over the directories. To allow wildcards, we can also pass the input through Sys.glob(), which expands a pattern into the matching paths:

myfunction <- function(directories) {
  for(dir in Sys.glob(directories)) {
    # do something with a single dir
    print(dir)
  }
}

# you can specify multiple directories manually:
myfunction(c('C:/Users/Downloads/global_data/dataset_2006',
             'C:/Users/Downloads/global_data/dataset_2007'))

# or use a wildcard to automatically get all files/directories that match the pattern:
myfunction('C:/Users/Downloads/global_data/dataset_200*')
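To turn the placeholder loop above into the full workflow, the body of the question's script can be moved into a per-folder function that the loop calls for each directory. This is a sketch, not a tested drop-in: it assumes each folder is named `dataset_<year>`, that the file-name patterns, the `ID` join column, and the two labels are exactly as in the question, and that readr is version 2.0 or later (for `show_col_types`). `process_dir()` and `keep_labels` are hypothetical names.

```r
library(dplyr)
library(readr)
library(sf)
library(purrr)

# Hypothetical helper: run the question's steps on one "dataset_<year>" folder
process_dir <- function(dir, keep_labels = c("AR45T", "GK879")) {
  year <- sub("^dataset_", "", basename(dir))  # e.g. "2006"

  # read every shapefile for this year and stack them into one sf object
  shp_files <- list.files(dir, pattern = paste0("data_", year, ".*\\.shp$"),
                          full.names = TRUE)
  combined_shp <- do.call(rbind, lapply(shp_files, st_read, quiet = TRUE))

  # read every semicolon-delimited CSV, recording its path in file_name
  csv_files <- list.files(dir, pattern = paste0("csv_", year, "_.*\\.csv$"),
                          full.names = TRUE)
  csv_data <- csv_files %>%
    set_names() %>%
    map_dfr(read_delim, delim = ";", .id = "file_name", show_col_types = FALSE)

  # join on ID, then keep only the wanted labels
  result <- merge(combined_shp, csv_data, by = "ID") %>%
    filter(label %in% keep_labels)

  # write next to the inputs; append = FALSE replaces an existing layer
  out_file <- file.path(dir, paste0("new_shp_", year, ".shp"))
  st_write(result, out_file, append = FALSE, quiet = TRUE)
  invisible(out_file)
}

# the answer's loop, now doing the real work for each matching folder
myfunction <- function(directories) {
  for (dir in Sys.glob(directories)) {
    process_dir(dir)
  }
}
```

With this in place, `myfunction('C:/Users/Downloads/global_data/dataset_200*')` would process every matching folder in one call. Deriving the year from the folder name keeps the loop free of hard-coded years, and the output name `new_shp_<year>.shp` does not match the `data_<year>` pattern, so re-running the loop will not read earlier outputs back in.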
Caspar V.