0

I have a number of csv files in the working directory. Some of these files share a string (ex. ny, nj, etc.) at the beginning of their name. Below is a screenshot:

enter image description here

What I want to do is to import and merge the csv files that share a string. I have searched and seen people suggesting regex, however I am not sure if that is best way to go. I appreciate any help with this.

Best, Kaveh

jpsmith
  • 11,023
  • 5
  • 15
  • 36
Kaveh Ekbia
  • 107
  • 1
  • 9

1 Answers1

2

Here's a function that may be more efficient than for loops, though there may be more elegant solutions.

Since I dont know what your excel files contain, I created several dummy files with a few columns ("A", "B", and "C"). I dont know what you would merge by; in this example I merged by column "A".

Given the ambiguity in the files, I have edited this to include both merge and bind approaches, depending on what is needed.

To test these functions, create a few CSV files in a folder (I created NJ_1.csv, NJ_2.csv, NJ_3.csv, NY_1.csv, NY_2.csv, each with columns A, B, and C.)

For all options, this code needs to be run.

setwd("insert path where folder with csv files is located")
library(dplyr)

OPTION 1:

If you want to merge files containing different data with a unique identifier. Example: one file contains temperature and one file contains precipitation for a given geographic location

importMerge <- function(x, mergeby){
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  merge <- files %>% Reduce(function(dtf1, dtf2) left_join(dtf1, dtf2, by = mergeby), .)
  return(merge)
}

NJmerge <- importMerge("NJ", "A")
NYmerge <- importMerge("NY", "A")

OPTION 2:

If you want to bind files containing the same columns. Example: Files contain both temperature and precipitation, and each file is a given geographic location. Note: All columns need to be the same name in each file

importBind <- function(x){
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  bind <- do.call("rbind", files)
  return(bind)
}

NJbind <- importBind("NJ")
NYbind <- importBind("NY")

OPTION 3

If you want to bind only certain columns from files containing the same column names Example: Files contain temperature and precipitation, along with other columns that aren't needed, and each file is a given geographic location. Note: All columns need to be the same name in each file. Since default is NULL, leaving keeps out will default to option 2 above.

importBindKeep <- function(x, keeps = NULL){ # default is to keep all columns
  temp <- list.files(pattern = paste0("*",x))
  files <- lapply(temp, read.csv)
  # if you wanted to only keep a few columns, use the following. 
  if(!is.null(keeps)) files <- lapply(files, "[", , keeps)
  bind <- do.call("rbind", files)
  return(bind)
}

NJbind.keeps <- importBindKeep("NJ", keeps = c("A","B")) # keep only columns A and B
NYbind.keeps <- importBindKeep("NY", keeps = c("A","B"))

See How to import multiple .csv files at once? and Simultaneously merge multiple data.frames in a list, for more information.

jpsmith
  • 11,023
  • 5
  • 15
  • 36
  • Sorry for not providing column names. There are few columns that I have to import and their names are "public_material", "private_material", "lat/lng", and "address". I can ignore the other columns. Thank you for your help. – Kaveh Ekbia Dec 30 '21 at 19:15
  • I believe I will merge by state since each group of cvs files that I import all refer to the same state (ex. nj) – Kaveh Ekbia Dec 30 '21 at 19:23
  • Hi Kaveh - do you truly want to *merge* the CSVs (i.e. merge a file with data on temperature with another file on data with precipitation by location) or do you want to bind/concatenate the files by columns to make one long list (i.e. bind files containing both temperature and precipitation from multiple locations)? This function was written for the former (per your question) but your comment makes me think you actually want the latter. – jpsmith Dec 30 '21 at 19:25
  • At any rate, I edited the answer to provide both options – jpsmith Dec 30 '21 at 19:32
  • Hi, sorry for confusing you. I do want to merge them by state. Each of the cvs files that I import represent a specific city in a small estate (i.e. newark, nj; atlantic city, nj, etc.) Since they are all the same state I want to merge (by state). All cvs files contain same information only on a different city in a same state. Sorry I hope this helps. – Kaveh Ekbia Dec 30 '21 at 19:41
  • Sounds like you want to bind (i.e. concatenate all the data for cities in NJ), not truly merge. Hopefully the above helps whether or not I am wrong about that :) – jpsmith Dec 30 '21 at 20:11
  • Thank you. So if I want to keep the columns "public_material", "private_material", "lat/lng" and "address", where in the code do I specify that? Thank you for all the help – Kaveh Ekbia Dec 30 '21 at 20:37
  • Also, you were correct. It is not merge, but bind :) – Kaveh Ekbia Dec 30 '21 at 20:38
  • Try option 3 - and make `keeps = c("public_material", "private_material", "lat/lon")` - it may take some debugging but hopefully that is a start! – jpsmith Dec 30 '21 at 20:38
  • Thank you so much. It was great help – Kaveh Ekbia Dec 30 '21 at 20:40
  • Hi @jpsmith, I used option 3. However, at the end, I received three errors. For the first error I believe the cause is that one of the csv files does not contain all the columns that needs to be imported. The list of columns that need to be imported are: "PRIVATE_MATERIAL", "LATITUDE", "LONGTITUDE", "PUBLIC_MATERIAL", "LATITUDE_2", "LONGITUDE_2". – Kaveh Ekbia Dec 31 '21 at 01:40
  • One of the cvs files only has "public_material", "latitude", "longitude" – Kaveh Ekbia Dec 31 '21 at 01:41
  • Also, for some cvs files the order of needed columns are: private_material, latitude, longitude, public_material, latitude, longitude; where is the cvs files the order is: public_material, latitude, longitude – Kaveh Ekbia Dec 31 '21 at 01:43
  • The errors that I receive are: – Kaveh Ekbia Dec 31 '21 at 01:44
  • Error in `[.data.frame`(X[[i]], ...) : undefined columns selected – Kaveh Ekbia Dec 31 '21 at 01:46
  • Called from: `[.data.frame`(X[[i]], ...) Error during wrapup: argument "expr" is missing, with no default – Kaveh Ekbia Dec 31 '21 at 01:47
  • Error during wrapup: target context is not on the stack – Kaveh Ekbia Dec 31 '21 at 01:47