complex multiple csv merging in r

Question

We have two types of dataframes DF_1 and DF_2 each read from csv files with file names of the types

1) DF_1 csv file name of randomNumbers-text1.csv for.

2) DF_2 csv file name of randomNumbers-text2.csv for.

They are perfectly merged into a single dataframe called merged_DF by

merged_DF = merge(DF_1, DF_2)

Now comes the tricky part.

The working directory is made up of around 13000 csv files where half is of the DF_1 csv file name type and the other of the DF_2 csv file type (see point 1 & 2 above).

Problem: How does one perform the above described perfectly merged operation on all approx. 13000 csv files and combine the output into a single combined dataframe call it combined_merged_DF?

How does one solve this the R way

Any help is much appreciated :)

One example using `lapply` and `rbindlist` from `data.table` is [here](http://stackoverflow.com/questions/43515338/error-importing-mutiple-csv-r/43515645#43515645) — Kristoffer Winther Balling, May 11 '17 at 07:56
Thanks Kristoffer. I do have a solution for combining multiple csv files into a single dataframe, but not a solution for outputting a single combined_merged_csv by looping trough a merge of two types of dataframes read from multiple csv files of which there are two different kinds distinguished by their csv file names. — Mr_Holm, May 11 '17 at 08:09
another example: http://stackoverflow.com/a/32888918/2204410 — Jaap, May 11 '17 at 08:26
See Aurèle answer which almost solved the problem. It only needs a minor change. — Mr_Holm, May 11 '17 at 08:46

score 0 · Accepted Answer · answered May 11 '17 at 08:23

0

Let's assume the files in the directory are 124-type1.csv, 723-type1.csv, 899-type1.csv, 124-type2.csv, 723-type2.csv, 100-type2.csv, wrong-file.csv. You could do:

csv_files <- list.files("./path/to/csvs", ".csv$")
# [1] "124-type1.csv"  "723-type1.csv"  "899-type1.csv"  "124-type2.csv" 
# [5] "723-type2.csv"  "100-type2.csv"  "wrong-file.csv"

ids_in_common <- intersect(
  sub("-type1\\.csv$", "", grep("type1", csv_files, value = TRUE)),
  sub("-type2\\.csv$", "", grep("type2", csv_files, value = TRUE))
)
# [1] "124" "723"
do.call("rbind", lapply(ids_in_common, function(id) {
  merge(
    read.table(file.path("./path/to/csvs", paste0(id, "-type1.csv"))),
    read.table(file.path("./path/to/csvs", paste0(id, "-type2.csv")))
  )
}))

answered May 11 '17 at 08:23

Aurèle

12,545
1
31
49

Indeed a clever protocol Aurèle, but it needs the final touch because it shows the message Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 5 elements csv type2 needs a sep=";" to be read correctly. How does one implement this in your clever protocol? Thanks for your great solution! :) – Mr_Holm May 11 '17 at 08:37
Simply add `, sep = ";"` in `read.table(file.path("./path/to/csvs", paste0(id, "-type2.csv")), sep = ";")` maybe? Or use `read.csv2` instead? See `help(read.table)` – Aurèle May 11 '17 at 08:58
read.csv2 did the job. Thanks for your great answers Aurèle - It is highly appreciated! :) – Mr_Holm May 11 '17 at 19:36

complex multiple csv merging in r

1 Answers1