0

I have bunch of csv's that that all share the same ID column and an additional column. I need to combine them all together into one csv in R based on the ID column. An example of my csv's below, but I have around 5,000 csv in need to combine into a single dataset:

Table 1

ID Red
1 5
2 7

Table 2

ID Blue
1 6
2 2

Table 3

ID Green
1 9
2 8

Output table

ID Red Blue Green
1 5 6 9
2 7 2 8

I know I need to list all the files but haven't been able to find what function to use. I can combine two CSVs using the merge function, would I need to loop it or is there a single function that can do many?

setwd("C:/Desktop/CSVs")
all_files <- list.files(pattern = ".csv")
SpookyDLX
  • 79
  • 5

1 Answers1

0

Probably Reduce:

‘Reduce’ uses a binary function to successively combine the elements of a given vector and a possibly given initial value.

L <- list(
   data.frame(ID = 1:2, Red = c(5,7)),
   data.frame(ID = 1:2, Blue = c(6,2)),
   data.frame(ID = 1:2, Green = c(9,8)))
Reduce(\(x,y) merge(x, y, by = "ID"), L)

You can get a list of the file contents (L) via lapply(all.files, read.csv).

Just plain Reduce(merge, L) works just as well but it's safer to write an anonymous function as in the example, to make sure you're merging on the specified column/key. (Unfortunately you can't say Reduce(merge, L, by = "ID"), i.e. Reduce() doesn't have a ... argument that could pass additional argument through to merge().)

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453