
I am new to programming. I would like to try to join two tables/files from different directories in R. The program should loop through the folders from both directories in parallel to read in one file after the other. Then, the two current .csv-files, originating from the looping folders, should be joined. Unfortunately, I have too many files to copy the columns from one file to another by hand.

The source directories and files look like the following:

filepath1: D:/Test1/

filenames: A1, A2, A3, A4,...

filepath2: D:/Test2/

filenames: B1, B2, B3, B4,...

Thereby, A1 and B1 should be joined based on one common column.

Then, A2 and B2 should be joined. Then A3 and B3 etc.

Basically, when I use the cbind, merge or join function on two specific files I selected manually, it works nicely. I used the following code:

library(readr)
library(dplyr)
A1 <- read.csv("D:/Test1/A1.csv")
B1 <- read.csv("D:/Test2/B1.csv")
mydata = inner_join(A1, B1, by="micrometer")

When I try to loop over the folder Test1 and then over the folder Test2, I get a list of data.frames. Then joining them results in an error saying that the function "'inner_join' cannot be applied to an object of class "list"".

library(rio)
require(data.table)

setwd("D:/Test1/")

file <- dir(pattern ="*.csv") 
    for (k in 1:length(listcsv)) {
      ldf[[k]] <- read.csv(listcsv[k])
    }
data.files = list.files(pattern = "*.csv") 
mydata1 <- lapply(file, read.csv)

setwd("D:/Test2/")

file2 <- dir(pattern ="*.csv") 
    for (j in 1:length(listcsv)) {
      ldf[[j]] <- read.csv(listcsv[j])
    }
data.files2 = list.files(pattern = "*.csv") 
mydata2 <- lapply(file2, read.csv) 

myfulldata = inner_join(mydata1, mydata2, by="micrometer")

Could you please help me to find the mistake?

Donkey19

1 Answer


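The two lapply calls already give you what you need: mydata1 and mydata2 are lists of data.frames. (The for loops refer to listcsv and ldf, which are not defined in the code shown, and they repeat what lapply does, so they can be dropped.) The problem is this line: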

myfulldata = inner_join(mydata1, mydata2, by="micrometer")

Here inner_join is called on the two lists themselves, but it expects two data.frames. You're almost there: join the lists element by element, for example with a new for loop:

# Preallocate an empty list with one slot per file pair
myfulldata = vector("list", length(mydata1))
for (i in seq_along(mydata1)) {
  myfulldata[[i]] = inner_join(mydata1[[i]], mydata2[[i]], by = "micrometer")
}

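That should work. You can also use mapply. Pass SIMPLIFY = FALSE so the result stays a list of data.frames; with the default, mapply may collapse the results into a matrix:

myfulldata = mapply(inner_join, mydata1, mydata2, MoreArgs = list(by = "micrometer"), SIMPLIFY = FALSE)

This is more compact and more idiomatic R.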
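For reference, here is a minimal end-to-end sketch of the same idea that runs without the original files. It makes several assumptions that are not from the question: the two folders are temporary stand-ins for D:/Test1/ and D:/Test2/, the data are made up, file_id is a hypothetical helper, and base merge() is used in place of inner_join so that no packages are needed. It also pairs files by the number in their names rather than by position, since alphabetical sorting would put A10 before A2.

```r
# Hypothetical stand-ins for D:/Test1/ and D:/Test2/ so the sketch runs anywhere.
dir1 <- file.path(tempdir(), "Test1")
dir2 <- file.path(tempdir(), "Test2")
dir.create(dir1, showWarnings = FALSE)
dir.create(dir2, showWarnings = FALSE)

# Write three small made-up file pairs: A1..A3 and B1..B3.
for (i in 1:3) {
  write.csv(data.frame(micrometer = 1:4, a = i * (1:4)),
            file.path(dir1, paste0("A", i, ".csv")), row.names = FALSE)
  write.csv(data.frame(micrometer = 2:5, b = i + (2:5)),
            file.path(dir2, paste0("B", i, ".csv")), row.names = FALSE)
}

# full.names = TRUE returns complete paths, so no setwd() is needed.
# "\\.csv$" is a regular expression matching names that end in ".csv".
files1 <- list.files(dir1, pattern = "\\.csv$", full.names = TRUE)
files2 <- list.files(dir2, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: extract the number from a file name ("A12.csv" -> 12).
file_id <- function(paths) {
  as.integer(gsub("[^0-9]", "", tools::file_path_sans_ext(basename(paths))))
}

# Name each path by its number, then keep only numbers present in both folders.
names(files1) <- file_id(files1)
names(files2) <- file_id(files2)
ids <- intersect(names(files1), names(files2))

# Read each pair and join on the shared column.
# merge() with its default all = FALSE keeps only matching keys (an inner join).
joined <- Map(
  function(f1, f2) merge(read.csv(f1), read.csv(f2), by = "micrometer"),
  files1[ids], files2[ids]
)
```

joined is a list with one joined data.frame per file pair, named by file number, so joined[["2"]] holds A2 joined with B2. With dplyr loaded, inner_join(read.csv(f1), read.csv(f2), by = "micrometer") could replace the merge() call in the function body.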

MentatOfDune