I am using a for loop to merge multiple files with another file:
files <- list.files("path", pattern=".TXT", ignore.case=T)
for(i in 1:length(files))
{
data <- fread(files[i], header=T)
# Merge
mydata <- merge(mydata, data, by="ID", all.x=TRUE)
rm(data)
}
"mydata" looks as follows (simplified):
ID x1 x2
1 2 8
2 5 5
3 4 4
4 6 5
5 5 8
"data" looks as follows (around 600 files, in total 100GB). Example of 2 (seperate) files. Integrating all in 1 would be impossible (too large):
ID x3
1 8
2 4
ID x3
3 4
4 5
5 1
When I run my code I get the following dataset:
ID x1 x2 x3.x x3.y
1 2 8 8 NA
2 5 5 4 NA
3 4 4 NA 4
4 6 5 NA 5
5 5 8 NA 1
What I would like to get is:
ID x1 x2 x3
1 2 8 8
2 5 5 4
3 4 4 4
4 6 5 5
5 5 8 1
ID's are unique (never duplicates over the 600 files).
Any idea on how to achieve this as efficiently as possible much appreciated.