I am a fairly novice R user, and I am wondering how to automate finding matching rows (i.e., an inner-join merge) between one "master" data.table and multiple large data.tables (~8 GB files each), and how to collect those matches into one final table.
The "master" data.table has two columns and looks like this:
pid pkid
123 8975A
456 9683
The multiple, larger data.tables, meanwhile, look like this:
pid pkid pha
123 8975A 3
456 9683 2
789 1253 3
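For reference, reproducible versions of the two example tables can be built like this (dt1 is just a stand-in name for one of the larger tables):

library(data.table)

master <- data.table(pid = c(123, 456),
                     pkid = c("8975A", "9683"))

dt1 <- data.table(pid = c(123, 456, 789),
                  pkid = c("8975A", "9683", "1253"),
                  pha = c(3, 2, 3))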
I have my code set up to read all of the pipe-delimited files from one directory, set the key for the data.table merge, and make object names from the file names:
library(data.table)

temp <- list.files(pattern = "\\.txt$")
list2env(
  lapply(setNames(temp, make.names(gsub("\\.txt$", "", temp))),
         # setkey() returns the table invisibly, so each file is read and keyed in one step
         function(f) setkey(fread(f, sep = "|"), pid, pkid)),
  envir = .GlobalEnv)
Where I'm stuck is how to automate the numerous (40-something?) merges against the one "master" data.table and how to dump all of the matches from those merges into a single, separate data.table. Basically, I want every row in the larger data.tables whose pid and pkid match the "master" to contribute its corresponding pha.
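This is roughly what I imagine the automated version looking like, though I don't know if it's right (just a sketch, assuming the tables were loaded by the list2env() call above and that temp holds only the large files, not the master):

tbl_names <- make.names(gsub("\\.txt$", "", temp))
final <- rbindlist(
  lapply(mget(tbl_names, envir = .GlobalEnv),
         function(dt) merge(master, dt)),  # inner join on the keyed columns
  use.names = TRUE)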
The only way that I've come up with is tragically manual; it will take a long time to write out and is undoubtedly inefficient.
res1 <- merge(master, dt1)  # merge() dispatches to data.table's method; inner join on the keys
res2 <- merge(master, dt2)
final <- rbindlist(list(res1, res2))
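Since the files are ~8 GB each, I could also imagine reading one file at a time and keeping only the matching rows, so that at most one large table sits in memory at once (again just a sketch, reusing the temp vector from above):

final <- rbindlist(lapply(temp, function(f) {
  dt <- fread(f, sep = "|")
  merge(master, dt, by = c("pid", "pkid"))  # inner join; non-matching rows are dropped
}))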
Any thoughts? Thanks!