I want to subset multiple data tables to records that correspond to unique IDs. Similar threads are here and here; these solutions work if I program each data table separately, but I would like to run the code on all of the data tables I have.
My data looks something like this:
dt1 <- data.table(var1 = c("A", "B", "C"), var2 = c("...", "...", "..."), id = c("101", "105", "105"))
dt2 <- data.table(var1 = c("D", "E", "F"), var2 = c("...", "...", "..."), var3 = c("...", "...", "..."), id = c("110", "110", "139"))
dt3 <- data.table(var1 = c("B", "G", "H"), id = c("105", "113", "113"))
> dt1
var1 var2 id
1: A ... 101
2: B ... 105
3: C ... 105
> dt2
var1 var2 var3 id
1: D ... ... 110
2: E ... ... 110
3: F ... ... 139
> dt3
var1 id
1: B 105
2: G 113
3: H 113
and I would like to remove all duplicate IDs so that each data table would eventually be this:
> dt1
var1 var2 id
1: A ... 101
> dt2
var1 var2 var3 id
3: F ... ... 139
> dt3
var1 id
1: B 105
I can successfully subset each data table separately by using dt1 <- dt1[, if(.N==1) .SD, by = id]
and so on, but instead I would like to update each table at once.
I've tried lapply(list(dt1, dt2, dt3), function(x) x[, if(.N==1) .SD, by = id])
but this does not update my tables in my environment. If I save the result in a list, for example using sub <- lapply(list(dt1, dt2, dt3), function(x) x[, if(.N==1) .SD, by = id])
I can save the data tables as dt1 <- as.data.table(sub[1])
etc., but then I am back to manually coding for each data table, which I am trying to avoid. I also tried lapply(names(sub), function(x) {assign(x, value=sub[[x]], envir=globalenv())})
which just returned list()
in my console and also didn't update anything as far as I could tell. Using list2env(sub, envir=.GlobalEnv)
as suggested here resulted in an error: "names(x) must be a character vector of the same length as x".
Any suggestions on how to remove the duplicate ID records within my original data tables simultaneously?