I have a data.table with two columns whose names are built from variables. I'm fairly new to the quirks of the data.table package, but I've gotten something like the following code to work so far:
varNames <- c("Subtype", ...)
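for (i in seq_along(varNames)) {  # seq_along() visits every name; length() alone would give only the last index
  nm1 <- paste0(varNames[i], "1")  # e.g. "Subtype1"
  nm2 <- paste0(varNames[i], "2")  # e.g. "Subtype2"
  DT[, (nm1) := x1]  # (nm1) on the left of := means "the column whose name is stored in nm1"
  DT[, (nm2) := x2]
  # A BUNCH OF OTHER CODE GOES HERE...
}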
I want to single out the rows where the column named by nm1 and the column named by nm2 match, but I know I can't just do this:
nmMatch <- paste0(varNames[i], "Match")
DT[, (nmMatch) := FALSE]
DT[(nm1) == (nm2), (nmMatch) := TRUE]  # Returns empty data table :^(
I think this is either because there are no columns actually named "nm1" or "nm2", or because the string stored in nm1 does not equal the string stored in nm2.
If I didn't need to build these names from a vector of character values, I would write this to get what I'm looking for:
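To check the second guess, here is a minimal sketch on made-up toy data (the table contents and the name `toy` are assumptions for illustration, not my real data). It suggests that inside the brackets `(nm1) == (nm2)` compares the two name strings themselves rather than the columns they refer to:

```r
library(data.table)

# Hypothetical toy table standing in for the real DT
toy <- data.table(Subtype1 = c("a", "b", "c"),
                  Subtype2 = c("a", "x", "c"))

nm1 <- "Subtype1"
nm2 <- "Subtype2"

# Outside of any data.table, this is a plain string comparison
stringCompare <- (nm1) == (nm2)   # FALSE, since "Subtype1" != "Subtype2"

# Inside i, nm1 and nm2 are not column names, so they are looked up
# in the calling scope and the same string comparison happens
emptyResult <- toy[(nm1) == (nm2)]
nrow(emptyResult)                 # 0 rows selected
```

So the filter reduces to a single `FALSE`, which would explain why no rows come back regardless of what the columns contain.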
DT[, "SubtypeMatch" := FALSE]
DT[(Subtype1) == (Subtype2), SubtypeMatch := TRUE]
How do I get a subset of rows based on column values if I need to reference those column names through variables? Is there a way to do that for data.tables? These end up being huge structures (more than 1,000,000 rows), so any workarounds using sapply() end up being prohibitively slow.
I recognize that there may be ways to fundamentally restructure my code so that I never need to do this, and I'm happy to hear those, but I'm also interested in any "proper" way to accomplish this subsetting task with data.table.
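For concreteness, here is a minimal sketch of the kind of thing I'm after, using base R's `get()` to look up each column by the name stored in a variable. The toy table `toy` and its values are assumptions for illustration only, and I'm not claiming this is the recommended data.table idiom, just one candidate that appears to work on a small example:

```r
library(data.table)

# Hypothetical toy table standing in for the real DT
toy <- data.table(Subtype1 = c("a", "b", "c"),
                  Subtype2 = c("a", "x", "c"))

varName <- "Subtype"                  # one element of varNames
nm1     <- paste0(varName, "1")       # "Subtype1"
nm2     <- paste0(varName, "2")       # "Subtype2"
nmMatch <- paste0(varName, "Match")   # "SubtypeMatch"

# Start with every row flagged as a non-match
toy[, (nmMatch) := FALSE]

# get() resolves each string to the column of that name, so the
# comparison in i is vectorised over rows instead of comparing names
toy[get(nm1) == get(nm2), (nmMatch) := TRUE]

toy[[nmMatch]]
```

Because the comparison stays vectorised inside `[.data.table`, it avoids the row-by-row cost of an sapply() workaround, though I haven't benchmarked it at the million-row scale.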