I would like to pass a variable (that holds a column name as a string) as an argument to data.table. How do I do it?
Consider the data.table below:
library(data.table)
myvariable <- "a"
myvariable_2 <- "b"
DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
DT
#    ID a  b  c
# 1:  b 1  7 13
# 2:  b 2  8 14
# 3:  b 3  9 15
# 4:  a 4 10 16
# 5:  a 5 11 17
# 6:  c 6 12 18
- I can use subset to extract columns, i.e. subset(DT, TRUE, myvariable), but this just outputs the column(s).
- How do I use subset to extract a column based on some criterion? E.g. extract the myvariable column where myvariable_2 < 10.
- How do I extract summary statistics over groups by passing column names as variables?
- How do I plot descriptive plots using data.table by passing column names as variables?
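To make the three requests above concrete, here is a minimal sketch of one way they could be approached. It assumes the usual data.table idioms of get() in i, the .. prefix in j, and .SDcols for grouping; the names filtered, by_group, and values are only illustrative.

```r
library(data.table)

myvariable   <- "a"   # column to extract / summarise
myvariable_2 <- "b"   # column used in the filter condition
DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)

# 1. Filter on one column and return another, both named by variables.
#    get() looks up the column named by the string; ..myvariable tells
#    data.table to take the column name from the calling scope.
filtered <- DT[get(myvariable_2) < 10, ..myvariable]

# 2. Summary statistic per group; .SDcols accepts a character vector.
by_group <- DT[, lapply(.SD, mean), by = ID, .SDcols = myvariable]

# 3. For plotting, pull the column out as a plain vector and pass it to
#    any plotting function, e.g. hist(values, main = myvariable).
values <- DT[[myvariable]]
```

Because every column reference goes through myvariable or myvariable_2, the same lines can be wrapped in a function that takes the column names as arguments.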
I know that this could be easier in data.frame, i.e. passing variables as column names. But I read everywhere that data.table is faster/more memory efficient, hence I would like to stick with it.
Does switching between data.table and data.frame have huge memory/performance implications?
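On the switching question, a small sketch may help frame it. It assumes data.table's setDF() and setDT(), which are documented as converting by reference, whereas as.data.frame() and as.data.table() create a new object. The variable names below are only illustrative.

```r
library(data.table)

DT <- data.table(ID = c("b", "a", "c"), a = 1:3)

# A data.table also inherits from data.frame, so many data.frame
# functions accept it without any conversion.
inherits_df <- is.data.frame(DT)

# setDF() turns the object into a plain data.frame in place.
setDF(DT)
class_after_setDF <- class(DT)

# setDT() turns it back into a data.table, again in place.
setDT(DT)
is_dt_again <- is.data.table(DT)
```

Whether the conversion cost matters in practice presumably depends on the size of the data and how often the switch happens, so it would need to be measured on the real workload.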
I do not want to explicitly code the column names as I want this piece of code to be re-usable.