
I would like to pass a variable (that holds the column name as a string) as argument to data.table. How do I do it?

Consider a data.table below:

library(data.table)

myvariable <- "a"
myvariable_2 <- "b"

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
DT
#    ID a  b  c
# 1:  b 1  7 13
# 2:  b 2  8 14
# 3:  b 3  9 15
# 4:  a 4 10 16
# 5:  a 5 11 17
# 6:  c 6 12 18
  1. I can use subset to extract columns, i.e. subset(DT, TRUE, myvariable), but this only returns the selected column(s).
  2. How do I use subset to extract a column based on some criterion? E.g. extract the myvariable column where myvariable_2 < 10.
  3. How do I compute summary statistics over groups by passing column names as variables?
  4. How do I draw descriptive plots from a data.table by passing column names as variables?

I know that passing variables as column names could be easier with a data.frame. But I have read in many places that data.table is faster and more memory efficient, so I would like to stick with it.

Does switching between data.table and data.frame have huge memory/performance implications?

I do not want to explicitly code the column names as I want this piece of code to be re-usable.

Uwe
user2979010
  • I suggest you take a read through https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-faq.html as a starting point. – thelatemail Nov 03 '16 at 05:37
  • In the devel version you can do that as you would in a data.frame. When the stable version (1.9.8) will be released you won't need `with = FALSE` anymore – David Arenburg Nov 03 '16 at 10:19

1 Answer


The comment from @thelatemail is a very good start. Do read that first! Another quick way is below:

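library(data.table)
df <- data.table(a = 1:10, b = letters[1:2], c = 11:20)

# variables holding the column names as strings
var1 <- "a"
var2 <- "b"

# with = FALSE: treat j as a character vector of column names
dt1 <- df[, c(var1, var2), with = FALSE]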

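Think of "with = FALSE" as making the "j" argument of a data.table behave like "j" in a data.frame: it is read as a character vector of column names (or a numeric vector of column positions) instead of being evaluated as an expression inside the table. This applies to "j" only, not to the data.table as a whole.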
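To make the difference concrete, here is a minimal sketch using the question's table. The variable name cols is hypothetical, and the `..` prefix is assumed to be available (it exists only in more recent data.table releases than the one discussed here), so treat that line as optional:

```r
library(data.table)

DT <- data.table(ID = c("b", "b", "b", "a", "a", "c"),
                 a = 1:6, b = 7:12, c = 13:18)
cols <- c("a", "b")  # hypothetical variable holding column names

# with = FALSE: j is taken as a character vector of column names
sel_with <- DT[, cols, with = FALSE]

# .SDcols: select the same columns through .SD
sel_sd <- DT[, .SD, .SDcols = cols]

# `..` prefix: look cols up in the calling scope (assumes a recent data.table)
sel_dots <- DT[, ..cols]

# default (with = TRUE): j is evaluated inside DT, so the bare name `a`
# refers to the column and a plain vector is returned
vec_a <- DT[, a]
```

All three selections should give a two-column data.table, while the last call returns an integer vector rather than a table.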

Edit 1: to subset on a condition within a data.table

df[get(var1) > 5, c(var1, var2), with = FALSE]
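Applied to the table from the question, the same pattern might look like the sketch below. It covers point 2 (a condition on one column, output of another) and, as one possible approach to point 3, a grouped summary. The names group_var and value_vars are hypothetical and not part of the original post:

```r
library(data.table)

DT <- data.table(ID = c("b", "b", "b", "a", "a", "c"),
                 a = 1:6, b = 7:12, c = 13:18)
myvariable   <- "a"
myvariable_2 <- "b"

# Point 2: rows where the column named by myvariable_2 is below 10,
# returning only the column named by myvariable
res <- DT[get(myvariable_2) < 10, myvariable, with = FALSE]

# Point 3 (one option): mean of selected columns per group,
# with grouping and value columns both passed as strings
group_var  <- "ID"          # hypothetical
value_vars <- c("a", "c")   # hypothetical
means <- DT[, lapply(.SD, mean), by = group_var, .SDcols = value_vars]
```

Here get() is needed in i because i is evaluated inside the table, while by and .SDcols accept character vectors of column names directly.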
joel.wilson
  • This "Think of "with=F" as making data.table behave like data.frame" will lead to wrong intuition. – djhurio Nov 03 '16 at 06:51
  • @djhurio could you elaborate more? I'm also learning here and would like to know of the purpose/role of "with=". Thanks! – joel.wilson Nov 03 '16 at 07:54
  • From the `data.table` help: "By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table." – djhurio Nov 03 '16 at 10:20
  • So, when setting `with = F`, the `data.table` argument `j` acts like `j` in `data.frame`. So this is only about the `j` argument, not about the `data.table` as a whole. I hope it helps. – djhurio Nov 03 '16 at 10:26
  • @user3801801 How can I include conditions within that? e.g. var1 > 10? – user2979010 Nov 03 '16 at 10:55
  • @user2979010 var1 is a character! You cannot do a logical operation on that! Do you want to subset the data.table? – joel.wilson Nov 03 '16 at 10:56
  • @user3801801. Perhaps surprisingly, you can use logical comparisons with characters and numerics. Try `"A" > 1` for example. If v1 is a factor, this will produce a vector of NAs with a warning, but will not return an error. – lmo Nov 03 '16 at 11:31
  • @user2979010 I have re-edited the answer. Is that what you needed? – joel.wilson Nov 03 '16 at 11:38
  • @joel.wilson thanks a ton for that. Why does `data.table` need a `get` whereas `data.frame` does not? Does the ordering of the input arguments matter? – user2979010 Nov 04 '16 at 04:55
  • For efficiency and speed, data.table evaluates the expressions inside the scope of the data.table, and so needs get()/eval() to extract values that are not present within it. All this is explained in the link shared by @thelatemail. Please go through that. It would benefit you – joel.wilson Nov 04 '16 at 05:01
  • Thanks @joel.wilson. I am quite new to R and it is quite confusing to work this way in R. For example, to assign new values to a column: `dt[, var1 := 1, with=FALSE]`, however to get the column: `dt[, c(var1), with=FALSE]`. I have a lot of reading to do. Thanks a ton for your help – user2979010 Nov 04 '16 at 05:03
  • I do agree, and so do many who have been using data.frames! Keep practising and it will stick with you. Even I find it difficult not to get confused between the two! – joel.wilson Nov 04 '16 at 05:06
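
The scoping point in the comments above can be illustrated with a small sketch. It assumes a reasonably recent data.table, where wrapping the left-hand side of `:=` in parentheses is the usual way to assign to a column named by a variable (the `with = FALSE` form combined with `:=` is, as far as I know, deprecated in later releases). The table dt and its values are made up for illustration:

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)  # illustrative data
var1 <- "a"

# i is evaluated inside dt, so get(var1) fetches column `a`;
# writing var1 > 1 would compare the string "a" with 1 instead
kept <- dt[get(var1) > 1]

# assign by reference to the column named by var1:
# parentheses make := evaluate var1 rather than create a column called "var1"
dt[, (var1) := 1L]

# select the column named by var1
sel <- dt[, var1, with = FALSE]
```

After these calls dt still has only the columns a and b, with every value of a set to 1.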