I have a simple data.table and I would like to add one column to each of the others. The following code works:

x <- data.table(a=1:5,b=5:1,c=rep(999,5))
for(k in c("a","b")){x[,k] <- x[,..k]+x[,.(c)]}

Now here is the question: why do I have to use .. on the right-hand side of the assignment? Also, if I try to use .. on the left-hand side as well, i.e.

for(k in c("a","b")){x[,..k] <- x[,..k]+x[,.(c)]}

this fails with the error "[...]object '..k' not found". It seems strange that I have to change the syntax within the same statement.

With a data.frame, the equivalent formulation is very clear:

for(k in c("a","b")){x[,k] <- x[,k]+x[,c]} # error with DT
x <- data.frame(a=1:5,b=5:1,c=rep(999,5))
for(k in c("a","b")){x[,k] <- x[,k]+x[,"c"]} # works with dataframe

So I am wondering (1) whether the code above is the correct way to do this in data.table (please explain the .. operator; the data.table FAQ 1.1 doesn't address this in particular), and (2) whether there is a cleaner way to write it. Thanks for any hints.

User878239
  • In `data.table`, `x[,k]` looks for a column named `k`, regardless of whether a variable `k` exists in the calling scope. If you define `k<-"a"` and by `x[,k]` you really mean `x[,"a"]` (à la `data.frame`), you need `x[,..k]` in `data.table`. `x[,k,with=FALSE]` also works in `data.table`. – nicola Apr 19 '19 at 14:57
  • thanks for the explanation @nicola. However, this seems only true for the right side of the assignment in the loop, but not for the left side. – User878239 Apr 19 '19 at 15:56
  • In `data.table` you don't modify columns with `<-` and so the assignment is a `data.frame` method. You use `:=` (see `?set`). For instance here, you should use something like `for(k in c("a","b")) x[,(k):=get(k)+c]`. – nicola Apr 21 '19 at 04:40

1 Answer

From the official introduction (slightly edited for your example):

For those familiar with the Unix terminal, the .. prefix should be reminiscent of the “up-one-level” command, which is analogous to what’s happening here – the .. signals to data.table to look for the k variable “up-one-level”, i.e., in the loop environment in this case.

So this operator steps out of the data.table's own scope, looks up the variable k one level higher, takes its value, and uses that value to select the column. I am not sure why it was designed this way; perhaps variables from the calling scope are not passed in automatically.
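
To make the lookup concrete, here is a minimal sketch. The table x is the one from the question; the second table y, with a column that happens to be named k, is made up for illustration only.

```r
library(data.table)

x <- data.table(a = 1:5, b = 5:1, c = rep(999, 5))
k <- "a"

# `..k` takes the value of k from the calling scope, so this selects the
# column "a" and returns a one-column data.table.
sel <- x[, ..k]
names(sel)

# Hypothetical table with a column that is itself named k.
y <- data.table(k = c(10, 20), a = c(1, 2))

# Without the prefix, the bare name k refers to the column k.
y[, k]

# With the prefix, the outer variable k (value "a") decides which column is selected.
y[, ..k]
```

The second table shows why the prefix exists: inside the brackets a bare name is matched against the column names first, so an explicit marker is needed to say "use the outer variable instead".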

You can also use the with argument:

x[,k,with=FALSE]
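
As a quick check that the two spellings select the same thing, here is a small sketch using the table from the question. The variable names via_dots, via_with and cols are only illustrative.

```r
library(data.table)

x <- data.table(a = 1:5, b = 5:1, c = rep(999, 5))
k <- "a"

via_dots <- x[, ..k]              # prefix form
via_with <- x[, k, with = FALSE]  # with = FALSE form

# Both select the column named by the value of k.
isTRUE(all.equal(via_dots, via_with))

# The same holds when the variable contains several column names.
cols <- c("a", "b")
isTRUE(all.equal(x[, ..cols], x[, cols, with = FALSE]))
```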


Edit:

I just checked the source code of data.table. The variable is fetched from parent.frame(), i.e. the environment from which the function is called. This is triggered by .. or by the with argument. If you use neither, k is treated as a column name rather than as a variable from that environment.
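
The idea of reading a variable from parent.frame() can be shown in plain R. The sketch below is not data.table's actual code: lookup_up_one_level() and demo() are hypothetical helpers that only mimic the "strip the prefix, then look in the caller's environment" step.

```r
# Hypothetical helper, not part of data.table: strips a leading ".." and
# reads the remaining name from the environment of the caller.
lookup_up_one_level <- function(prefixed_name) {
  plain_name <- sub("^\\.\\.", "", prefixed_name)
  get(plain_name, envir = parent.frame())
}

# The caller defines k; the helper finds it one level up.
demo <- function() {
  k <- "b"
  lookup_up_one_level("..k")
}

demo()
```

Here demo() plays the role of the loop body in the question, and the helper plays the role of the bracket call that needs to see the loop variable.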

A question about parent.frame() is found here

mischva11
  • thanks that makes it clear how it works. Only one thing: Do you know why the following: `for(k in c("a","b")){x[,..k] <- x[,..k]+x[,.(c)]}` does not work for the datatable? That is, using `..` also for the left side of the assignment. This was part of the question above. – User878239 Apr 19 '19 at 15:54
  • @Talik3233 no, sorry. The comments in the source code hint that the developers know about this. The following could also be completely wrong: I think the difference lies in writing the variable. When you assign your variable to your data.table, it has to be saved somehow. I think something happens internally with the temporary space, but I can't tell you exactly what. This is just something I came up with from the clues and the error message. In conclusion: it's different depending on whether you want to read or write your data.table – mischva11 Apr 19 '19 at 17:00
  • thanks, so the question was justified. Good to know after all :) – User878239 Apr 19 '19 at 17:02
  • It doesn't work for the left side because that is a `data.frame` method and in `data.table` you change columns by reference with `:=`. – nicola Apr 21 '19 at 04:41
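
For the second part of the question (a cleaner way to write the loop), the `:=` approach from the comments can be sketched as follows. The first variant is the one nicola suggests; the `set()` and `.SDcols` variants are alternatives added here for comparison, and the names x1, x2, x3 and cols are only illustrative.

```r
library(data.table)

x <- data.table(a = 1:5, b = 5:1, c = rep(999, 5))
cols <- c("a", "b")

# Variant 1: loop with := as in the comment. (k) on the left is evaluated to
# a column name; get(k) on the right fetches that column's values.
x1 <- copy(x)
for (k in cols) x1[, (k) := get(k) + c]

# Variant 2: set(), which takes the column name as an ordinary argument.
x2 <- copy(x)
for (k in cols) set(x2, j = k, value = x2[[k]] + x2[["c"]])

# Variant 3: no explicit loop; .SD holds only the columns listed in .SDcols.
x3 <- copy(x)
x3[, (cols) := lapply(.SD, `+`, c), .SDcols = cols]

# All three variants should give the same table.
isTRUE(all.equal(x1, x2)) && isTRUE(all.equal(x1, x3))
```

copy() is used so that each variant starts from an unmodified table, since `:=` and `set()` change their table by reference.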