by = 1:NROW(DT) works row-wise, but by = .I doesn't in R data.table

Question

I want to understand how data.table in R handles row-wise calculations in j. Based on this question and this post, I expected the two calls below to return the same result, though I am not experienced at reading change logs.

library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Example row-wise calculation: one group per row
DT[, mean(v), by = 1:NROW(DT)]
   NROW V1
1:    1  1
2:    2  2
3:    3  3
4:    4  4
5:    5  5
6:    6  6
7:    7  7
8:    8  8
9:    9  9

DT[, mean(v), by = .I]
   V1
1:  5

`.I` was not originally supported in `by`. Support has since been added, but currently only in the dev version. See [HERE](https://github.com/Rdatatable/data.table/issues/1732) or item _39._ in the [NEWS](https://rdatatable.gitlab.io/data.table/news/index.html). So, if you want to use `.I` in `by`, you need to upgrade to the dev version (`data.table::update.dev.pkg()`). - B. Christian Kamgang, Aug 11 '22 at 04:54
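To make the point above concrete, here is a minimal sketch that probes whether the installed data.table treats `by = .I` as row-wise and otherwise falls back to an explicit row index. The names `by_I_is_rowwise` and `rowwise` are made up for illustration, and the probe only relies on the behaviour shown in the question (a single group on versions without support).

```r
library(data.table)

DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Probe (hypothetical helper variable): row-wise grouping should return
# one result row per input row; without support, by = .I gives one group.
by_I_is_rowwise <- nrow(DT[, mean(v), by = .I]) == nrow(DT)

# Use by = .I where it is row-wise, otherwise group by an explicit row index.
rowwise <- if (by_I_is_rowwise) {
  DT[, mean(v), by = .I]
} else {
  DT[, mean(v), by = seq_len(nrow(DT))]
}

rowwise$V1
```

Either branch should give one mean per row. Only the name of the grouping column differs between the two.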
Side note: `1:NROW(DT)`, or the more canonical `1:nrow(DT)`, fails in one specific situation: when there are zero rows. If you expect to automate your code ("unsupervised" execution), I suggest coding more defensively and using `by = seq_len(nrow(DT))` instead. With one or more rows it behaves exactly like `1:NROW(DT)`, but `seq_len` does _not_ produce a vector of length 2 when its input is 0. (Contrast `1:0` with `seq_len(0)`.) - r2evans, Aug 11 '22 at 11:55
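The zero-row contrast mentioned above can be checked directly. This is a small sketch; the table `empty` and the two index variables are hypothetical names used only for the demonstration.

```r
library(data.table)

# A zero-row table (hypothetical example data)
empty <- data.table(v = integer())

# 1:0 counts down, so the "row index" has two elements for zero rows
idx_colon <- 1:nrow(empty)
length(idx_colon)   # 2

# seq_len(0) is an empty integer vector, matching the zero rows
idx_seq <- seq_len(nrow(empty))
length(idx_seq)     # 0
```

A grouping vector is expected to line up with the rows of the table, so only the `seq_len` form has a consistent length when the table is empty.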
