I am trying to understand how data.table in R handles row-wise calculations in j. Based on this question and this post, I expected the two calls below to return the same result, though I am not experienced at reading change logs.

library(data.table)

DT <- data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)

# Example row-wise calculation: mean of v, grouped by row number
DT[, mean(v), by = 1:NROW(DT)]
   NROW V1
1:    1  1
2:    2  2
3:    3  3
4:    4  4
5:    5  5
6:    6  6
7:    7  7
8:    8  8
9:    9  9

DT[, mean(v), by = .I]
   V1
1:  5
diomedesdata
    Using `.I` in `by` was not originally supported. It is now, but at present only in the dev version. See [HERE](https://github.com/Rdatatable/data.table/issues/1732) or item _39._ in the [NEWS](https://rdatatable.gitlab.io/data.table/news/index.html). So, to use `.I` in `by`, you need to upgrade to the dev version (`data.table::update.dev.pkg()`). – B. Christian Kamgang Aug 11 '22 at 04:54
    Side note: `1:NROW(DT)`, or even the more canonical `1:nrow(DT)`, fails in one specific situation: when there are zero rows. If you expect to automate your code ("unsupervised" execution), I suggest coding more defensively and using `by=seq_len(nrow(DT))` instead. With one or more rows it behaves exactly like `1:NROW(DT)`, but `seq_len` is smart enough _not_ to produce a length-2 vector when its input is 0. (Contrast `1:0` with `seq_len(0)`.) – r2evans Aug 11 '22 at 11:55
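
The contrast in the side note above can be checked directly. This is a minimal sketch that reuses the `DT` from the question; the result names `res`, `row` and `m` are made up for illustration and do not come from the original post.

```r
library(data.table)

# Same table as in the question
DT <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# 1:0 counts down from 1 to 0, so it has two elements;
# seq_len(0) is an empty integer vector
length(1:0)         # 2
length(seq_len(0))  # 0

# Row-wise mean without relying on `by = .I`:
# one group per row, built with seq_len() rather than 1:nrow(DT)
res <- DT[, .(m = mean(v)), by = .(row = seq_len(nrow(DT)))]
res
```

Because each group holds exactly one row, `m` simply repeats `v`, which matches the `by = 1:NROW(DT)` output shown in the question.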
