
An answer to a Stack Overflow question about rolling sums passed a list of values to the `i` and `by=` arguments of data.table.

This feature did not seem obvious to me when I looked through the package manual and vignettes.

To understand this feature, I wrote some simple code that passes different lists of indexes and groups to the `i` and `by` arguments and checks what `j` returns. Some of the results are unintuitive.

library(data.table)
DT1 <- data.table(x = c(1, 2, 3), y = c('a', 'b', 'b'))
DT2 <- data.table(x = c(1, 2, 3, 4), y = c('a', 'a', 'b', 'c'))
DT1[, idx := .I]
DT2[, idx := .I]
DT1[DT2$x, idx, by = DT2$y]

   DT2 idx
1:   a   1
2:   a   2
3:   b   3
4:   c   0

DT1[DT2$x, x, by = DT2$y]

   DT2             x
1:   a  1.000000e+00
2:   a  2.000000e+00
3:   b  3.000000e+00
4:   c 1.919019e-316

DT1[DT2$idx, x, by = DT2$y]

   DT2             x
1:   a  1.000000e+00
2:   a  2.000000e+00
3:   b  3.000000e+00
4:   c 1.919019e-316

Can someone explain this feature and why this simple example gives a bogus result?

I was expecting this code to return errors, since I am passing groups and indexes from DT2 that are not in DT1.

Jonatas Eduardo
  • Can you explain in words what you think your code does and what result you expect? – Gregor Thomas Feb 06 '17 at 18:47
  • I don't understand what you're trying to do here and so can't parse which example you think gives a bogus result..? `DT[some_integers]` accesses rows of DT. – Frank Feb 06 '17 at 18:58
  • When you say *"this simple example give a bogus result,"* it shows you have some expectation, and the result surprises you. What is your expectation? How is the result different from your expectation? What makes it seem "bogus" to you? – Gregor Thomas Feb 06 '17 at 18:59
  • And maybe you could experiment with an even smaller example? If you use a `DT2` with maybe 2 or 3 rows perhaps your expectation and result will be clearer? – Gregor Thomas Feb 06 '17 at 19:01
  • I was expecting this code to give errors, since I am passing indexes and groups from DT2 that are not in DT1. – Jonatas Eduardo Feb 06 '17 at 19:16
  • The syntax of `DT[i, j, by]` is: select rows with `i`, group by `by` then do `j`. It makes perfect sense that `i` and `by` should be the same length, right? – Frank Feb 07 '17 at 01:10
  • @Frank You are right, but notice that in this example I am trying to select the fourth row of DT1, which has only 3 rows. When I do this, `j` returns 0, but it should be `NA` or an error. – Jonatas Eduardo Feb 07 '17 at 12:49
  • Ok thanks for simplifying the example. That is odd. Even simpler: `DT1[4, .(idx, x), by=NA_character_]` or `DT1[4L, .(idx, x), by=NA_character_]` – Frank Feb 07 '17 at 13:45
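
To make the expectation in the comments concrete, here is a minimal sketch that splits the call into two steps: select rows with `i` first, then group. It assumes that subsetting a data.table with an out-of-range row number returns a row of `NA`s, which is the result the question expected instead of `0` or `1.919019e-316`. The names `rows`, `sub` and `grp` are hypothetical and not part of the original code, and this has not been checked against the data.table version used in the question.

```r
library(data.table)

# Same DT1 as in the question
DT1 <- data.table(x = c(1, 2, 3), y = c("a", "b", "b"))
DT1[, idx := .I]

# Step 1: select rows with i only. Row 4 does not exist in DT1,
# so it is expected to come back as a row of NA values.
rows <- c(1L, 2L, 3L, 4L)
sub <- DT1[rows]

# Step 2: attach the external grouping values (one per selected row)
# as a column, then group by that column.
sub[, grp := c("a", "a", "b", "c")]
res <- sub[, .(idx, x), by = grp]
print(res)
```

Doing the selection and the grouping separately avoids combining an out-of-range `i` with an external `by` vector in a single call, which is the combination that produced the odd values above.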

0 Answers