
I tried two ways to add a column to a data.table, and they returned different results. I don't understand why; could you give me a hint?

Way 1:

avg_tvd <- dev_survey4[Grp==0 | Grp==1, .(avgTVD = mean(TVDmASL, na.rm=TRUE)),
                       by = .(Grp,WELL,APA_Pair_ID)]

Here are the results:

[Screenshot of the results, not reproduced]

Way 2:

avg_tvd <- dev_survey4[Grp==0 | Grp==1, avgTVD := mean(TVDmASL, na.rm=TRUE),
                       by = .(Grp,WELL,APA_Pair_ID)]

Here are the results:

[Screenshot of the results, not reproduced]

The results of way 1 are what I want. But why does way 2 give different results? There are two differences between them:

  1. Way 2 returns more columns than way 1;
  2. The rows of way 2 include Grp values other than 0 and 1.
moodymudskipper
shy zhan
  • Please provide a reproducible example – David Leal Jun 16 '18 at 01:15
  • With way 1, you are not adding a column to a table, but rather (1) filtering it and (2) aggregating it, making a new table. With way 2, you are adding a column to the filtered part of the table (with the part not meeting the filter getting filled with NA). You should run through a tutorial; data.table comes with vignettes for this. – Frank Jun 16 '18 at 01:56
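Frank's point can be reproduced with a small toy table. This is only a sketch: toy, Grp and TVD are hypothetical stand-ins for dev_survey4 and its columns, and WELL/APA_Pair_ID are left out of by for brevity.

```r
library(data.table)

# Hypothetical toy table standing in for dev_survey4
toy <- data.table(Grp = c(0, 0, 1, 2), TVD = c(10, 20, 30, 40))

# Way 1 style: filter in i, aggregate in j.
# Returns a new table holding only the by column plus avgTVD,
# and only for the groups selected by i (Grp 0 and 1).
agg <- toy[Grp == 0 | Grp == 1, .(avgTVD = mean(TVD, na.rm = TRUE)), by = .(Grp)]

# Way 2 style: filter in i, := in j.
# Modifies toy in place: every row and column is kept, and the
# Grp == 2 row (not selected by i) gets NA in the new avgTVD column.
toy[Grp == 0 | Grp == 1, avgTVD := mean(TVD, na.rm = TRUE), by = .(Grp)]

print(agg)
print(toy)
```

The two shapes line up with the two differences listed in the question: way 2 keeps all original columns, and it keeps rows from every Grp.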

1 Answer


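= (inside .() or list()) is for aggregating/summarising: the result is a new table with one row per unique combination of the by columns among the rows selected by i.

:= is for adding a column by reference: the result has the same number of rows as the original table. Rows not selected by i are left untouched, so a newly created column is NA there.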

For example:

library(data.table)
dt <- data.table(I = 1:3, x = 11:13, y = c("A", "A", "B"))
dt[, .(mx = mean(x)), by = "y"]
#>    y   mx
#> 1: A 11.5
#> 2: B 13.0
dt[, mx := mean(x), by = "y"][]
#>    I  x y   mx
#> 1: 1 11 A 11.5
#> 2: 2 12 A 11.5
#> 3: 3 13 B 13.0

Created on 2018-06-16 by the reprex package (v0.2.0).
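A short follow-up sketch on the same data that checks the shapes directly, then collapses the := result back to one row per group with unique(). The unique() step is just one possible approach, not something from the answer itself. The block recreates dt so that it runs on its own.

```r
library(data.table)

# Same data as the example above
dt <- data.table(I = 1:3, x = 11:13, y = c("A", "A", "B"))

# Aggregation returns a new table: by column(s) + summary, one row per group
agg <- dt[, .(mx = mean(x)), by = "y"]

# := adds mx to dt itself: all original rows and columns are kept
dt[, mx := mean(x), by = "y"]

dim(agg)  # 2 rows, 2 columns
dim(dt)   # 3 rows, 4 columns

# Keeping only the grouping column and the new column, then dropping
# duplicate rows, gives the same content as the aggregated table
collapsed <- unique(dt[, .(y, mx)])
print(collapsed)
```

In practice the aggregation form (way 1 in the question) is the direct route when a summary table is what you want. Going through := first only makes sense if you also need the per-row column on the original table.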

Hugh