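library(tidyverse)   # provides the %>% pipe
library(data.table)
dt <- data.table(x = 1:3)
dt[x == 1]           # returns one row: x = 1
myfun <- function(d) d[x == 1, x := NA]
dt2 <- dt %>% myfun
dt[x == 1]           # now returns zero rows: dt itself was modified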

In this example, dt (a data.table) is passed as an argument to the function myfun via the pipe, and the result is saved in the object dt2.

But why is dt modified? (As you can see, the value of x in row 1 goes from 1 to NA.)

LucasMation

1 Answer


It is the := operator, which assigns by reference. According to ?`:=`:

:= is defined for use in j only. It adds or updates or removes column(s) by reference. It makes no copies of any part of memory at all. ... DT is modified by reference and returned invisibly. If you require a copy, take a copy first (using DT2 = copy(DT)).
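To make "by reference" concrete, here is a minimal sketch that assumes only data.table is loaded. The name set_first_to_na is a hypothetical stand-in for myfun from the question, and address() is data.table's helper for inspecting where an object lives in memory. The function receives the same object the caller holds, so the update made by := is visible outside the function.

```r
library(data.table)

dt <- data.table(x = 1:3)
addr_before <- address(dt)   # memory address of dt before the call

# Hypothetical helper that mirrors myfun from the question
set_first_to_na <- function(d) d[x == 1, x := NA]

res <- set_first_to_na(dt)

# dt was never reassigned, so it is still the same object...
same_object <- identical(address(dt), addr_before)
# ...yet its contents changed, because := updated it in place
first_is_na <- is.na(dt$x[1])

print(same_object)
print(first_is_na)
print(dt)
```

Base R data frames would normally be copied on modification inside a function; data.table's := deliberately skips that copy, which is why the caller sees the change.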

If we don't want to change the original data, make a copy first and work on the copy:

dt1 <- data.table::copy(dt)

and then use dt1 in place of dt.
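As a sketch of how copy() protects the original, the example below shows two options: copying at the call site, and copying inside the function. The name myfun_safe is hypothetical and only illustrates the second option; it is not part of data.table.

```r
library(data.table)

dt <- data.table(x = 1:3)
myfun <- function(d) d[x == 1, x := NA]

# Option 1: copy at the call site, so myfun only ever sees the copy
dt2 <- myfun(copy(dt))

# Option 2 (hypothetical variant): copy inside the function,
# so callers cannot forget to do it
myfun_safe <- function(d) {
  d <- copy(d)           # work on a private copy
  d[x == 1, x := NA]     # modify the copy by reference
  d[]                    # return the copy visibly
}
dt3 <- myfun_safe(dt)

print(dt)    # original is unchanged
print(dt2)   # first value of x is NA
print(dt3)   # first value of x is NA
```

Copying at the call site keeps myfun fast when the caller does want in-place updates; copying inside the function trades that speed for safety.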

akrun
  • 1
    Thank you @akrun. I understand `:=` modifies by reference within the function. What was unexpected was that this would modify the original dt outside the function (I guess this is a feature of R function scoping). Just to add, the solution to the problem then is: `dt2 <- dt %>% copy %>% myfun` – LucasMation Apr 06 '21 at 19:19
  • 1
    @LucasMation If I have a data.table, I would be careful in doing the assignment on the original object either within or outside a function – akrun Apr 06 '21 at 19:20
  • 2
    @LucasMation You may want to check [this](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly/27840349#27840349), especially the first answer – PavoDive Apr 06 '21 at 20:48