
I understood that a data.table is not copied when it is returned from a function. However, in this particular case it does get copied. Can someone explain why?

library(data.table)

dt1 <- data.table(a=1)
dt2 <- data.table(b=1)
dt3 <- data.table(c=1)

address(dt1); address(dt2); address(dt3)
[1] "000000005E20D990"
[1] "00000000052301E8"
[1] "000000001D622210"

l <- list(a=dt1, b=dt2, c=dt3)
address(l$a); address(l$b); address(l$c)
[1] "000000005E20D990"
[1] "00000000052301E8"
[1] "000000001D622210"

f <- function(dt) {setnames(dt, toupper(names(dt)))}
l <- Map(f, l)
address(l$a); address(l$b); address(l$c)
[1] "000000001945C7B0"
[1] "0000000066858738"
[1] "000000001B021038"

dt1
   A
1: 1
dt2
   B
1: 1
dt3
   C
1: 1

So it is the `l <- Map(f, l)` line that makes the copy. However, the following does not make a copy.

address(dt1)
[1] "000000005E20D990"
dt4 <- f(dt1)
address(dt4)
[1] "000000005E20D990"

What am I missing?

Update: As everybody has pointed out, `Map` or `mapply` is making a copy. `lapply` works in the above case, but my actual code needs to pass multiple inputs to the function. My understanding was that all apply functions use the same code, but that does not seem to be the case.

imsc
  • `Map` is a wrapper for `mapply` and I believe the copy happens in `mapply`. – Roland Jan 21 '16 at 13:23
  • I guess @Roland is right. `l<-lapply(l,f)` doesn't copy. I should add that the use of `Map` is pretty unusual, since there is just one argument and so `lapply` should be preferred. – nicola Jan 21 '16 at 14:10
  • I noticed that in the C source code of `lapply` there is the line `if (MAYBE_REFERENCED(tmp)) tmp = lazy_duplicate(tmp);` while in `mapply` the line is `if (MAYBE_REFERENCED(tmp)) tmp = duplicate(tmp);`. Could that be the cause? I'm no expert on R internals, so I can't tell for sure. – nicola Jan 21 '16 at 14:21
  • You can easily avoid using `Map` or `mapply` if the objects are available in the parent frame. Use `lapply(seq_along(l), function(i) ...)` and subset each object that `mapply` would loop over with the iterator `i`: `l[[i]]` in your example, and possibly more objects, since `mapply` loops over several at once. – jangorecki Jan 21 '16 at 14:33
  • If I switch `l <- Map(f, l)` to simply `Map(f, l)`, it seems to work fine. You rarely need to use the return value of `set*` functions. – Frank Jan 21 '16 at 14:49
  • You should reword the question, since `funcdt<-f(dt1); address(funcdt)` shows the same address. In other words, the problem isn't the function, it's the `Map`. – Dean MacGregor Jan 21 '16 at 18:03
  • Thanks @Frank. `Map(f, l)` works. But it still makes a copy of the data; it just doesn't assign it to `l`. – imsc Jan 22 '16 at 08:36
  • @imsc please post an answer to your question so it can be considered resolved. – jangorecki Apr 22 '19 at 16:17

1 Answer


As everybody has pointed out in the comments, `Map` (a wrapper for `mapply`) is making the copy, while `lapply` does not.
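
For the multi-input case mentioned in the question's update, the `lapply(seq_along(l), ...)` suggestion from the comments could look roughly like the sketch below. The two-argument function `f2` and the `prefixes` vector are hypothetical, invented only for illustration. Whether the returned addresses match the originals may depend on the R version, so the comparison is printed rather than assumed.

```r
library(data.table)

dt1 <- data.table(a = 1)
dt2 <- data.table(b = 1)
l <- list(a = dt1, b = dt2)

# Hypothetical second input: one prefix per table.
prefixes <- c("x_", "y_")

# Hypothetical two-argument function; setnames() renames by reference.
f2 <- function(dt, prefix) {
  setnames(dt, paste0(prefix, toupper(names(dt))))
}

# Iterate over indices with lapply instead of Map/mapply,
# subsetting each input with the same index i.
res <- lapply(seq_along(l), function(i) f2(l[[i]], prefixes[i]))
names(res) <- names(l)

# The originals were renamed by reference.
print(names(dt1))
print(names(dt2))

# Check whether the returned objects are still the originals.
print(address(res$a) == address(dt1))
print(address(res$b) == address(dt2))
```

Since `setnames` works by reference, `dt1` and `dt2` are renamed either way; the index loop only matters if you want to keep using the returned list.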

jangorecki