25

Suppose I create a list in R and append to it as follows:

x = list(10)
x[[2]] = 20

Is this equivalent to

x = list(10)
x = list(10, 20)

? I'm not very familiar with the details of how R handles lists in memory, but my limited understanding is that it tends to be copy-happy. Ideally, the first option would not create another list in memory; it would just set aside a new place in memory for the appended value. In short, if I have a big list, I don't want R to make another copy of it just to append something.

If the behaviour I want is not what is given here, is there any other way I can get the desired effect?

guy

4 Answers

17

I'm fairly confident the answer is "no". I used the following code to double check:

Rprof(tmp <- tempfile(), memory.profiling = TRUE)

x <- list()
for (i in 1:100) x[[i]] <- runif(10000)

Rprof()
summaryRprof(tmp, memory = "stats")
unlink(tmp)

The output:

# index: runif
#      vsize.small  max.vsize.small      vsize.large  max.vsize.large 
#            76411           381781           424523          1504387 
#            nodes        max.nodes     duplications tot.duplications 
#          2725878         13583136                0                0 
#          samples 
#                5 

The relevant part is duplications = 0.

flodel
  • 3
    I don't think your reasoning is necessarily correct: duplications have a special meaning in R, and technically, while extending the length of a vector creates a copy, it is not a duplication. See this thread on R-help: http://r.789695.n4.nabble.com/Understanding-tracemem-td4636321.html – hadley Oct 08 '12 at 13:59
12

Matthew Dowle's answer here explains the rationale behind much memory-efficient design in R: avoiding the numerous behind-the-scenes copies made by <-, [<-, [[<- and other base R operations (names etc.).

[[<- will copy the whole of x. See the example below:

x <- list(20)
tracemem(x)
# [1] "<0x2b0e2790>"
x[[2]] <- 20
# tracemem[0x2b0e2790 -> 0x2adb7798]:

Your second case

x <- list(10,20)

is not really appending to the original x, but replacing x with a new object that happens to contain the original values plus an appended one.
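If the reallocation that happens each time the list grows is a concern, one common workaround (not taken from the answers here) is to preallocate the list with `vector("list", n)` and fill it by index, so that its length never changes. A minimal sketch, assuming the final length `n` is known in advance; `n` and the element contents are illustrative:

```r
n <- 1000L                   # assumed final length, chosen for illustration
x <- vector("list", n)       # a list of n NULL slots, created up front

for (i in seq_len(n)) {
  x[[i]] <- runif(10)        # fill slot i; the list is never enlarged
}

length(x)                    # still n: no growth occurred during the loop
```

Whether R modifies the list strictly in place can still depend on how many references to `x` exist, so checking with `tracemem()` on your own setup is advisable.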

mnel
  • (+1), The second case isn't appending, or an example of something I was proposing, but rather an example of something I don't want R to be doing behind the scenes. – guy Oct 07 '12 at 22:57
  • Ahh, I misread your question. It first read to me as if you were asking whether `x <- list(10,20)` was equivalent (in terms of memory) to `x <- list(10); x[[2]] <- 20`. On rereading I see that it was more nuanced than that. – mnel Oct 07 '12 at 23:18
  • Yes but in that linked answer `x` was a `data.frame`. In this question `x` is a `list`. Copying behaviour of `list` can be different. Note that there is no `[<-.list` method but there is a `[<-.data.frame`. Use `.Internal(inspect(x))` to check. – Matt Dowle Oct 08 '12 at 16:00
9

To help me figure out whether or not modifying a list makes a deep copy or a shallow copy, I set up a small experiment. If modifying a list makes a deep copy, then it should be slower when you're modifying a list that contains a large object compared to a list that contains a small object:

z1 <- list(runif(1e7))
z2 <- list(1:10)

system.time({
  for(i in 1:1e4) z1[1 + i] <- 1L
})
#  user  system elapsed
# 0.283   0.034   0.317
system.time({
  for(i in 1:1e4) z2[1 + i] <- 1L
})
#  user  system elapsed
# 0.284   0.034   0.319

The timings on my computer were basically identical, suggesting that modifying a list makes a shallow copy: only the pointers to the existing elements are copied, not the elements themselves.
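The shallow-copy conclusion can also be checked more directly by comparing the address of the large element before and after the append. The following is a minimal sketch, assuming an R build with memory profiling enabled (`capabilities("profmem")`); the variable names are illustrative:

```r
# Sketch only: tracemem() needs an R build with memory profiling enabled.
same_element <- NA
if (capabilities("profmem")) {
  z <- list(runif(1e6))            # list holding one large numeric vector
  addr_before <- tracemem(z[[1]])  # tracemem() returns the element's address as a string
  z[[2]] <- 1L                     # append a second element
  addr_after <- tracemem(z[[1]])   # address of the first element after the append
  untracemem(z[[1]])               # stop tracing the element
  same_element <- identical(addr_before, addr_after)
}
print(same_element)  # TRUE would mean the large vector itself was not copied
```

This only says something about the element; the list itself (the vector of pointers) may still have been reallocated when it grew.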

hadley
  • 8
    `.Internal(inspect(x))` is a more concrete way to tell. Looking to see if the hex address of the long vector has changed. – Matt Dowle Oct 08 '12 at 15:55
5

Accepted flodel's answer, but Chase's tip was good so I confirmed that I have the desired behavior using his suggestion of using tracemem(). Here is the first example, where we just append to the list:

x = list(10)
tracemem(x[[1]])
# [1] "<0x2d03fa8>" #(likely different on each machine)
x[[2]] = 20
tracemem(x[[1]])
# [1] "<0x2d03fa8>"

And here is the result from the second example, where we create two lists:

x = list(10)
tracemem(x[[1]])
# [1] "<0x2d03c78>"
x = list(10, 20)
tracemem(x[[1]])
# [1] "<0x2d07ff8>"

So the first method appears to give the desired behaviour: the address of the existing element is unchanged after appending, so that element was not copied.

MichaelChirico
guy