9

I understand it is useful to preallocate a vector or a matrix, because they are always stored in a contiguous memory block.

However, in terms of a list, it can contains elements of difference length and modes. So my first guess is that a list might merely contains pointers to the true address of its elements. Am I correct? A related question here What is the internal implementation of lists? It says the list is essentially an array, but it does't cover how the elements are stored in a list while size of each element may change.

Example 1: If a list contains a,b,c,d,e, and when myList$a<-1:1000000, is the list modified in place (which means only a is updated) or the whole list are copied and updated?

Example 2

> system.time( { myList <- list()
+                myList$a <- 1:10000000
+                myList$b <- 1:10000100
+                myList$c <- 1:10000200 
+                myList$d <- 1:10000300})
   user  system elapsed 
   0.01    0.02    0.03

> system.time({ myList2<-list(a=1:10000000,b=1:10000100,c=1:10000200,d=1:10000300) })
   user  system elapsed 
   0.00    0.03    0.03 

would myList be very inefficient compared to myList2 due to no preallocation? Or no noticeable performance difference at all no matter how large a,b,c,d are because the first one copies only pointers a few times more?

Here comes to preallocation. How does it look like for a list? Does it only preallocate the memory for the pointers? If that's the case, I don't see any use since there won't be much data copying for pointers anyway.

> system.time( { myList <- vector("list", 4)
+                myList[[1]] <- 1:10000000
+                myList[[2]] <- 1:10000100
+                myList[[3]]  <- 1:10000200 
+                myList[[4]] <- 1:10000300
+                names(myList) <- letters[1:4]})
   user  system elapsed 
   0.01    0.02    0.03 
Community
  • 1
  • 1
colinfang
  • 20,909
  • 19
  • 90
  • 173
  • possible duplicate of [What's the internal implement of R list](http://stackoverflow.com/questions/16129068/whats-the-internal-implement-of-r-list) – Ferdinand.kraft Aug 25 '13 at 01:25
  • 2
    A side note to your question, if you do want something which stores pointers to the elements you should use environments rather than lists. – Scott Ritchie Aug 25 '13 at 12:32
  • 1
    This may be useful: http://stackoverflow.com/questions/17046336/here-we-go-again-append-an-element-to-a-list-in-r/17050160#17050160 – Ferdinand.kraft Aug 25 '13 at 14:29
  • The following reference provides details regarding memory profiling to see what is happening: [Memory usage: Advanced R](http://adv-r.had.co.nz/memory.html#modification). – Stan Jun 30 '17 at 20:01

2 Answers2

7
 system.time( { myList <- list()
   myList$a <- 1:100000
   myList$b <- 1:100001
   myList$c <- 1:100002 
   myList$d <- 1:100003})
#   user  system elapsed 
#  0.019   0.001   0.022 

 system.time({ myList<-list(a=1:100000,b=1:100001,c=1:100002,d=1:100003) })
#   user  system elapsed 
#  0.001   0.001   0.002 

Generally pre-allocation is advised, although what you did didn't look like what I think of as pre-allocation. This is what I would have given that description.

system.time( { myList <- vector("list", 4)
   myList[[1]] <- 1:100000
   myList[[2]] <- 1:100001
   myList[[3]]  <- 1:100002 
   myList[[4]] <- 1:100003
   names(myList) <- letters[1:4]})
#   user  system elapsed 
#  0.001   0.001   0.001 

BTW: I do not think that the mental model of a list being just a pointer is useful. Many times simple assignment creates an entirely new temporary copy which then gets reassigned that name.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • I wasn't able to reproduce on a different machine, and I think the example may be too small to demonstrate a real difference anyway. It's possible that R may effective preallocate some number of leaves and it is only when you exceed that number that the advantage appears. Timings in this case may be overwhelmed by other aspects of evaluation and assignment. I would think you might want to preallocate a 50 element lsit vector and comapre times to extending a list to reach 50 elements and see if the difference appears. – IRTFM Aug 25 '13 at 21:58
7

This is too long for a comment -- but not a complete answer.

Named lists are treated differently to unnamed lists in terms of copying when modified.

Copying (for large objects)

This is a complex issue. See http://radfordneal.wordpress.com/2013/07/02/fixing-rs-named-problems-in-pqr/ for a reasonable explanation and note that some of these changes are being implemented in the development version of R. Also see http://r.789695.n4.nabble.com/Confused-about-NAMED-td4103326.html for more understanding of the underlying complexities

Here are some timings for various possibilities of preallocation

# preallocated contents so timing is list related only
.a <- seq_len(1e6); .b <- seq_len((1e6 + 1))
.c <- seq_len((1e6 + 2)); .d <- seq_len((1e6 + 3))



f1 <- function(){
# using `$<-` empty list
  x <- list()
  x$a <- .a
  x$b <- .b
  x$c <- .c 
  x$d <- .d
  x
}


f2 <- function(){
  # using `[[<-` on empty list
  x <- list()
  x[['a']] <- .a
  x[['b']] <- .b
  x[['c']] <- .c
  x[['d']] <- .d
  x
}


f3 <- function(){
  # using `[[<-` on empty list, naming later
  x <- list()
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  names(x) <- letters[1:4]
  x
}

f4 <- function(){ 
  # just define the list
  x <- list(a = .a, b = .b,
            c = .c, d = .d)
}


f5 <- function(){ 
  # create a list of length 4, then fill and name
  x <- vector(mode = 'list', length = 4)
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  names(x) <- letters[1:4]
  x
}

# f6 <- function(){
#  # this doesn't work!
#  # it creates a list of length 8
#  x <- vector(mode = 'list', length = 4)
#  x[['a']] <- .a
#  x[['b']] <- .b
#  x[['c']] <- .c
#  x[['d']] <- .d
#  x
# }


f7 <-function(){
# pre allocate list, name then fill
x <- vector(mode = 'list', length = 4)
  names(x) <- letters[1:4]
  x[[1]] <- .a
  x[[2]] <- .b
  x[[3]] <- .c
  x[[4]] <- .d
  x
}

f8 <- function(){
# preallocate correct length and then name
# and fill by name
  x <- vector(mode = 'list', length = 4)
  names(x) <- letters[1:4]
  x[['a']] <- .a
  x[['b']] <- .b
  x[['c']] <- .c
  x[['d']] <- .d
  x
}

library(microbenchmark)
microbenchmark(f1(),f2(),f3(),f4(),f5(),f7(),f8(),times=100)


microbenchmark(f1(),f2(),f3(),f4(),f5(),f7(),f8(),times=100)
# Unit: microseconds
#   expr      min       lq   median       uq       max neval
#   f1()    6.038   11.169   12.980   14.791    34.110   100
#   f2() 2528.881 4387.962 4707.014 6472.823 74586.266   100
#   f3() 2532.805 4537.376 4714.258 5353.722 74903.206   100
#   f4() 2475.756 4531.489 4721.503 6331.860 74460.395   100
#   f5() 2508.959 4512.474 4759.535 6673.551  7966.668   100
#   f7() 2545.181 4477.761 4709.127 6372.610  7964.856   100
#   f8() 2508.053 4467.799 4669.131 6181.993 74236.726   100
#
#  All results are identical.
all(identical(f1(),f2()),identical(f1(),f3()),
    identical(f1(),f4()),identical(f1(),f5()),
    identical(f1(),f7()),identical(f1(),f8()))

# [1] TRUE

All results are identical.

Clearly using $<-` on an empty list is the clear winner. This goes against my thinking of what should be quickest

mnel
  • 113,303
  • 27
  • 265
  • 254
  • Ok, I came from the world of C# and F#. But I have to use R at work. The more I learned about R, the more I hate R. – colinfang Aug 26 '13 at 01:04
  • 1
    R takes a lot of getting used to. I found that if I approach it more like it is a tool instead of a well designed language, I don't get as frustrated. – Matthew Crews Oct 20 '15 at 21:28
  • I reran this code in 2017 and the results are quite different with R 3.4.0 Fastest is now f4(), then f1,5,6 are the same, quiet fast but not as fast as f4, then f8, f3 with f2 being slowest – falcs Oct 18 '17 at 14:33