19

I am curious how an object of type list is implemented. Is it

  1. a dynamic vector that will automatically increase its size when it is full.
  2. a linked list where appending an item is O(1), but accessing an item is O(n).
  3. a tree structure with O(log(n)) item access.
  4. a hashtable with O(1) item access.

I am curious because lists can have key-value pairs that make them look like hash tables, but the elements are in order, which looks like a vector.

Edit: because length(list(runif(1e4))) is 1, so when append element to a list, it looks like that it copy the whole list every time, that makes it very slow:

But the access speed is much slower than a vector:

z1 <- runif(1e4)
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
0.060   0.000   0.062 

but:

z1 <- list(runif(1e4))
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
1.31    0.00    1.31 

init a list with 10000 elements:

z1 <- as.list(runif(1e4))
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
0.060   0.000   0.065 

For the key & value access:

z1 <- list()
for(i in 1:10000){key <- as.character(i); z1[[key]] <- i} 
system.time({
  for(i in 1:10000) x <- z1[["1"]]
})
system.time({
  for(i in 1:10000) x <- z1[["10000"]]
})

The output is:

user  system elapsed 
0.01    0.00    0.01 
user  system elapsed 
1.78    0.00    1.78 

It's not an O(1) access, so it's not a hash table. My conclusion is that it's not a dynamic array since appending items will cause memory accesses every time; it's not a hashtable since access by key is not O(1).

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
HYRY
  • 94,853
  • 25
  • 187
  • 187
  • 3
    I would have voted you twice if you had put your analysis in an answer (where it should be IMHO.) – flodel Apr 21 '13 at 11:51

1 Answers1

15

Lists are essentially just arrays of R objects (SEXP). Resizing causes copies of the whole data and name lookup is linear.

Alternatively, you can use environments, which use hash tables internally.

Romain Francois
  • 17,432
  • 3
  • 51
  • 77
  • 1
    So if a `list` is an array of `SEXP`s, why is lookup linear and not constant? Or is *name* lookup linear and *indexed* lookup constant? –  Nov 06 '17 at 09:36
  • 1
    I did write name lookup. – Romain Francois Nov 06 '17 at 12:41
  • I can't find a case of performance drop for growing a list in R vs lapply? Example code system.time({ l = list(); for(i in 1:50000) l[[i]] = letters}) Is it for most practical concerns not a problem? Instead of letters it could also be some random numbers, generated with rnorm. – Soren Havelund Welling Apr 04 '20 at 11:05