0

I was under the impression that any call to a list by the index was a constant runtime (i.e. O(1) ), but it seems that this is not actually the case, as shown below. What causes this increase in runtime (looks like O(n) ), and more importantly, is there a way around it? I would like to have a list of data.frame's that has the constant runtime access of a hash set (and there will be no changes to the "key set" so a "perfect" hash is possible).

> l <- list()
> system.time(l[[300000000]] <- "test")
   user  system elapsed 
   1.86    0.36    2.21 
> system.time(l[[299999999]] <- "test")
   user  system elapsed 
   4.42    0.73    5.15 
> system.time(l[[1]] <- "test")
   user  system elapsed 
   4.34    0.44    4.77 
> l <- list()
> system.time(l[[300000]] <- "test")
   user  system elapsed 
      0       0       0 
> l <- list()
> system.time(l[[3000000]] <- "test")
   user  system elapsed 
   0.01    0.00    0.01 
> l <- list()
> system.time(l[[30000000]] <- "test")
   user  system elapsed 
   0.32    0.03    0.36 
jpd527
  • 1,543
  • 1
  • 14
  • 30
  • FWIW, on a 4 year old iMac (!) with only 4gb of ram, I get times of 0.335, 0.522 and 0.445, so I would argue that there is something specific about your machines that's causing this. However, if you're looking for O(1) name lookup, use [environments](http://stackoverflow.com/q/16129068/324364). – joran Nov 26 '13 at 02:53

1 Answers1

1

The time you measured is spent building the list, not accessing it. See e.g. what the result here is:

l <- list()
l[[3]] <- "test"
l
#[[1]]
#NULL
#
#[[2]]
#NULL
#
#[[3]]
#[1] "test"

So this takes some time:

l = list()
system.time(l[[1e7]] <- "test")
#user  system elapsed 
#0.06    0.00    0.07 

But this doesn't:

l = list()
l[[1e7]] = "boo"    # this will build the list
system.time(l[[1e7]] <- "test")   # now just access
#user  system elapsed 
#   0       0       0 

Perhaps for your purposes you might want to have character keys instead - l[["402023402"]] = "some value".

eddi
  • 49,088
  • 6
  • 104
  • 155
  • That's what I thought at first too, but then I found out otherwise with this: – jpd527 Nov 25 '13 at 22:41
  • > l <- list() > system.time(l[[300000000]] <- "test") user system elapsed 1.86 0.36 2.21 > system.time(l[[299999999]] <- "test") user system elapsed 4.42 0.73 5.15 > system.time(l[[1]] <- "test") user system elapsed 4.34 0.44 4.77 – jpd527 Nov 25 '13 at 22:41
  • Sorry, that didn't come out too well, but it's the first bit of the code block above, where I access a lower index after building it and the access time is still terrible, worse actually, which is odd. – jpd527 Nov 25 '13 at 22:43
  • running those commands in that sequence gives 0 for the last two timings for me - try again in a clean session perhaps? – eddi Nov 25 '13 at 22:44
  • Odd, I cleaned the session and got the times of 0.33, 3.39 and 5.04. Maybe something else is going on? – jpd527 Nov 25 '13 at 22:46
  • 1
    Seeing this on multiple systems. Try a higher magnitude number perhaps? Maybe you have greater system resources. – jpd527 Nov 25 '13 at 22:47
  • good point, that could be the explanation - it's probably more than your physical memory and the OS is having to write it to disk in which case access time = "who knows" – eddi Nov 25 '13 at 22:49
  • 1
    btw as I mentioned at the end of the answer - don't make a list that's larger than your needs - use character keys – eddi Nov 25 '13 at 22:51