
I am not happy with the accepted answer to [Append an object to a list in R in amortized constant time?](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1)

> list1 <- list("foo", pi)
> bar <- list("A", "B")

How can I append the new element bar to list1? Clearly, c() does not work; it flattens bar:

> c(list1, bar)
[[1]]
[1] "foo"

[[2]]
[1] 3.141593

[[3]]
[1] "A"

[[4]]
[1] "B"

Assignment to index works:

> list1[[length(list1)+1]] <- bar
> list1
[[1]]
[1] "foo"

[[2]]
[1] 3.141593

[[3]]
[[3]][[1]]
[1] "A"

[[3]][[2]]
[1] "B"

What is the efficiency of this method? Is there a more elegant way?

user443854
  • `c(list1,list(bar))`? Please use package microbenchmark to benchmark this yourself. – Roland Jun 11 '13 at 14:28
  • Do you prefer performance, elegance or a tradeoff of both? Is all your data known to be string, or could be arbitrary? Please clarify the question title and text accordingly. – smci Apr 29 '16 at 11:12
  • Possible duplicate of [Append an object to a list in R in amortized constant time, O(1)?](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1) – Borealis Jul 03 '17 at 02:01
  • Can someone tell why c() did not work and flattened the values? – Naveen Gabriel Aug 29 '18 at 07:33
  • @NaveenGabriel: because semantically `c()` is overloaded. It builds vectors from elements, and also by concatenating vectors. – user443854 Aug 29 '18 at 22:54
  • Would you be able to explain the following behaviour? While appending to the list, `c()` flattens out, but not while creating the list: `a <- list(c("hello, hi"), 2)` gives `[[1]] "hello, hi"` and `[[2]] 2`, while `append(a, c("what","see"))` gives `[[1]] "hello, hi"`, `[[2]] 2`, `[[3]] "what"`, `[[4]] "see"`. – Naveen Gabriel Aug 30 '18 at 05:49
  • @NaveenGabriel It has nothing to do with `c()`, you are observing different behavior between `list()` and `append()`. Can you elaborate why you find it surprising? – user443854 Aug 30 '18 at 14:19
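
To make Roland's suggestion from the comments concrete: wrapping the new element in `list()` before concatenating keeps it as a single nested element instead of splicing its contents. A small illustration, not part of the original question:

list1 <- list("foo", pi)
bar <- list("A", "B")

# c(list1, bar) would splice the elements of bar into the result;
# wrapping bar in list() keeps it as one nested element.
list1 <- c(list1, list(bar))

length(list1)   # 3
list1[[3]]      # the unflattened list("A", "B")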

3 Answers


Adding elements to a list is very slow when doing it one element at a time. See these two examples:

I'm keeping the Result variable in the global environment to avoid copies into evaluation frames, and I tell R exactly where to find it with .GlobalEnv$ rather than relying on the blind search that <<- performs:

Result <- list()

AddItemNaive <- function(item)
{
    .GlobalEnv$Result[[length(.GlobalEnv$Result)+1]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemNaive(i))
#   user  system elapsed 
#  15.60    0.00   15.61 

Slow. Now let's try the second approach:

Result <- list()

AddItemNaive2 <- function(item)
{
    .GlobalEnv$Result <- c(.GlobalEnv$Result, item)
}

system.time(for(i in seq_len(2e4)) AddItemNaive2(i))
#   user  system elapsed 
#  13.85    0.00   13.89

Still slow.

Now let's try using an environment, and creating new variables within this environment instead of adding elements to a list. The issue here is that variables must be named, so I'll use the counter as a string to name each item "slot":

Counter <- 0
Result <- new.env()

AddItemEnvir <- function(item)
{
    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[as.character(.GlobalEnv$Counter)]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemEnvir(i))
#   user  system elapsed 
#   0.36    0.00    0.38 

Whoa, much faster. :-) It may be a little awkward to work with, but it works.
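
One practical wrinkle with the environment approach is getting the items back out in order once the loop is done. A minimal sketch of how that might look (the conversion and reordering below are my own addition, not part of the original answer):

# Environments are unordered, so convert to a list and sort the
# entries by the numeric names used as counters.
ResultList <- as.list(.GlobalEnv$Result)
ResultList <- ResultList[order(as.numeric(names(ResultList)))]

length(ResultList)   # 20000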

A final approach uses a list, but instead of augmenting its size one element at a time, it doubles the size each time the list is full. The allocated size is also kept in a dedicated variable, to avoid the slowdown of calling length on every insertion:

Counter <- 0
Result <- list(NULL)
Size <- 1

AddItemDoubling <- function(item)
{
    if( .GlobalEnv$Counter == .GlobalEnv$Size )
    {
        length(.GlobalEnv$Result) <- .GlobalEnv$Size <- .GlobalEnv$Size * 2
    }

    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[.GlobalEnv$Counter]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemDoubling(i))
#   user  system elapsed 
#   0.22    0.00    0.22

It's even faster, and as easy to work with as any list.
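
One caveat: after the loop, the list is usually longer than the number of items stored, with NULL padding at the tail from the last doubling. A minimal sketch of trimming it, not part of the original answer:

# Keep only the slots that were actually filled; the rest is the
# NULL padding left over from the last doubling.
FinalResult <- .GlobalEnv$Result[seq_len(.GlobalEnv$Counter)]

length(FinalResult)   # 20000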

Let's try these last two solutions with more iterations:

Counter <- 0
Result <- new.env()

system.time(for(i in seq_len(1e5)) AddItemEnvir(i))
#   user  system elapsed 
#  27.72    0.06   27.83 


Counter <- 0
Result <- list(NULL)
Size <- 1

system.time(for(i in seq_len(1e5)) AddItemDoubling(i))
#   user  system elapsed 
#   9.26    0.00    9.32

Well, the last one is definitely the way to go.

Ferdinand.kraft
  • I tried the first approach with `[[]]` with different numbers of elements: `2e3` runs 100x faster than `2e4`, clearly O(N^2), so the whole list is being copied. On the other hand, assigning to a different variable name each time takes about 20x the time for `2e5` elements compared to `2e4` elements, which is O(N) -- the performance I would expect from adding an element to a list. – user443854 Jun 11 '13 at 17:39
  • Removing `.GlobalEnv$` makes the function faster – Megatron Oct 12 '15 at 14:00
  • @Megatron, that breaks the code, see my comment to your answer. – Ferdinand.kraft Dec 22 '15 at 23:54
  • I like the doubling size method, but I always end up with a list longer than what I want. How do I remove the NULL entries at the tail of the list? – masfenix Jun 21 '16 at 14:24

It's very easy. You just need to add it in the following way:

list1$bar <- bar
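
For context, this appends bar as a single named element, so nothing is flattened, but the element is then accessed by name rather than by position. A quick illustration (not part of the original answer); note that each new element needs a distinct name, since assigning to an existing name replaces that element rather than appending:

list1 <- list("foo", pi)
bar <- list("A", "B")

list1$bar <- bar     # appended as one element named "bar"

names(list1)         # ""    ""    "bar"
list1[["bar"]]       # the nested list("A", "B")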
PAC

Operations that change the length of a list/vector in R always copy all the elements into a new list, and so will be slow, O(n). Storing in an environment is O(1) but has a higher constant overhead. For an actual O(1) append and benchmark comparison of a number of approaches see my answer to the other question at https://stackoverflow.com/a/32870310/264177.
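
Roland's comment on the question suggests the microbenchmark package for measuring this yourself. Here is a rough sketch of such a comparison between growing with c() and storing in an environment (assuming microbenchmark is installed; exact timings will vary by machine and R version):

library(microbenchmark)

n <- 1e3

# Grow a list one element at a time with c(): every call copies the list.
grow_c <- function() {
  res <- list()
  for (i in seq_len(n)) res <- c(res, list(i))
  res
}

# Store each item as a new binding in an environment: no copying,
# but a higher constant overhead per item.
grow_env <- function() {
  res <- new.env()
  for (i in seq_len(n)) res[[as.character(i)]] <- i
  res
}

microbenchmark(grow_c(), grow_env(), times = 10)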

JanKanis
  • And by O(1) for storing an item in an environment, I mean amortized O(1) (i.e. on average), but for most cases that is good enough. – JanKanis Mar 24 '16 at 14:12
  • R using a functional programming style will always have problems when adding an element to an existing structure, like in `v <- c(v, newElement)`. Adding or replacing elements via index is not functional programming style, however. I don't know the internal representation of vectors in R, but using a _chunked array_ (individual smaller arrays being linked in a chain to form a larger array basically), parts not affected by the append could be reused unmodified, removing load from the GC. After all, the base of R is not designed very well, making it quite hard to do simple things, sometimes. – U. Windl Jan 25 '18 at 09:40
  • R vectors are basically plain fixed-size C arrays. If something changes the length, R just creates a new copy of the entire array. That works great for doing analyses on large data sets but not so well for appending elements one by one. – JanKanis Oct 22 '18 at 14:26