
I am not happy with the accepted answer to [Append an object to a list in R in amortized constant time?](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1)

> list1 <- list("foo", pi)
> bar <- list("A", "B")

How can I append the new element bar to list1? Clearly, c() does not work; it flattens bar:

> c(list1, bar)
[[1]]
[1] "foo"

[[2]]
[1] 3.141593

[[3]]
[1] "A"

[[4]]
[1] "B"

Assignment to index works:

> list1[[length(list1)+1]] <- bar
> list1
[[1]]
[1] "foo"

[[2]]
[1] 3.141593

[[3]]
[[3]][[1]]
[1] "A"

[[3]][[2]]
[1] "B"

What is the efficiency of this method? Is there a more elegant way?

user443854
  • `c(list1,list(bar))`? Please use package microbenchmark to benchmark this yourself. – Roland Jun 11 '13 at 14:28
  • Do you prefer performance, elegance or a tradeoff of both? Is all your data known to be string, or could be arbitrary? Please clarify the question title and text accordingly. – smci Apr 29 '16 at 11:12
  • Possible duplicate of [Append an object to a list in R in amortized constant time, O(1)?](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1) – Borealis Jul 03 '17 at 02:01
  • Can someone tell why c() did not work and flattened the values? – Naveen Gabriel Aug 29 '18 at 07:33
  • @NaveenGabriel: because semantically `c()` is overloaded. It builds vectors from elements, and also by concatenating vectors. – user443854 Aug 29 '18 at 22:54
  • Would you be able to explain the following behaviour? While appending to the list, `c()` flattens out, but not while creating the list: `a <- list(c("hello, hi"), 2)` gives `[[1]] "hello, hi"` and `[[2]] 2`, while `append(a, c("what","see"))` gives `[[1]] "hello, hi"`, `[[2]] 2`, `[[3]] "what"`, `[[4]] "see"`. – Naveen Gabriel Aug 30 '18 at 05:49
  • @NaveenGabriel It has nothing to do with `c()`, you are observing different behavior between `list()` and `append()`. Can you elaborate why you find it surprising? – user443854 Aug 30 '18 at 14:19
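
To make Roland's suggestion from the comments concrete: wrapping the new element in `list()` before concatenating keeps it as a single nested element instead of splicing its contents. A small illustration, not part of the original question:

list1 <- list("foo", pi)
bar <- list("A", "B")

# c(list1, bar) would splice the elements of bar into the result;
# wrapping bar in list() keeps it as one nested element.
list1 <- c(list1, list(bar))

length(list1)   # 3
list1[[3]]      # the unflattened list("A", "B")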

3 Answers


Adding elements to a list is very slow when doing it one element at a time. See these two examples:

I'm keeping the Result variable in the global environment to avoid copies into evaluation frames, and I tell R exactly where to find it with .GlobalEnv$ rather than relying on the blind search that <<- performs:

Result <- list()

AddItemNaive <- function(item)
{
    .GlobalEnv$Result[[length(.GlobalEnv$Result)+1]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemNaive(i))
#   user  system elapsed 
#  15.60    0.00   15.61 

Slow. Now let's try the second approach:

Result <- list()

AddItemNaive2 <- function(item)
{
    .GlobalEnv$Result <- c(.GlobalEnv$Result, item)
}

system.time(for(i in seq_len(2e4)) AddItemNaive2(i))
#   user  system elapsed 
#  13.85    0.00   13.89

Still slow.

Now let's try using an environment, and creating new variables within this environment instead of adding elements to a list. The issue here is that variables must be named, so I'll use the counter as a string to name each item "slot":

Counter <- 0
Result <- new.env()

AddItemEnvir <- function(item)
{
    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[as.character(.GlobalEnv$Counter)]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemEnvir(i))
#   user  system elapsed 
#   0.36    0.00    0.38 

Whoa, much faster. :-) It may be a little awkward to work with, but it works.
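
One practical wrinkle with the environment approach is getting the items back out in order once the loop is done. A minimal sketch of how that might look (the conversion and reordering below are my own addition, not part of the original answer):

# Environments are unordered, so convert to a list and sort the
# entries by the numeric names used as counters.
ResultList <- as.list(.GlobalEnv$Result)
ResultList <- ResultList[order(as.numeric(names(ResultList)))]

length(ResultList)   # 20000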

A final approach uses a list, but instead of augmenting its size one element at a time, it doubles the size each time the list is full. The allocated size is also kept in a dedicated variable, to avoid the slowdown of calling length on every insertion:

Counter <- 0
Result <- list(NULL)
Size <- 1

AddItemDoubling <- function(item)
{
    if( .GlobalEnv$Counter == .GlobalEnv$Size )
    {
        length(.GlobalEnv$Result) <- .GlobalEnv$Size <- .GlobalEnv$Size * 2
    }

    .GlobalEnv$Counter <- .GlobalEnv$Counter + 1

    .GlobalEnv$Result[[.GlobalEnv$Counter]] <- item
}

system.time(for(i in seq_len(2e4)) AddItemDoubling(i))
#   user  system elapsed 
#   0.22    0.00    0.22

It's even faster, and as easy to work with as any list.
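
One caveat: after the loop, the list is usually longer than the number of items stored, with NULL padding at the tail from the last doubling. A minimal sketch of trimming it, not part of the original answer:

# Keep only the slots that were actually filled; the rest is the
# NULL padding left over from the last doubling.
FinalResult <- .GlobalEnv$Result[seq_len(.GlobalEnv$Counter)]

length(FinalResult)   # 20000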

Let's try these last two solutions with more iterations:

Counter <- 0
Result <- new.env()

system.time(for(i in seq_len(1e5)) AddItemEnvir(i))
#   user  system elapsed 
#  27.72    0.06   27.83 


Counter <- 0
Result <- list(NULL)
Size <- 1

system.time(for(i in seq_len(1e5)) AddItemDoubling(i))
#   user  system elapsed 
#   9.26    0.00    9.32

Well, the last one is definitely the way to go.

Ferdinand.kraft
  • I tried the first approach with `[[]]` with different numbers of elements: `2e3` runs 100x faster than `2e4`, clearly O(N^2), so the whole list is being copied. On the other hand, assigning to a different variable name each time takes about 20x the time for `2e5` elements compared to `2e4` elements, which is O(N) -- the performance I would expect from adding an element to a list. – user443854 Jun 11 '13 at 17:39
  • Removing `.GlobalEnv$` makes the function faster – Megatron Oct 12 '15 at 14:00
  • @Megatron, that breaks the code, see my comment to your answer. – Ferdinand.kraft Dec 22 '15 at 23:54
  • I like the doubling size method, but I always end up with a list longer than what I want. How do I remove the NULL entries at the tail of the list? – masfenix Jun 21 '16 at 14:24

It's very easy. You just need to add it in the following way:

list1$bar <- bar
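
For context, this appends bar as a single named element, so nothing is flattened, but the element is then accessed by name rather than by position. A quick illustration (not part of the original answer); note that each new element needs a distinct name, since assigning to an existing name replaces that element rather than appending:

list1 <- list("foo", pi)
bar <- list("A", "B")

list1$bar <- bar     # appended as one element named "bar"

names(list1)         # ""    ""    "bar"
list1[["bar"]]       # the nested list("A", "B")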
PAC

Operations that change the length of a list/vector in R always copy all the elements into a new list, and so will be slow, O(n). Storing in an environment is O(1) but has a higher constant overhead. For an actual O(1) append and benchmark comparison of a number of approaches see my answer to the other question at https://stackoverflow.com/a/32870310/264177.
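
Roland's comment on the question suggests the microbenchmark package for measuring this yourself. Here is a rough sketch of such a comparison between growing with c() and storing in an environment (assuming microbenchmark is installed; exact timings will vary by machine and R version):

library(microbenchmark)

n <- 1e3

# Grow a list one element at a time with c(): every call copies the list.
grow_c <- function() {
  res <- list()
  for (i in seq_len(n)) res <- c(res, list(i))
  res
}

# Store each item as a new binding in an environment: no copying,
# but a higher constant overhead per item.
grow_env <- function() {
  res <- new.env()
  for (i in seq_len(n)) res[[as.character(i)]] <- i
  res
}

microbenchmark(grow_c(), grow_env(), times = 10)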

JanKanis
  • And by O(1) for storing an item in an environment, I mean amortized O(1) (i.e. on average), but for most cases that is good enough. – JanKanis Mar 24 '16 at 14:12
  • R using a functional programming style will always have problems when adding an element to an existing structure, like in `v <- c(v, newElement)`. Adding or replacing elements via index is not functional programming style, however. I don't know the internal representation of vectors in R, but using a _chunked array_ (individual smaller arrays being linked in a chain to form a larger array basically), parts not affected by the append could be reused unmodified, removing load from the GC. After all, the base of R is not designed very well, making it quite hard to do simple things, sometimes. – U. Windl Jan 25 '18 at 09:40
  • R vectors are basically plain fixed-size C arrays. If something changes the length, R just creates a new copy of the entire array. That works great for doing analyses on large data sets but not so well for appending elements one by one. – JanKanis Oct 22 '18 at 14:26