0

https://www.programmingr.com/fast-r-append-list/

I don't find the time complexity of the appending operation on a vector or a list.

Is this documented anywhere?

Suppose that I have n elements, that I have to append one by one. Is there a data structure that has a time complexity of n to append the n elements?

user1424739
  • 11,937
  • 17
  • 63
  • 152

1 Answers1

2

Here is a function to test the running times of three different ways of assigning a value to a vector.
The first two extend the vector by appending the new element to its end. The last one allocates space by creating a vector with n elements and then populates the vector with the new elements.
Times are measured with package microbenchmark and medians are plotted with ggplot2 graphics.

Atomic vector timings

fun <- function(N = 1:10, times = 100){
  if(!require(microbenchmark)){
    stop("Package 'microbenchmark' is not installed")
  }
  out <- lapply(2^N, function(n){
    mb <- microbenchmark(
      c = {
        x <- NULL
        for(i in seq_len(n)) x <- c(x, i)
      },
      append = {
        x <- NULL
        for(i in seq_len(n)) x <- append(x, i)
      },
      allocate = {
        x <- numeric(n)
        for(i in seq_len(n)) x[i] <- i
      },
      times = times
    )
    mb$n <- n
    aggregate(time ~ expr + n, mb, median)
  })
  out <- do.call(rbind, out)
  row.names(out) <- NULL
  out
}

library(ggplot2)

times <- fun(1:12)

ggplot(times, aes(n, time, color = expr)) +
  geom_line() +
  geom_point() +
  theme_bw()

enter image description here


list timings

This function is nearly identical to the function above, but creating and extending objects of class "list".

Surprisingly, now to extend the list by assigning new elements after the current last is faster than append.

funList <- function(N = 1:10, times = 100){
  if(!require(microbenchmark)){
    stop("Package 'microbenchmark' is not installed")
  }
  out <- lapply(2^N, function(n){
    mb <- microbenchmark(
      List = {
        x <- list()
        for(i in seq_len(n)) x[[i]] <- i
      },
      append = {
        x <- list()
        for(i in seq_len(n)) x <- append(x, i)
      },
      allocate = {
        x <- vector("list", n)
        for(i in seq_len(n)) x[[i]] <- i
      },
      times = times
    )
    mb$n <- n
    aggregate(time ~ expr + n, mb, median)
  })
  out <- do.call(rbind, out)
  row.names(out) <- NULL
  out
}

library(ggplot2)

times <- funList(1:12)

ggplot(times, aes(n, time, color = expr)) +
  geom_line() +
  geom_point() +
  theme_bw()

enter image description here

S K
  • 3
  • 2
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • Could you add the info for the list as well? – user1424739 Aug 05 '21 at 15:59
  • @user1424739 See also [this SO post](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1?rq=1). – Rui Barradas Aug 05 '21 at 16:55
  • Do I understand that post correctly? `a <- list(a, list(i))` is the fastest for list? But it does not make sense as it involves rewriting the whole list. Intuitively, assigning by index should be the fastest. Why rewriting the original list could be faster? – user1424739 Aug 05 '21 at 17:00
  • @user1424739 The post is right, that way is the fastest, it's half the time of `append` but the result is not the same, it is a nested lists list, see [this comment](https://stackoverflow.com/questions/2436688/append-an-object-to-a-list-in-r-in-amortized-constant-time-o1?rq=1#comment57695097_29870770) to Dirk's answer.. – Rui Barradas Aug 05 '21 at 17:03
  • @user1424739 To get an `identical()` result, it would have to be `a <- c(a, list(i))` and performance degrades, as seen in my answer. – Rui Barradas Aug 05 '21 at 17:09