
I'm trying to find the quickest and most efficient way to append to a vector. I tested five different appending methods and found that the quickest is simple preallocation (see the append3 function):

# using generic combine
append1 <- function(n) {
    v <- numeric()
    for (i in 1:n) {
        v <- c(v, i)
    }
    v
}

# using length and auto extending
append2 <- function(n) {
    v <- numeric()
    for (i in 1:n) {
        v[length(v) + 1] <- i
    }
    v
}

# using preallocation
append3 <- function(n) {
    v <- numeric(n)
    for (i in 1:n) {
        v[i] <- i
    }
    v
}

# using append
append4 <- function(n) {
    v <- numeric()
    for (i in 1:n) {
        v <- append(v, i)
    }
    v
}

# using union
append5 <- function(n) {
    v <- numeric()
    for (i in 1:n) {
        v <- union(v, i)
    }
    v
}

library(microbenchmark)
microbenchmark(append1(10000), append2(10000), append3(10000), append4(10000), append5(10000), times = 5)

# Unit: milliseconds
# expr        min         lq     median         uq        max neval
# append1(10000)  338.77588  341.06664  342.66819  360.11682  413.78760     5
# append2(10000)  372.71939  373.20159  375.15096  385.76314  431.04495     5
# append3(10000)   23.26534   23.27922   23.59688   23.68247   24.80935     5
# append4(10000)  373.60041  373.91250  434.95227  435.57716  440.97028     5
# append5(10000) 6382.45524 6425.84974 6445.28719 6520.39599 6572.08553     5
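A rough way to see why the non-preallocated methods scale so badly (my own back-of-the-envelope arithmetic, not from the original post): each `c(v, i)` allocates a fresh vector and copies every existing element into it, so building a length-n vector one element at a time copies on the order of n^2/2 elements in total. This ignores any over-allocation the R runtime may do internally, but it matches the benchmark's roughly 15x gap:

```r
# Elements copied when growing one element at a time: the i-th
# append copies the (i - 1) elements already stored.
n <- 10000
copies_naive <- sum(seq_len(n) - 1)   # 0 + 1 + ... + (n - 1)
copies_naive                          # 49995000, i.e. ~n^2 / 2

# With preallocation (append3), each element is written exactly once,
# so only n assignments are needed:
copies_prealloc <- n
```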

But preallocation requires knowing the vector's capacity in advance. I want to know whether there is another fast method that appends dynamically, without preallocation.

Eldar Agalarov
  • Pre-allocation will basically always be faster because you can grab the chunk of memory that you need in advance as opposed to jumping around. Can you pre-allocate more than you need, then chop it off at the end when you know how many runs you did? – Ari B. Friedman Jun 02 '14 at 00:39
  • You can preallocate even if you don't know the exact length of the result. Allocate something reasonable given the problem (can you get within the right order of magnitude?), then keep track in the loop of how many elements you have appended. Once you hit the preallocated limit, allocate a larger vector (say 2x the original), copy the current state of the vector over to the new object, and continue with the loop. Repeat as needed until you reach the end of the problem. If you over-allocate at any point, remove the unused elements once the loop is complete. – Gavin Simpson Jun 02 '14 at 00:40
  • http://stackoverflow.com/questions/8219613/dynamic-array-like-structure-in-r/8221780#8221780 – Ben Bolker Jun 02 '14 at 00:41
  • Thanks for the advice. I posted my testing results below. – Eldar Agalarov Jun 02 '14 at 02:51
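Gavin Simpson's comment above can be sketched roughly as follows; the function name `append_doubling`, the initial size of 16, and the growth factor of 2 are illustrative choices, not from the thread:

```r
# Grow the buffer geometrically (doubling) and truncate at the end.
append_doubling <- function(n) {
    v <- numeric(16)        # modest initial guess at the capacity
    count <- 0              # number of elements actually used
    for (i in 1:n) {
        if (count == length(v)) {
            v <- c(v, numeric(length(v)))  # double the capacity
        }
        count <- count + 1
        v[count] <- i
    }
    v[seq_len(count)]       # drop the unused tail
}
```

Because the capacity doubles on each reallocation, only O(log n) reallocations happen overall, so the amortized cost per append stays constant.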

1 Answer


I tested a method with lazy preallocation and found that the optimal growth increment is approximately 5% of the vector's current length:

# using preallocation only when necessary (lazy preallocation);
# m is the initial size, k is the growth increment as a fraction
# of the current length
append6 <- function(n, m, k) {
    v <- numeric(m)
    len <- m
    seqN <- 1:n
    for (i in seqN) {
        if (i > len) {
            # grow by k * current length, but by at least one element
            newLen <- max(len * k, 1)
            v <- c(v, numeric(newLen))
            len <- length(v)
        }
        v[i] <- i
    }
    v[seqN]  # drop any unused tail
}

microbenchmark(append3(10000), append6(10000, 1, 0.01), append6(10000, 1, 0.02), append6(10000, 1, 0.03), append6(10000, 1, 0.05), times = 5)
#Unit: milliseconds
#                   expr      min       lq   median       uq      max neval
#         append3(10000) 22.63332 23.33605 23.84207 24.79069 25.14729     5
#append6(10000, 1, 0.01) 47.97180 48.02801 48.55201 56.94827 57.34071     5
#append6(10000, 1, 0.02) 39.13763 39.23551 41.39761 41.48553 42.57642     5
#append6(10000, 1, 0.03) 36.50014 37.07902 40.95009 44.17062 46.30302     5
#append6(10000, 1, 0.05) 34.67124 34.88921 35.01459 36.77056 43.55909     5

microbenchmark(append3(100000), append6(100000, 1, 0.01), append6(100000, 1, 0.02), append6(100000, 1, 0.03), append6(100000, 1, 0.05), times = 5)
#Unit: milliseconds
#                   expr      min       lq   median       uq      max neval
#         append3(1e+05) 241.0875 241.1899 241.9886 250.7469 257.6539     5
#append6(1e+05, 1, 0.01) 439.9242 496.1509 500.2590 501.2341 505.9891     5
#append6(1e+05, 1, 0.02) 381.0249 382.8471 383.3789 391.5930 460.3711     5
#append6(1e+05, 1, 0.03) 357.0127 359.9495 361.5947 371.0794 383.7376     5
#append6(1e+05, 1, 0.05) 345.7090 347.6097 349.7010 371.0359 377.4635     5

The results show that append6 is roughly 45% slower than append3. This seems to be the price paid for balancing speed against memory usage.

Eldar Agalarov
  • I don't follow the argument that 5% of the length is optimal; I would expect 100% to be optimal so you don't have to keep track of anything. – Gavin Simpson Jun 02 '14 at 03:56
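A back-of-the-envelope calculation (mine, not from the thread; `copies_geometric` is an illustrative helper) supports the comment above: with geometric growth factor 1 + k, each reallocation copies the current buffer contents, and the total copied across all reallocations is roughly n/k elements, so a 5% increment copies far more data overall than doubling does:

```r
# Total elements copied when growing a buffer geometrically from
# size 1 up to at least n with growth factor (1 + k): each
# reallocation copies the current size, so we sum the sizes.
copies_geometric <- function(n, k) {
    size <- 1
    copied <- 0
    while (size < n) {
        copied <- copied + size   # copy old contents into the new buffer
        size <- ceiling(size * (1 + k))
    }
    copied
}

copies_geometric(1e5, 0.05)  # ~2 million element copies
copies_geometric(1e5, 1)     # ~130 thousand element copies (doubling)
```

The trade-off is that a larger growth factor wastes more memory in the worst case (up to almost 2x with doubling), which is presumably why the answer's 5% increment still has appeal when memory is tight.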