Why do three NAs appear at the end?

Question

I was trying to write an interleave() function for two vectors of arbitrary length.

For equal-length vectors, I found this on the internet:

.interleave <- function(vec1, vec2) {
  # only meant for equal lengths: rbind() recycles the shorter vector
  res <- rbind(vec1, vec2)
  attributes(res) <- NULL
  res
}
# c(rbind(vec1, vec2)) is shorter code, but 
# is 3x slower according to the blog in the link
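To make the comment above concrete, here is a small sketch checking that the two variants return the same plain vector for equal-length input. It says nothing about speed; the variable names `via_attr` and `via_c` are made up for illustration.

```r
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)

# rbind() builds a 2 x 3 matrix; R stores matrices column by column,
# so dropping the dim attribute yields the interleaved order.
m <- rbind(vec1, vec2)

via_attr <- m
attributes(via_attr) <- NULL   # variant used in .interleave()

via_c <- c(m)                  # shorter variant, c(rbind(vec1, vec2))

identical(via_attr, via_c)
# [1] TRUE
via_c
# [1] 1 4 2 5 3 6
```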

For vectors of arbitrary length, my idea was to measure the lengths first, interleave the common part, and then append the rest of the longer vector.

interleave <- function(vec1, vec2) {
  vec1_len <- length(vec1)
  vec2_len <- length(vec2)
  min_len  <- min(vec1_len, vec2_len)
  if (vec1_len == vec2_len) {
    .interleave(vec1, vec2)
  } else {
    c(.interleave(vec1[1:min_len], vec2[1:min_len]),
      if (vec1_len > vec2_len) { 
        vec1[min_len+1:vec1_len]
      } else {
        vec2[min_len+1:vec2_len]
      })  
  }
} # strangely 3 NA's at end if unequal length

But now comes the strange thing:

interleave(c(1, 2, 3), c(4, 5, 6, 7, 8, 9))
## [1]  1  4  2  5  3  6  7  8  9 NA NA NA

interleave(c(1, 2, 3), c(4, 5, 6))
## [1] 1 4 2 5 3 6 

interleave(c(1, 2, 3), c(4, 5))
## [1]  1  4  2  5  3 NA NA

interleave(c(1, 2, 3), c(4, 5, 6, 7, 8, 9, 10, 11))
## [1]  1  4  2  5  3  6  7  8  9 10 11 NA NA NA
interleave(c(1, 2, 3, 4, 5, 6), c( 7, 8, 9, 10, 11))
## [1]  1  7  2  8  3  9  4 10  5 11  6 NA NA NA NA NA

Where do the NAs come from? Remark: I see the pattern that the number of appended NAs equals the number of elements in the shorter vector ...

How can I write a version without NAs?

Solution

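Sorry, I figured it out myself. The problem was the subsetting of the remainder: I forgot some parentheses. `:` binds tighter than `+`, so `min_len+1:vec1_len` is evaluated as `min_len + (1:vec1_len)`, which indexes past the end of the vector.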

interleave <- function(vec1, vec2) {
  vec1_len <- length(vec1)
  vec2_len <- length(vec2)
  min_len  <- min(vec1_len, vec2_len)
  if (vec1_len == vec2_len) {
    .interleave(vec1, vec2)
  } else {
    c(.interleave(vec1[1:min_len], vec2[1:min_len]),
      if (vec1_len > vec2_len) { 
        vec1[(min_len+1):vec1_len] # parentheses!
      } else {
        vec2[(min_len+1):vec2_len] # parentheses!
      })  
  }
} # no NA's any more!
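A short sketch of why the missing parentheses produce exactly as many NAs as the shorter vector has elements, using the values from the first example in the question. The names `wrong_idx` and `right_idx` are only for illustration.

```r
min_len <- 3
vec2 <- c(4, 5, 6, 7, 8, 9)

# Without parentheses, `:` is evaluated first, then min_len is added
# to every element: 3 + (1:6).
wrong_idx <- min_len + 1:length(vec2)
wrong_idx
# [1] 4 5 6 7 8 9

# With parentheses the sequence starts right after the common part.
right_idx <- (min_len + 1):length(vec2)
right_idx
# [1] 4 5 6

# Indexing past the end of a vector returns NA, so the last min_len
# positions of wrong_idx (7, 8, 9) each produce one NA.
vec2[wrong_idx]
# [1]  7  8  9 NA NA NA
vec2[right_idx]
# [1] 7 8 9
```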

Slightly shorter

interleave <- function(vec1, vec2) {
  vec1_len <- length(vec1)
  vec2_len <- length(vec2)
  min_len  <- min(vec1_len, vec2_len)
  if (vec1_len == vec2_len) {
    .interleave(vec1, vec2)
  } else {
    c(.interleave(vec1[1:min_len], vec2[1:min_len]),
      if (vec1_len > vec2_len) {
        vec1[(min_len+1):vec1_len]
      } else {
        vec2[(min_len+1):vec2_len]
      })
  }
}
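One edge case may be worth keeping in mind: `1:min_len` counts down to `c(1, 0)` when `min_len` is 0, so an empty input is not handled cleanly by the version above. Below is a minimal sketch of a variant that avoids this, assuming the same interleaving semantics are wanted. The name `interleave_safe` is hypothetical and not part of the original post.

```r
interleave_safe <- function(vec1, vec2) {
  min_len  <- min(length(vec1), length(vec2))
  head_idx <- seq_len(min_len)   # integer(0) when min_len is 0

  c(rbind(vec1[head_idx], vec2[head_idx]),  # interleaved common part
    vec1[seq_along(vec1) > min_len],        # remainder of vec1, if any
    vec2[seq_along(vec2) > min_len])        # remainder of vec2, if any
}

interleave_safe(c(1, 2, 3), c(4, 5, 6, 7, 8, 9))
# [1] 1 4 2 5 3 6 7 8 9
interleave_safe(c(1, 2, 3), c(4, 5))
# [1] 1 4 2 5 3
interleave_safe(numeric(0), c(1, 2))
# [1] 1 2
```

The logical index `seq_along(x) > min_len` is used for the remainder because negative indexing with an empty index (`x[-integer(0)]`) selects nothing rather than everything.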

score 0 · Answer 1 · answered Jul 31 '19 at 09:45


A general function:

interleave <- function(...) {
    l <- list(...)
    len <- max(lengths(l))
    l <- lapply(l, function(x) `length<-`(x, len))
    res <- na.omit(c(do.call(rbind, l)))
    attributes(res) <- NULL
    res
}

interleave(1:9, seq(10,40,10), seq(100,500,100))
#[1]   1  10 100   2  20 200   3  30 300   4  40 400   5 500   6   7   8   9
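One caveat with padding and then calling `na.omit()`: it cannot tell padding apart from NAs that were already in the input, so genuine missing values are dropped as well. A small sketch, with the function copied under the hypothetical name `interleave_general` so it does not clash with the other definitions:

```r
interleave_general <- function(...) {
  l <- list(...)
  len <- max(lengths(l))
  l <- lapply(l, function(x) `length<-`(x, len))  # pad with NA to equal length
  res <- na.omit(c(do.call(rbind, l)))            # drops ALL NAs, not only padding
  attributes(res) <- NULL
  res
}

# The NA in the first vector is real data, but it disappears from the result.
interleave_general(c(1, NA, 3), c(4, 5, 6))
# [1] 1 4 5 3 6
```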

A second version, generalizing "Interleave lists in R":

interleave <- function(...) {
    l <- list(...)
    idx <- order(unlist(lapply(l, function(x) seq_along(x))))
    unlist(l)[idx]
}
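To see how the `order()` version works, it may help to look at the intermediate values. This is a walk-through sketch with the function copied under the hypothetical name `interleave_order`. It relies on `order()` leaving ties in their original order.

```r
interleave_order <- function(...) {
  l <- list(...)
  idx <- order(unlist(lapply(l, seq_along)))
  unlist(l)[idx]
}

l <- list(c(1, 2, 3), c(4, 5, 6, 7, 8, 9))

# Position of every element within its own vector.
pos <- unlist(lapply(l, seq_along))
pos
# [1] 1 2 3 1 2 3 4 5 6

# Sorting by position puts all first elements first, then all second
# elements, and so on; ties keep their original (vector) order.
order(pos)
# [1] 1 4 2 5 3 6 7 8 9

interleave_order(c(1, 2, 3), c(4, 5, 6, 7, 8, 9))
# [1] 1 4 2 5 3 6 7 8 9
```

Unlike the padded version, this one never introduces NAs, so NAs already present in the input are kept.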

answered Jul 31 '19 at 09:45

chinsoon12


Thanks. I tried this kind of approach, too, but I am trying to avoid lists, because lists in R are very slow. My `interleave()` is just a helper function for a `strsplit()` equivalent I want to write, because `strsplit()` uses lists internally and is therefore terribly slow. I am trying to write a method that splits strings using mainly `substr()` and `gregexpr()`, because this is much faster. I know that such a function already exists in the tidyverse, but I wanted to write my own for learning purposes. – Gwang-Jin Kim Jul 31 '19 at 11:05
