I have several vectors of unequal length and I would like to cbind them. I've put the vectors into a list and tried to combine them using do.call(cbind, ...):

nm <- list(1:8, 3:8, 1:5)
do.call(cbind, nm)

#      [,1] [,2] [,3]
# [1,]    1    3    1
# [2,]    2    4    2
# [3,]    3    5    3
# [4,]    4    6    4
# [5,]    5    7    5
# [6,]    6    8    1
# [7,]    7    3    2
# [8,]    8    4    3
# Warning message:
#   In (function (..., deparse.level = 1)  :
#         number of rows of result is not a multiple of vector length (arg 2)

As expected, the number of rows in the resulting matrix is the length of the longest vector, and the values of the shorter vectors are recycled to make up for the length.

Instead I'd like to pad the shorter vectors with NA values to obtain the same length as the longest vector. I'd like the matrix to look like this:

#      [,1] [,2] [,3]
# [1,]    1    3    1
# [2,]    2    4    2
# [3,]    3    5    3
# [4,]    4    6    4
# [5,]    5    7    5
# [6,]    6    8   NA
# [7,]    7   NA   NA
# [8,]    8   NA   NA

How can I go about doing this?

Henrik
Nick

6 Answers

You can use indexing: if you index a position beyond the length of a vector, R returns NA for it. This works for any number of rows, defined here by foo:

nm <- list(1:8,3:8,1:5)

foo <- 8

sapply(nm, '[', 1:foo)
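
As a small illustration of why this works, here is the same out-of-range indexing applied to a single vector. The name x is only for this example; sapply() effectively does the same thing to each list element.

```r
x <- 1:5

# Positions 6 to 8 do not exist in x, so they come back as NA
x[1:8]
#> [1]  1  2  3  4  5 NA NA NA

# sapply(nm, '[', 1:foo) calls '['(element, 1:foo) for every element of nm
`[`(x, 1:8)
```

Note that if foo is smaller than the longest vector, the longer vectors are silently truncated to foo rows.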

EDIT:

Or in one line using the largest vector as number of rows:

sapply(nm, '[', seq(max(sapply(nm,length))))

From R 3.2.0 you may use lengths ("get the length of each element of a list") instead of sapply(nm, length):

sapply(nm, '[', seq(max(lengths(nm))))
Henrik
Sacha Epskamp
  • `'['` is the name of the operator `[` which you use in indexing (`foo[1:10]`). See also `?'['` – Sacha Epskamp Dec 01 '11 at 10:09
  • The one line solution fails if the first column is shorter than the other two. – bshor Jul 17 '12 at 19:35
  • The only answer that keeps column name is from @Ronak Shah using the `rowr` package. Is there an alternative with base R that keeps column names? – SeGa Apr 25 '19 at 08:31
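
Regarding column names in base R: as far as I can tell, both sapply() and do.call(cbind, ...) carry the names of a named list over to the columns of the result. A minimal sketch, assuming the list elements are named a, b and c (the names m1, m2 and n are only for this example):

```r
nm <- list(a = 1:8, b = 3:8, c = 1:5)
n <- max(lengths(nm))  # lengths() needs R >= 3.2.0

# Indexing approach: sapply() uses the list names as column names
m1 <- sapply(nm, `[`, seq_len(n))

# `length<-` approach: cbind() uses the argument names as column names
m2 <- do.call(cbind, lapply(nm, `length<-`, n))

colnames(m1)
colnames(m2)
```

Both calls return a matrix whose columns are labelled a, b and c, so no extra package is needed just to keep the headers.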

You should fill vectors with NA before calling do.call.

nm <- list(1:8,3:8,1:5)

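# Length of the longest vector
max_length <- max(unlist(lapply(nm, length)))
# Start from an all-NA vector and overwrite its leading positions with x
nm_filled <- lapply(nm, function(x) {
  ans <- rep(NA, length.out = max_length)
  ans[seq_along(x)] <- x  # seq_along() is also safe for zero-length x
  ans
})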
do.call(cbind,nm_filled)
Wojciech Sobala

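Here is an option using stri_list2matrix from stringi. It returns a character matrix, so the class is set to numeric afterwards:

nm <- list(1:8, 3:8, 1:5)
library(stringi)
out <- stri_list2matrix(nm)
class(out) <- 'numeric'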
out
#      [,1] [,2] [,3]
#[1,]    1    3    1
#[2,]    2    4    2
#[3,]    3    5    3
#[4,]    4    6    4
#[5,]    5    7    5
#[6,]    6    8   NA
#[7,]    7   NA   NA
#[8,]    8   NA   NA
akrun

This is a shorter version of Wojciech's solution.

nm <- list(1:8,3:8,1:5)
max_length <- max(sapply(nm,length))
sapply(nm, function(x){
    c(x, rep(NA, max_length - length(x)))
})
Thierry
  • You are always better off using `vapply` rather than `sapply` because that will guarantee you get the output type that you expect. – hadley Apr 04 '11 at 12:09
  • @hadley Could you elaborate on your comment? I don't understand the difference between vapply and sapply transferred to this problem. – guerda Feb 18 '15 at 14:23
  • sapply is dangerous to program with because it is not type stable - depending on the length of `nm` you'll get different types – hadley Feb 19 '15 at 16:22
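
To make the vapply suggestion from the comments concrete, here is a sketch of the same padding written with vapply. It assumes every element of nm is an integer vector, which is why FUN.VALUE is integer(max_length); the name padded is only for this example.

```r
nm <- list(1:8, 3:8, 1:5)
max_length <- max(lengths(nm))

# vapply() checks that every result is an integer vector of length max_length
padded <- vapply(
  nm,
  function(x) c(x, rep(NA, max_length - length(x))),
  FUN.VALUE = integer(max_length)
)
dim(padded)  # 8 rows, 3 columns

# The type instability mentioned above: on empty input sapply() returns a list,
# while vapply() still returns a vector of the declared type
is.list(sapply(list(), length))
is.integer(vapply(list(), length, integer(1)))
```

If the list may contain doubles, numeric(max_length) would be the FUN.VALUE to use instead, at the cost of always getting a double matrix back.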

Late to the party, but you could use cbind.fill from the rowr package with fill = NA:

library(rowr)
do.call(cbind.fill, c(nm, fill = NA))

#  object object object
#1      1      3      1
#2      2      4      2
#3      3      5      3
#4      4      6      4
#5      5      7      5
#6      6      8     NA
#7      7     NA     NA
#8      8     NA     NA

If you have a named list instead and want to keep the headers, you can use setNames:

nm <- list(a = 1:8, b = 3:8, c = 1:5)
setNames(do.call(cbind.fill, c(nm, fill = NA)), names(nm))

#  a  b  c
#1 1  3  1
#2 2  4  2
#3 3  5  3
#4 4  6  4
#5 5  7  5
#6 6  8 NA
#7 7 NA NA
#8 8 NA NA
Ronak Shah

You have to bring all list elements to the same length using length<- and then you can use cbind to get a matrix.

nm <- list(1:8, 3:8, 1:5)

do.call(cbind, lapply(nm, `length<-`, max(lengths(nm))))
#     [,1] [,2] [,3]
#[1,]    1    3    1
#[2,]    2    4    2
#[3,]    3    5    3
#[4,]    4    6    4
#[5,]    5    7    5
#[6,]    6    8   NA
#[7,]    7   NA   NA
#[8,]    8   NA   NA
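
For readers who have not used the replacement form of length before, this is what it does to a single vector (x and y are names chosen for this illustration):

```r
x <- 1:5
length(x) <- 8  # growing a vector fills the new positions with NA
x
#> [1]  1  2  3  4  5 NA NA NA

y <- 1:8
length(y) <- 3  # shrinking a vector truncates it
y
#> [1] 1 2 3
```

In the lapply() call above, `length<-` is applied to each list element with max(lengths(nm)) as the new length, so no element is ever truncated.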

Benchmark

nm <- list(1:8, 3:8, 1:5)

bench::mark(
"[" = sapply(nm, '[', seq(max(lengths(nm)))),
"length<-" = do.call(cbind, lapply(nm, `length<-`, max(lengths(nm)))) )
#  express…¹     min  median itr/s…² mem_a…³ gc/se…⁴ n_itr  n_gc total…⁵ result  
#  <bch:exp> <bch:t> <bch:t>   <dbl> <bch:b>   <dbl> <int> <dbl> <bch:t> <list>  
#1 [         36.19µs 40.56µs  24412.      0B    12.2  9995     5 409.4ms <int[…]>
#2 length<-   8.63µs  9.88µs 100367.      0B    20.1  9998     2  99.6ms <int[…]>

In this case, length<- is about 4 times faster than [ (median 9.88µs versus 40.56µs).

GKi