Combine matrices of different length and keep column names

Question

There is a similar question about combining vectors with different lengths here, but all answers (except @Ronak Shah's answer) lose the names/colnames.

My problem is that I need to keep the column names, which seems to be possible using the rowr package and cbind.fill.

I would like to stay in base R or use stringi, and the output should remain a matrix.

Test data:

inp <- list(structure(c("1", "2"), .Dim = 2:1, .Dimnames = list(NULL,"D1")), 
            structure(c("3", "4", "5"), .Dim = c(3L, 1L), .Dimnames = list(NULL, "D2")))

I know that I could get the column names beforehand and then reassign them after creating the matrix, like:

## Using stringi
library(stringi)
colnam <- unlist(lapply(inp, colnames))
out <- stri_list2matrix(inp)
colnames(out) <- colnam
out

## Using base-R
colnam <- unlist(lapply(inp, colnames))
max_length <- max(lengths(inp))
nm_filled <- lapply(inp, function(x) {
  ans <- rep(NA, length.out = max_length)
  ans[seq_along(x)] <- x
  ans
})
out <- do.call(cbind, nm_filled)
colnames(out) <- colnam
out

Are there other options that keep the column names?

In that question it is a list of vectors, but here you have a list of matrices. Is that intentional? — Ronak Shah, Apr 25 '19 at 09:14
Yes, I know, but it still works with matrices as well. Yes, it could also have multiple columns, not just 2. — SeGa, Apr 25 '19 at 09:17
How would different columns be treated? For example, what would be your expected output for `inp` after changing `inp[[1]] <- cbind(inp[[1]], "D3" = 4)`? — Ronak Shah, Apr 25 '19 at 09:34
It should be filled with NA. So this should result in `cbind(inp[[1]], "D3" = c(4,NA))` — SeGa, Apr 25 '19 at 09:36
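
One option that avoids collecting and reassigning the names is to pad each matrix with NA rows before binding, so the column names travel with the matrices. The following is only a sketch: `cbind_fill` is a hypothetical helper name (not a function from base R or stringi), and it assumes every list element is a matrix.

```r
inp <- list(structure(c("1", "2"), .Dim = 2:1, .Dimnames = list(NULL, "D1")),
            structure(c("3", "4", "5"), .Dim = c(3L, 1L), .Dimnames = list(NULL, "D2")))

# Hypothetical helper: pad each matrix with NA rows up to the tallest one,
# then bind the padded matrices side by side.
cbind_fill <- function(mats) {
  n <- max(vapply(mats, nrow, integer(1)))
  padded <- lapply(mats, function(m) {
    # NA block with the missing number of rows (possibly zero) and the same width
    pad <- matrix(NA, nrow = n - nrow(m), ncol = ncol(m))
    rbind(m, pad)  # column names are taken from m
  })
  do.call(cbind, padded)
}

out <- cbind_fill(inp)
out
```

Because the padding is done per matrix with `nrow()` and `ncol()`, the same helper should also cover the multi-column case mentioned in the comments, for example after `inp[[1]] <- cbind(inp[[1]], "D3" = 4)`.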

score 2 · Answer 1 · answered Apr 25 '19 at 09:21

Since stringi is ok for you to use, you can use the function stri_list2matrix(), i.e.

setNames(as.data.frame(stringi::stri_list2matrix(inp)), sapply(inp, colnames))
#    D1 D2
#1    1  3
#2    2  4
#3 <NA>  5

Sotos


Maurits Evers · Answer 2 · 2019-04-25T09:45:57.760

1

Here is a slightly more concise base R variation:

len <- max(lengths(inp))
nms <- sapply(inp, colnames)
do.call(cbind, setNames(lapply(inp, function(x)
    replace(rep(NA, len), 1:length(x), x)), nms))
#      D1  D2
#[1,] "1" "3"
#[2,] "2" "4"
#[3,] NA  "5"

Not sure if this constitutes a sufficiently different solution from what you've already posted. Will remove if deemed too similar.

Update

Or how about a merge?

Reduce(
    function(x, y) merge(x, y, all = TRUE, by = 0),
    lapply(inp, as.data.frame))[, -1]
#    D1 D2
#1    1  3
#2    2  4
#3 <NA>  5

The idea here is to convert the list entries to data.frames and merge them by row name by setting by = 0 (thanks @Henrik). Note that this returns a data.frame rather than a matrix.
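
If a matrix is required, the merged data.frame can be converted back with `as.matrix()`. This is a minimal sketch on the two-element test data from the question; the names `merged` and `out` are introduced here for illustration only.

```r
inp <- list(structure(c("1", "2"), .Dim = 2:1, .Dimnames = list(NULL, "D1")),
            structure(c("3", "4", "5"), .Dim = c(3L, 1L), .Dimnames = list(NULL, "D2")))

# Merge on row names; the first column of the result is the added Row.names column
merged <- Reduce(
    function(x, y) merge(x, y, all = TRUE, by = 0),
    lapply(inp, as.data.frame))[, -1]

# Convert back to a matrix; the column names D1 and D2 are kept
out <- as.matrix(merged)
out
```

Since `merge` matches row names as character values, it may be worth checking the row order of the result for inputs with ten or more rows.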

edited Apr 25 '19 at 09:45

answered Apr 25 '19 at 09:19

You can replace `sapply(inp, length)` with `lengths(inp)` – Sotos Apr 25 '19 at 09:24
No, definitely worth an answer, as it is indeed far more compact than my base R approach. Unfortunately, it is still necessary to get the names first and reassign them. – SeGa Apr 25 '19 at 09:25
@Sotos Ah thanks, I always forget about `lengths`. Made an edit. – Maurits Evers Apr 25 '19 at 09:27
@SeGa I've added an alternative approach using `merge`. The only difference is that it returns a `data.frame` rather than a `matrix`. – Maurits Evers Apr 25 '19 at 09:29
Also nice approach, but yeah I would like to keep the matrix format. I've added it in my question. – SeGa Apr 25 '19 at 09:31
If you specify `by = 0` in `merge` (merge on row names), `lapply(inp, as.data.frame)` is enough. – Henrik Apr 25 '19 at 09:34
@Henrik Nice! I didn't know about the `by = 0` option. – Maurits Evers Apr 25 '19 at 09:46

zx8754 · Answer 3 · 2019-04-25T10:25:45.187

Here is a solution using base R:

do.call(cbind,
        lapply(inp, function(i){
          x <- data.frame(i, stringsAsFactors = FALSE)
          as.matrix( x[ seq(max(lengths(inp))), , drop = FALSE ] ) 
          # if the matrices have more than one column, use:
          #as.matrix( x[ seq(max(sapply(inp, nrow))), , drop = FALSE ] )
        }
        ))


#    D1  D2 
# 1  "1" "3"
# 2  "2" "4"
# NA NA  "5"

The idea is to give all matrices the same number of rows. When we subset a data frame by row index, rows that do not exist are returned as NA; we then convert back to a matrix and cbind.
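
The padding step can be seen in isolation on the first test matrix. This is a small sketch, with `m`, `df` and `padded` as illustrative names: indexing the data frame past its last row yields an NA row, whereas the same index on the matrix itself raises a "subscript out of bounds" error.

```r
m <- structure(c("1", "2"), .Dim = 2:1, .Dimnames = list(NULL, "D1"))

# Convert to a data frame so that out-of-range row indices return NA rows
df <- data.frame(m, stringsAsFactors = FALSE)
padded <- df[seq_len(3), , drop = FALSE]

# Back to a matrix; the column name D1 is preserved
as.matrix(padded)
```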
