8

I have a list of vectors of different lengths, and I want to convert it to a data.frame. I've seen many posts about this on SO (see ref), but none of the solutions is as simple as I expected, given how common this task is in data preprocessing. Thank you.

By simplest I mean something like as.data.frame(aa), if only it worked. A single function from base R would be ideal. sapply(aa, "length<-", max(lengths(aa))) actually involves four functions.

An example is shown below.

Input:

aa <- list(A=c(1, 3, 4), B=c(3,5,7,7,8))

Output:

A B
1 3
3 5
4 7
NA 7
NA 8

A and B are the colnames of the data.frame.

One answer is sapply(aa, '[', seq(max(sapply(aa, length)))), but it's also complex.
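For reference, here is a minimal sketch of why that one-liner works: indexing a vector past its end returns NA, so subsetting every vector with the same index range pads the shorter ones. The names `idx` and `m` are introduced here only for illustration.

```r
aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

# Index range covering the longest vector in the list.
idx <- seq_len(max(lengths(aa)))

# Positions past the end of a vector come back as NA.
a_padded <- aa$A[idx]

# Apply the same subsetting to every element; sapply() binds the
# equal-length results into a numeric matrix with one column per element.
m <- sapply(aa, `[`, idx)

# Convert the matrix to a data.frame with columns A and B.
df <- as.data.frame(m)
print(df)
```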

ref:

  1. How to convert a list consisting of vector of different lengths to a usable data frame in R?

  2. Combining (cbind) vectors of different length

Zhilong Jia
  • 2
    You can make it compact with `data.frame(lapply(aa, "length<-", max(lengths(aa))))`. `lengths(aa)` is also faster than `sapply(aa, length)`. – akrun Nov 09 '15 at 16:13
  • [tag:data-science]??? – David Arenburg Nov 09 '15 at 16:13
  • @akrun, it's a solution, but not as simple as possible in R. – Zhilong Jia Nov 09 '15 at 16:19
  • @David Arenburg, it's related to data science, since preprocessing unformatted data is always an important part of data science. – Zhilong Jia Nov 09 '15 at 16:19
  • 1
    You can use `library(stringi); stri_list2matrix(aa)`, but the resulting character elements need to be converted to `numeric`. I am not sure whether `simple` means `compact` code for you, though. – akrun Nov 09 '15 at 16:21
  • @akrun, I think `stri_list2matrix` is a simple answer, though I think there should be a function in the base package in R. In my opinion, `simple` means easy to use and to be understood. – Zhilong Jia Nov 09 '15 at 16:32
  • Well, you can create a function with these tools so that it becomes `simple` for you. – akrun Nov 09 '15 at 16:34
  • @akrun, `sapply(aa, "length<-", max(lengths(aa)))` works as well. Here it seems `length<-` means `length(x) <- max(lengths(aa))`? – Zhilong Jia Nov 09 '15 at 17:54
  • Yes, and it is very fast based on some benchmarks done earlier. – akrun Nov 10 '15 at 04:25
  • @ZhilongJia, I found the comment of @fdetsch [here](https://stackoverflow.com/questions/3699405/how-to-cbind-or-rbind-different-lengths-vectors-without-repeating-the-elements-o#comment71565926_36692363) interesting. Maybe something like `do.call(qpcR:::cbind.na, aa)` could be interesting, but is not fully base R though. – Valentin_Ștefan Jan 26 '18 at 22:04

3 Answers

16

We can use

data.frame(lapply(aa, "length<-", max(lengths(aa))))
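To see what that one-liner does, here is a step-by-step sketch using the list from the question. The intermediate names `n`, `padded`, and `df` are introduced only for illustration.

```r
aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

# lengths() returns the length of each list element; take the largest.
n <- max(lengths(aa))

# `length<-` is the replacement function behind `length(x) <- n`.
# Extending a vector this way fills the new positions with NA.
padded <- lapply(aa, `length<-`, n)

# All elements now have the same length, so data.frame() accepts the list.
df <- data.frame(padded)
print(df)
```

Because `lapply()` returns a list, each column keeps its own type, whereas `sapply()` would coerce everything into a single matrix.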

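Or, using the tidyverse (this returns a long-format tibble with `name` and `value` columns rather than the wide layout above):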

library(dplyr)
library(tibble)
library(tidyr)
enframe(aa) %>%
    unnest(value)
akrun
  • 3
    We don't know what the OP regards as "simple", but `setDT` in place of `data.frame` saves some characters and operations. – Frank Nov 10 '15 at 04:34
  • @Frank I agree. It seems to me that the OP wants `base R` options. – akrun Nov 10 '15 at 04:35
2

Using tidyverse packages: place the list in a nested data frame, extract the name of each vector, unnest the data frame, give each element a row index i within its vector, then spread the data into wide format.

    aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))
    library(tidyverse)
    data_frame(data = aa) %>% 
        group_by(name = names(data)) %>% 
        unnest() %>%
        mutate(i = row_number()) %>% 
        spread(name, data)
    # A tibble: 5 x 3
          i     A     B
    * <int> <dbl> <dbl>
    1     1     1     3
    2     2     3     5
    3     3     4     7
    4     4    NA     7
    5     5    NA     8
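The same steps can also be sketched with newer tidyverse functions. This is an alternative, not the code that produced the output above; it assumes tidyr 1.0.0 or later (for `pivot_wider()`), and the name `wide` is introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

aa <- list(A = c(1, 3, 4), B = c(3, 5, 7, 7, 8))

wide <- enframe(aa) %>%            # one row per vector: name + list-column value
  unnest(value) %>%                # one row per element
  group_by(name) %>%
  mutate(i = row_number()) %>%     # row index within each vector
  ungroup() %>%
  pivot_wider(names_from = name, values_from = value)  # missing cells become NA

print(wide)
```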
Paul Rougieux
1

Make this function:

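listToDF <- function(aa){
  # Pad each vector with NA up to the longest length. sapply() returns a
  # matrix, so convert it to a data.frame.
  as.data.frame(sapply(aa, "length<-", max(lengths(aa))))
}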

Then use it, simply:

listToDF(aa)
MarkeD