I'm trying to convert a list of vectors (a multidimensional array essentially) into a data frame, but every time I try I'm getting unexpected results.

My aim is to instantiate a blank list, populate it in a for loop with vectors containing information about that iteration of the loop, then convert it into a data frame after it's finished.

> vectorList <- list()
> for(i in  1:5){
+     vectorList[[i]] <- c("number" = i, "square root" = sqrt(i))
+ }
> vectorList

Outputs:

> [[1]]
>      number square root 
>           1           1 
> 
> [[2]]
>      number square root 
>    2.000000    1.414214 
> 
> [[3]]
>      number square root 
>    3.000000    1.732051 
> 
> [[4]]
>      number square root 
>           4           2 
> 
> [[5]]
>      number square root 
>    5.000000    2.236068

Now I want this to become a data frame with 5 observations of 2 variables, but trying to create a data frame from 'vectorList'

numbers <- data.frame(vectorList)

results in 2 observations of 5 variables.
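The shape flips because `data.frame()` treats each element of a list as a column, so five length-2 vectors become five columns of two rows. A minimal sketch that reproduces this with the same data (the variable name `wide` is only illustrative):

```r
# Rebuild the list from the question
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# data.frame() turns every list element into a column
wide <- data.frame(vectorList)
dim(wide)        # 2 rows, 5 columns
```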

Weirdly, it won't even be coerced with reshape2 (which I know would be an awful workaround, but I tried).

Anyone got any insight?

h3rm4n
Nick
  • Just a general note about your approach: you should not grow lists like this inside a for loop, if you can avoid it. When you add something to the end of a list, R has to copy the whole list. This is fine for small cases, but if your list is big (and it's getting bigger and bigger, in your case) this can be quite inefficient. – Taylor H Apr 27 '17 at 15:55
  • For your data construction, you could have used `lapply` like this: `vectorList <- lapply(1:5, function(x) c(x, sqrt(x)))`. – lmo Jul 06 '17 at 16:55
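
Both comments can be sketched in code. The version below keeps the element names from the question (the `lapply` one-liner above drops them) and shows preallocation with `vector("list", n)` for cases where a loop is still wanted. The names `n`, `preallocated`, and `viaLapply` are illustrative, not from the original post.

```r
n <- 5

# Option 1: preallocate the list so it is not grown on each iteration
preallocated <- vector("list", n)
for (i in seq_len(n)) {
  preallocated[[i]] <- c("number" = i, "square root" = sqrt(i))
}

# Option 2: let lapply build the whole list in one step
viaLapply <- lapply(seq_len(n), function(i) c("number" = i, "square root" = sqrt(i)))

# Both approaches should produce the same list
identical(preallocated, viaLapply)
```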

4 Answers

You can use:

as.data.frame(do.call(rbind, vectorList))

Or:

library(data.table)
rbindlist(lapply(vectorList, as.data.frame.list))

Or:

library(dplyr)
bind_rows(lapply(vectorList, as.data.frame.list))
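
To see what the base R version does step by step: `do.call(rbind, vectorList)` stacks the vectors into a numeric matrix with one row per list element, and `as.data.frame()` then turns each matrix column into a data frame column. A small sketch using the data from the question (the intermediate name `m` is only for illustration):

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Step 1: bind the vectors row-wise into a 5 x 2 numeric matrix
m <- do.call(rbind, vectorList)
dim(m)

# Step 2: convert the matrix to a data frame (5 observations of 2 variables)
numbers <- as.data.frame(m)
str(numbers)
```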
h3rm4n
  • The first one returns a warning: `Warning message: In (function (..., deparse.level = 1) : number of columns of result is not a multiple of vector length (arg 3)` The second one and the third return the error: `Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0` – PM0087 Apr 22 '20 at 15:07
  • @PM0087 It works perfectly fine for me. Did you use the data as in the question? – h3rm4n Jul 27 '21 at 06:09

The fastest and most efficient way that I know of is the data.table::transpose function (provided the individual vectors are short, i.e. the result has few columns):

as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]]))

However, you will need to set the column names manually, as data.table::transpose removes them. There is also a purrr::transpose function that does not remove the column names, but it seems to be slower. Below is a small benchmark that includes the suggestions of the other users:

vectorList = lapply(1:1000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench = microbenchmark::microbenchmark(
  dplyr = dplyr::bind_rows(lapply(vectorList, as.data.frame.list)),
  rbindlist = data.table::rbindlist(lapply(vectorList, as.data.frame.list)),
  Reduce = Reduce(rbind, vectorList),
  transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
  transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
  do.call = as.data.frame(do.call(rbind, vectorList)),
  times = 10)
bench
# Unit: microseconds
#                 expr        min         lq        mean      median         uq        max neval cld
#                dplyr 286963.036 292850.136 320345.1137 310159.7380 341654.619 385399.851    10   b
#            rbindlist 285830.750 289935.336 306120.7257 309581.1895 318131.031 324217.413    10   b
#               Reduce   8573.474   9073.649  12114.5559   9632.1120  11153.511  33446.353    10  a 
#  transpose_datatable    372.572    424.165    500.8845    479.4990    532.076    701.822    10  a 
#      transpose_purrr    539.953    590.365    672.9531    671.1025    718.757    911.343    10  a 
#              do.call    452.915    537.591    562.9144    570.0825    592.334    641.958    10  a 

# now use bigger list and disregard the slowest
vectorList = lapply(1:100000, function(i) (c("number" = i, "square root" = sqrt(i))))
bench.big = microbenchmark::microbenchmark(
  transpose_datatable = as.data.frame(data.table::transpose(vectorList), col.names = names(vectorList[[1]])),
  transpose_purrr = data.table::as.data.table(purrr::transpose(vectorList)),
  do.call = as.data.frame(do.call(rbind, vectorList)),
  times = 10)
bench.big
# Unit: milliseconds
#                 expr       min        lq       mean     median         uq       max neval cld
#  transpose_datatable  3.470901   4.59531   4.551515   4.708932   4.873755   4.91235    10 a  
#      transpose_purrr 61.007574  62.06936  68.634732  65.949067  67.477948  97.39748    10  b 
#              do.call 97.680252 102.04674 115.669540 104.983596 138.193644 151.30886    10   c
Giuseppe

Also Reduce:

Reduce(rbind, vectorList)

#      number square root
# init      1    1.000000
#           2    1.414214
#           3    1.732051
#           4    2.000000
#           5    2.236068
989
  • Note that `Reduce(rbind, vectorList)` returns a matrix, so you'd want to wrap it in `data.frame` to return a data.frame object. – lmo Jul 06 '17 at 16:59
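
Following that comment, a minimal sketch of the conversion. Dropping the row names first is my own addition, to avoid carrying the stray `init` label into the data frame; the names `m` and `numbers` are illustrative.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

m <- Reduce(rbind, vectorList)   # numeric matrix, one row per list element
rownames(m) <- NULL              # discard the leftover row labels
numbers <- as.data.frame(m)      # 5 observations of 2 variables
dim(numbers)
```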

An alternative solution using purrr:

purrr::map_dfr( vectorList, as.list )
# # A tibble: 5 x 2
#   number `square root`
#    <dbl>         <dbl>
# 1      1          1   
# 2      2          1.41
# 3      3          1.73
# 4      4          2   
# 5      5          2.24

The code effectively converts each vector to a list and concatenates the results row-wise into a common data frame.
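
The same idea can be sketched in base R for readers who don't have purrr installed. This is an approximation of what the call above does, not purrr's implementation: each vector becomes a one-row data frame, and the rows are stacked with `rbind`, which matches data frame columns by name. Passing `check.names = FALSE` is intended to keep the space in `square root`; the names `rows` and `numbers` are illustrative.

```r
vectorList <- lapply(1:5, function(i) c("number" = i, "square root" = sqrt(i)))

# Turn each named vector into a one-row data frame
rows <- lapply(vectorList, function(v) {
  as.data.frame(as.list(v), check.names = FALSE)
})

# Stack the one-row data frames into a single data frame
numbers <- do.call(rbind, rows)
numbers
```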

Artem Sokolov
  • the great thing about the tidyverse methods (both `dplyr::bind_rows()` and `purrr::map_dfr()`) is that they can deal with list elements that have different lengths, and named vectors that vary in their order from element to element. Very useful for example when converting the output of `xml2::xml_attrs()` into rectangular data. – stragu Jul 16 '19 at 13:53