6

I have a nested list like this:

x <- list(x = list(a = 1, 
                   b = 2), 
          y = list(a = 3, 
                   b = 4))

I would like to convert each innermost list into a data.frame and then bind all of the data frames into one.

For this level of nesting I can do it with this line:

do.call(rbind.data.frame, lapply(x, as.data.frame, stringsAsFactors = FALSE))

The result is:

  a b
x 1 2
y 3 4

My problem is that I would like to achieve this regardless of the level of nesting. Here is another example, with one more level of nesting:

x <- list(X = list(x = list(a = 1, 
                       b = 2), 
              y = list(a = 3, 
                       b = 4)),
     Y = list(x = list(a = 1, 
                       b = 2), 
              y = list(a = 3, 
                       b = 4)))

do.call(rbind.data.frame, lapply(x, function(x) do.call(rbind.data.frame, lapply(x, as.data.frame, stringsAsFactors = FALSE))))

    a b
X.x 1 2
X.y 3 4
Y.x 1 2
Y.y 3 4

Does anyone have an idea how to generalize this to any level of nesting? Thanks for any help.

Julien Navarre
  • Is there any guarantee about the structure of the input list? Will it always have the leafs containing the same number of elements? Can we count on it always being 2 columns in the output? Or will that possibly be different? – Dason Apr 24 '17 at 15:33

5 Answers

10

Borrowing from Spacedman and flodel here, we can define the following pair of recursive functions:

library(tidyverse)  # I use dplyr and purrr here, plus tidyr further down below

depth <- function(this) ifelse(is.list(this), 1L + max(sapply(this, depth)), 0L)

bind_at_any_depth <- function(l) {
  if (depth(l) == 2) {
    return(bind_rows(l))
  } else {
    l <- at_depth(l, depth(l) - 2, bind_rows)
    bind_at_any_depth(l)
  }
}
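
To make the recursion in `depth` concrete, here is a small base-R sketch. `list_depth` is a hypothetical name (it is not from the answer) for a variant that uses a plain `if` and handles the empty-list case explicitly. Note that a data frame is itself a list, so it counts as depth 1; that is why the `depth(l) == 2` test above effectively means "a list of flat records or data frames".

```r
# Hypothetical variant of depth(): counts levels of list nesting.
list_depth <- function(this) {
  if (!is.list(this)) return(0L)        # atomic values are depth 0
  if (length(this) == 0L) return(1L)    # assumption: an empty list counts as depth 1
  1L + max(vapply(this, list_depth, integer(1)))
}

# Same shape as the second example in the question.
x <- list(X = list(x = list(a = 1, b = 2), y = list(a = 3, b = 4)),
          Y = list(x = list(a = 1, b = 2), y = list(a = 3, b = 4)))

d_leaf <- list_depth(x$X$x)                   # 1: a flat record
d_mid  <- list_depth(x$X)                     # 2: a list of flat records
d_top  <- list_depth(x)                       # 3: one more level of nesting
d_df   <- list_depth(data.frame(a = 1, b = 2)) # 1: a data frame is a list of columns
```

With three levels, the recursive function therefore needs one binding pass before the final `bind_rows`.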

We can now bind any arbitrary depth list into a single data.frame:

bind_at_any_depth(x)
# A tibble: 2 × 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     4
bind_at_any_depth(x_ext) # x_ext is defined in P Lapointe's answer
# A tibble: 5 × 2
      a     b
  <dbl> <dbl>
1     1     2
2     5     6
3     7     8
4     1     2
5     3     4

If you want to keep track of the origin of each row, you can use this version:

bind_at_any_depth2 <- function(l) {
  if (depth(l) == 2) {
    l <- bind_rows(l, .id = 'source')
    l <- unite(l, 'source', contains('source'))
    return(l)
  } else {
    l <- at_depth(l, depth(l) - 2, bind_rows, .id = paste0('source', depth(l)))
    bind_at_any_depth2(l)  # recurse into this version so the source columns are kept and united
  }
}

This will add a source column:

bind_at_any_depth2(x_ext)
# A tibble: 5 × 3
  source     a     b
*  <chr> <dbl> <dbl>
1  X_x_1     1     2
2  X_y_z     5     6
3 X_y_zz     7     8
4  Y_x_1     1     2
5  Y_y_1     3     4

Note: At some point you can use purrr::depth, and will need to change at_depth to modify_depth when their new version rolls out to CRAN (thanks @ManuelS).

Axeman
  • You should mention that `purrr::depth()` is part of the development version of `purrr` and that `modify_depth()` will eventually replace `at_depth()`. otherwise: great answer – Manuel R Apr 24 '17 at 15:57
  • @ManuelS, Actually, I was unaware of that. Thanks! I was using the `depth` function as defined at the start of my code (borrowed from the QA linked). – Axeman Apr 24 '17 at 16:00
  • oh i see. I overlooked that part and assumed you are using the `purrr::depth()` function. Probably makes sense to use that version although your function probably does the same – Manuel R Apr 24 '17 at 16:02
3

UPDATE

Here's a way to flatten more deeply nested lists simply with unlist. Since the structure is now uneven, the result will not be a data.frame.

x_ext <- list(X = list(x = list(a = 1,
                       b = 2),
              y = list(z=list(a = 5,
                       b = 6),
                       zz=list(a = 7,
                       b = 8))),
     Y = list(x = list(a = 1,
                       b = 2),
              y = list(a = 3,
                       b = 4)))

unlist(x_ext)

   X.x.a    X.x.b  X.y.z.a  X.y.z.b X.y.zz.a X.y.zz.b    Y.x.a    Y.x.b    Y.y.a    Y.y.b 
       1        2        5        6        7        8        1        2        3        4 

My initial answer was to unlist first and rbind afterwards. However, it only works with the example in the question.

x_unlist <- unlist(x, recursive = FALSE)
do.call("rbind", x_unlist)
    a b
X.x 1 2
X.y 3 4
Y.x 1 2
Y.y 3 4
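
One way to keep the `rbind` idea but make it cope with uneven nesting such as `x_ext` is to walk the list recursively and collect every innermost list before binding. The following is a minimal base-R sketch, not part of the original answer: `collect_leaves` and `bind_nested` are hypothetical names, and it assumes that every list is named and that all innermost lists share the same fields.

```r
x_ext <- list(X = list(x = list(a = 1, b = 2),
                       y = list(z  = list(a = 5, b = 6),
                                zz = list(a = 7, b = 8))),
              Y = list(x = list(a = 1, b = 2),
                       y = list(a = 3, b = 4)))

# Hypothetical helper: return a flat, named list of all "leaf records",
# i.e. lists that contain no further lists. Names are the dotted paths.
collect_leaves <- function(l, prefix = NULL) {
  has_list_child <- any(vapply(l, is.list, logical(1)))
  if (!has_list_child) {
    out <- list(l)
    names(out) <- prefix
    return(out)
  }
  nms <- names(l)
  res <- list()
  for (i in seq_along(l)) {
    key <- if (is.null(prefix)) nms[i] else paste(prefix, nms[i], sep = ".")
    res <- c(res, collect_leaves(l[[i]], key))
  }
  res
}

# Hypothetical wrapper: turn each leaf record into a one-row data frame and bind.
bind_nested <- function(l) {
  leaves <- collect_leaves(l)
  do.call(rbind, lapply(leaves, as.data.frame, stringsAsFactors = FALSE))
}

res <- bind_nested(x_ext)
res
```

Because each leaf becomes a one-row data frame, the row names of the result are the dotted paths to the leaves, so the origin of each row is preserved.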
Pierre Lapointe
2

You can flatten and coerce to a data.frame while collecting names with purrr::flatten_df from the development version:

library(purrr)    # or library(tidyverse)

x <- list(X = list(x = list(a = 1, 
                       b = 2), 
              y = list(a = 3, 
                       b = 4)),
     Y = list(x = list(a = 1, 
                       b = 2), 
              y = list(a = 3, 
                       b = 4)))

x %>% flatten_df(.id = 'var')
#> # A tibble: 4 × 3
#>     var     a     b
#>   <chr> <dbl> <dbl>
#> 1     x     1     2
#> 2     y     3     4
#> 3     x     1     2
#> 4     y     3     4

or if you want to save both sets of names, map_df:

library(tidyverse)

x %>% map_df(~bind_rows(.x, .id = 'var2'), .id = 'var1')
#> # A tibble: 4 × 4
#>    var1  var2     a     b
#>   <chr> <chr> <dbl> <dbl>
#> 1     X     x     1     2
#> 2     X     y     3     4
#> 3     Y     x     1     2
#> 4     Y     y     3     4
alistaire
0

This builds on P.Lapointe's answer and uses ideas from here and here to extract the final names in the list.

 bind <- function(x) {
     s = stack(unlist(x))
     s$major = tools::file_path_sans_ext(s$ind)
     s$minor = tools::file_ext(s$ind)
     as.data.frame.matrix(xtabs(data=s, values ~  major + minor))
 }

 bind(x)
    a b
X.x 1 2
X.y 3 4
Y.x 1 2
Y.y 3 4

 bind(x_ext)
       a b
X.x    1 2
X.y.z  5 6
X.y.zz 7 8
Y.x    1 2
Y.y    3 4
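
The key step in `bind` is splitting each dotted name produced by `unlist` into a row key (everything before the last dot) and a column name (everything after it). As a sketch of just that step, here is the same split written with plain regular expressions instead of the `tools` helpers. This is an illustration under the assumption that the field names themselves contain no dots; as far as I know, the `tools` functions only recognise alphanumeric "extensions", so a regex may be preferable if field names contain characters such as underscores.

```r
# Dotted paths as produced by names(unlist(x_ext))
paths <- c("X.x.a", "X.x.b", "X.y.zz.a", "X.y.zz.b")

major <- sub("\\.[^.]*$", "", paths)  # drop the last dot and what follows: row key
minor <- sub("^.*\\.", "", paths)     # keep only what follows the last dot: column name

data.frame(path = paths, major = major, minor = minor)
```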
user2957945
-1

We can do this with the tidyverse:

library(tidyverse)
x %>% 
   map(bind_rows) %>%
   bind_rows(.id = 'grp')
# A tibble: 4 × 3
#     grp     a     b    
#   <chr> <dbl> <dbl>
#1     X     1     2
#2     X     3     4
#3     Y     1     2
#4     Y     3     4

Or using base R:

do.call(rbind, do.call(c, x))
#    a b
#X.x 1 2
#X.y 3 4
#Y.x 1 2
#Y.y 3 4
akrun