Unnesting a data frame containing lists

Question

I have a data frame that contains lists, like below:

# Load packages
library(dplyr)
library(tidyr)  # provides unnest(), used further down

# Create data frame
df <- structure(list(ID = 1:3, 
                     A = structure(list(c(9, 8), c(7,6), c(6, 9)), ptype = numeric(0), class = c("vctrs_list_of", "vctrs_vctr")), 
                     B = structure(list(c(3, 5), c(2, 6), c(1, 5)), ptype = numeric(0), class = c("vctrs_list_of", "vctrs_vctr")), 
                     C = structure(list(c(6, 5), c(7, 6), c(8, 7)), ptype = numeric(0), class = c("vctrs_list_of", "vctrs_vctr")), 
                     D = structure(list(c(5, 3), c(4, 1), c(6,  5)), ptype = numeric(0), class = c("vctrs_list_of", "vctrs_vctr"))), 
                row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

# Peek at data 
df
#> # A tibble: 3 x 5
#>      ID A         B         C         D        
#>   <int> <list>    <list>    <list>    <list>   
#> 1     1 <dbl [2]> <dbl [2]> <dbl [2]> <dbl [2]>
#> 2     2 <dbl [2]> <dbl [2]> <dbl [2]> <dbl [2]>
#> 3     3 <dbl [2]> <dbl [2]> <dbl [2]> <dbl [2]>

I'd like to unnest the lists and can do so using pmap_dfr.

# Expand rows
df %>% purrr::pmap_dfr(function(...) data.frame(...))
#>   ID A B C D
#> 1  1 9 3 6 5
#> 2  1 8 5 5 3
#> 3  2 7 2 7 4
#> 4  2 6 6 6 1
#> 5  3 6 1 8 6
#> 6  3 9 5 7 5

Created on 2019-06-28 by the reprex package (v0.3.0)

This is the desired result, but seems to be reinventing the wheel because tidyr::unnest is designed to flatten list columns back to regular columns. Using tidyr::unnest produces the following error, however:

df %>% unnest(cols = c(A, B, C, D))
#Error: No common type for `x` <tbl_df<A:double>> and `y` <double>.
#Call `rlang::last_error()` to see a backtrace

How would I apply unnest in this case for flattening my data frame with list columns?

Version information

> packageVersion("tidyr")
[1] ‘0.8.3.9000’

I see no difference between `unnest(df)` and `pmap_dfr(df, data.frame)`, both are your intended output (though the latter is not a `tbl_df`). — r2evans, Jun 28 '19 at 17:49
@r2evans Weird. `unnest` throws an error for me. Then again, I'm using the development version of `tidyr`... — Dan, Jun 28 '19 at 17:51
Mine is `tidyr-0.8.2`, perhaps you have a new [issue](https://github.com/tidyverse/tidyr/issues)? — r2evans, Jun 28 '19 at 17:52
@r2evans Thanks, I'll create an issue. Is SO protocol to delete a question due to a bug in a dev version or leave it? — Dan, Jun 28 '19 at 17:54
I experience the same issue with tidyr development version. Turning your columns into ordinary lists solves the issue for me: `df %>% mutate_at(.vars = vars(A:D), as.list) %>% unnest(cols = A:D)` — Joris C., Jun 28 '19 at 17:55
I wasn't aware of the syntax `unnest(df, cols = c(A, B, C, D))`, and it doesn't work for my version of tidyr either. Is that maybe the issue? `unnest(df)` and `unnest(df, A, B, C, D)` are equivalent and work fine. — , Jun 28 '19 at 17:56
I think keeping it here for the time being is actually relevant, as others may have similar questions, and seeing these comments will be incredibly useful. Depending on the response-time on github (not sure when the devs' attention will roll back around to `tidyr`), you might post an answer here stating (1) it's a current bug, and (2) here are one or two workarounds for the time being. Then later you can (3) update when the github/CRAN versions have been fixed. Thanks, nice find! — r2evans, Jun 28 '19 at 17:56
I also think cross-linking both the issue and the SO question would be a good thing. — r2evans, Jun 28 '19 at 17:58
@gersht Perhaps this is new, but omitting `cols` in the dev version produces `Warning message: 'cols' is now required. Please use 'cols = c(A, B, C, D)' ` — Dan, Jun 28 '19 at 18:01
An issue is now filed on github [here](https://github.com/tidyverse/tidyr/issues/658#issue-462153556). — Dan, Jun 28 '19 at 18:18
After playing around a bit I've come to the conclusion that there probably isn't any issue, I think `nest` has just become more focused and that `chop` has taken over some of its duties. See my answer below and let me know what you think. — , Jun 28 '19 at 18:55

score 1 · Answer 1 · 2019-07-23T05:33:43.747

Note: Hadley Wickham has flagged this on GitHub as a bug in tidyr version 0.8.3.9000 (see here). I'll leave the answer below as a potential workaround until the bug is fixed.

It looks like nest is more specifically used to create list-columns of data frames in 0.8.3.9000. From the docs: "Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns." For example, try:

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) %>% 
    nest(data = c(y, z))

Which returns:

# A tibble: 3 x 2
      x           data
  <dbl> <list<df[,2]>>
1     1            [2]
2     2            [2]
3     3            [2]

Then look at df$data:

<list_of<
  tbl_df<
    y: integer
    z: integer
  >
>[3]>
[[1]]
# A tibble: 3 x 2
      y     z
  <int> <int>
1     1     6
2     2     5
3     3     4

[[2]]
# A tibble: 2 x 2
      y     z
  <int> <int>
1     4     3
2     5     2

[[3]]
# A tibble: 1 x 2
      y     z
  <int> <int>
1     6     1
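
To make the docs' statement concrete, here is a minimal sketch of the nest/unnest round trip. It assumes a released tidyr of 1.0.0 or later (not the 0.8.3.9000 development build discussed here), and the names `flat`, `nested` and `restored` are only illustrative.

```r
library(tidyr)

# Illustrative input; named `flat` so it does not overwrite `df`
flat <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# nest() packs y and z into a list-column of tibbles, one row per value of x
nested <- nest(flat, data = c(y, z))

# unnest() flattens that list-column back out into regular columns
restored <- unnest(nested, cols = data)

# Compare the round-tripped result with the input
all.equal(as.data.frame(restored), as.data.frame(flat))
```

The point is only that unnest here operates on a list-column whose elements are data frames, which is the case the quoted documentation describes.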

Your data frame's columns are list-columns of vectors, which seem to fall under the purview of chop, which shortens a data frame while preserving its width. For example, try:

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1) %>% 
    chop(c(y, z))

Which returns:

# A tibble: 3 x 3
      x y         z        
  <dbl> <list>    <list>   
1     1 <int [3]> <int [3]>
2     2 <int [2]> <int [2]>
3     3 <int [1]> <int [1]>

And take a look at df$y:

[[1]]
[1] 1 2 3

[[2]]
[1] 4 5

[[3]]
[1] 6
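
As a quick check of how chop and its counterpart relate, the sketch below chops the same example data and then reverses it with unchop. It assumes tidyr 1.0.0 or later; `flat`, `chopped` and `restored` are illustrative names, not part of the original example.

```r
library(tidyr)

# Illustrative input; named `flat` so it does not overwrite `df`
flat <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)

# chop() collapses y and z into list-columns of vectors, one row per value of x
chopped <- chop(flat, c(y, z))

# unchop() expands those list-columns again, one row per vector element
restored <- unchop(chopped, c(y, z))

# Row counts before and after, and a comparison with the input
nrow(chopped)
nrow(restored)
all.equal(as.data.frame(restored), as.data.frame(flat))
```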

Knowing this, the appropriate method for your data would be chop's counterpart, unchop. So, given your data frame:

# A tibble: 3 x 5
     ID           A           B           C           D
  <int> <list<dbl>> <list<dbl>> <list<dbl>> <list<dbl>>
1     1         [2]         [2]         [2]         [2]
2     2         [2]         [2]         [2]         [2]
3     3         [2]         [2]         [2]         [2]

Try unchop(df, c(A, B, C, D)) or unchop(df, A:D), which should return:

# A tibble: 6 x 5
     ID     A     B     C     D
  <int> <dbl> <dbl> <dbl> <dbl>
1     1     9     3     6     5
2     1     8     5     5     3
3     2     7     2     7     4
4     2     6     6     6     1
5     3     6     1     8     6
6     3     9     5     7     5
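
For anyone who wants to run this end to end, here is a minimal sketch on data with the same values as the question. Two assumptions: it rebuilds the columns as plain lists rather than the `vctrs_list_of` columns in the question (similar in spirit to the `as.list` workaround suggested in the comments), and it targets tidyr 1.0.0 or later, where I would expect `unnest()` with `cols` to work on this input as well. The name `df_q` is illustrative.

```r
library(tidyr)

# Same values as the question's data, with plain list-columns (an assumption)
df_q <- tibble(
  ID = 1:3,
  A = list(c(9, 8), c(7, 6), c(6, 9)),
  B = list(c(3, 5), c(2, 6), c(1, 5)),
  C = list(c(6, 5), c(7, 6), c(8, 7)),
  D = list(c(5, 3), c(4, 1), c(6, 5))
)

# unchop() expands the list-columns of vectors in parallel
via_unchop <- unchop(df_q, c(A, B, C, D))

# unnest() with an explicit `cols` argument, for comparison
via_unnest <- unnest(df_q, cols = c(A, B, C, D))

via_unchop
via_unnest
```

Whether the original error still appears will depend on the installed tidyr version, so treat this as a way to check your own setup rather than a statement about every release.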

+1 for the thoughtful solution, but Hadley has now flagged it as a bug. Thanks for taking the time to think about this. — Dan, Jul 22 '19 at 22:30
Thanks for getting back to me @Lyngbakr. I've added a note pointing out that it is likely a bug, but I'll leave the answer just in case it helps someone. — , Jul 23 '19 at 05:38
