Unnest or unchop dataframe containing lists of different lengths

Question

I have a dataframe with several columns containing list columns that I want to unnest (or unchop). BUT, they are different lengths, so the resulting error is Error: No common size for...

Here is a reprex to show what works and doesn't work.

library(tidyr)
library(vctrs)

# This works as expected
df_A <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9)))
)

unchop(df_A, cols = c(A))
# A tibble: 7 x 2
     ID     A
  <int> <dbl>
1     1     9
2     1     8
3     1     5
4     2     7
5     2     6
6     3     6
7     3     9

# This works as expected as the lists are the same lengths

df_AB_1 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2, 3), c(4, 5), c(7, 8)))
)

unchop(df_AB_1, cols = c(A, B))

# A tibble: 7 x 3
     ID     A     B
  <int> <dbl> <dbl>
1     1     9     1
2     1     8     2
3     1     5     3
4     2     7     4
5     2     6     5
6     3     6     7
7     3     9     8

# This does NOT work as the lists are different lengths

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

unchop(df_AB_2, cols = c(A, B))

# Error: No common size for `A`, size 3, and `B`, size 2.

The output that I would like to achieve for df_AB_2 above is as follows where each list is unchopped and missing values are filled with NA:

# A tibble: 10 x 3
      ID     A     B
   <dbl> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

I have referenced this issue on Github and StackOverflow here.

Any ideas how to achieve the result above?

Versions

> packageVersion("tidyr")
[1] ‘1.0.0’
> packageVersion("vctrs")
[1] ‘0.2.0.9001’

score 10 · Accepted Answer · answered Dec 05 '19 at 10:21

10

Here is an idea via dplyr that you can generalise to as many columns as you want,

library(tidyverse)

df_AB_2 %>% 
 pivot_longer(c(A, B)) %>% 
 mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
 pivot_wider(names_from = name, values_from = value) %>% 
 unnest() %>% 
 filter(rowSums(is.na(.[-1])) != 2)

which gives,

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

answered Dec 05 '19 at 10:21

Sotos

51,121
6
32
66

1

Ah, super neat! Good idea to pivot back and forth and create all lists the same length. Thanks! – Megan Beckett Dec 05 '19 at 10:31
1

No problem. Glad to help. Just make sure that you adjust the final filter based on the number of columns you have to pivot_longer/unnest. If you have 3 columns with lists to unnest then is.na() != 3 etc... – Sotos Dec 05 '19 at 10:35

s_baldur · Answer 2 · 2019-12-05T10:38:56.770

Defining a helper function to update the lengths of the element and proceeding with dplyr:

foo <- function(x, len_vec) {
  lapply(
    seq_len(length(x)), 
    function(i) {
      length(x[[i]]) <- len_vec[i]
      x[[i]]
    } 
  )
}

df_AB_2 %>% 
  mutate(maxl = pmax(lengths(A), lengths(B))) %>% 
  mutate(A = foo(A, maxl), B = foo(B, maxl)) %>% 
  unchop(cols = c(A, B)) %>% 
  select(-maxl)

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

Using data.table:

library(data.table)
setDT(df_AB_2)
df_AB_2[, maxl := pmax(lengths(A), lengths(B))]
df_AB_2[, .(unlist(A)[seq_len(maxl)], unlist(B)[seq_len(maxl)]), by = ID]

Great! Thanks, really like the helper function. – Megan Beckett Dec 05 '19 at 10:31 — Megan Beckett, Dec 05 '19 at 10:31

score 0 · Answer 3 · answered Aug 14 '23 at 13:36

I came across this while trying to unnest a dataframe where each column contains a dataframe with varying numer of rows and columns.

I modified the answer by @Sotos to accommodate for this case.

Maybe there is someone else out there, who applied lms over the columns of their dataframe and needs to unnest the result in this fashion.

I intended to post this as a comment on the original answer, but don't have enough reputation for that yet.

df %>%  pivot_longer(everything()) %>% 
  mutate(value = 
           lapply(value,
                  function(x) {
                    x[1 + (dim(x)[1]):max(
                      unlist(lapply(value, function(x) dim(x)[1]))
                      ),] <- NA
                    return(x)
                    }
                  )) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest() %>% 
  filter(if_any(everything(), ~!is.na(.)))

Unnest or unchop dataframe containing lists of different lengths

Versions

3 Answers3

Linked