8

I have a dataframe with several columns containing list columns that I want to unnest (or unchop). BUT, they are different lengths, so the resulting error is Error: No common size for...

Here is a reprex to show what works and doesn't work.

library(tidyr)
library(vctrs)

# This works as expected
df_A <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9)))
)

unchop(df_A, cols = c(A))
# A tibble: 7 x 2
     ID     A
  <int> <dbl>
1     1     9
2     1     8
3     1     5
4     2     7
5     2     6
6     3     6
7     3     9

# This works as expected as the lists are the same lengths

df_AB_1 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2, 3), c(4, 5), c(7, 8)))
)

unchop(df_AB_1, cols = c(A, B))

# A tibble: 7 x 3
     ID     A     B
  <int> <dbl> <dbl>
1     1     9     1
2     1     8     2
3     1     5     3
4     2     7     4
5     2     6     5
6     3     6     7
7     3     9     8

# This does NOT work as the lists are different lengths

df_AB_2 <- tibble(
  ID = 1:3,
  A = as_list_of(list(c(9, 8, 5), c(7,6), c(6, 9))),
  B = as_list_of(list(c(1, 2), c(4, 5, 6), c(7, 8, 9, 0)))
)

unchop(df_AB_2, cols = c(A, B))

# Error: No common size for `A`, size 3, and `B`, size 2.

The output that I would like to achieve for df_AB_2 above is as follows where each list is unchopped and missing values are filled with NA:

# A tibble: 10 x 3
      ID     A     B
   <dbl> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

I have referenced this issue on Github and StackOverflow here.

Any ideas how to achieve the result above?

Versions

> packageVersion("tidyr")
[1] ‘1.0.0’
> packageVersion("vctrs")
[1] ‘0.2.0.9001’
Megan Beckett
  • 337
  • 1
  • 6

3 Answers3

10

Here is an idea via dplyr that you can generalise to as many columns as you want,

library(tidyverse)

df_AB_2 %>% 
 pivot_longer(c(A, B)) %>% 
 mutate(value = lapply(value, `length<-`, max(lengths(value)))) %>% 
 pivot_wider(names_from = name, values_from = value) %>% 
 unnest() %>% 
 filter(rowSums(is.na(.[-1])) != 2)

which gives,

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • 1
    Ah, super neat! Good idea to pivot back and forth and create all lists the same length. Thanks! – Megan Beckett Dec 05 '19 at 10:31
  • 1
    No problem. Glad to help. Just make sure that you adjust the final filter based on the number of columns you have to pivot_longer/unnest. If you have 3 columns with lists to unnest then is.na() != 3 etc... – Sotos Dec 05 '19 at 10:35
3

Defining a helper function to update the lengths of the element and proceeding with dplyr:

foo <- function(x, len_vec) {
  lapply(
    seq_len(length(x)), 
    function(i) {
      length(x[[i]]) <- len_vec[i]
      x[[i]]
    } 
  )
}

df_AB_2 %>% 
  mutate(maxl = pmax(lengths(A), lengths(B))) %>% 
  mutate(A = foo(A, maxl), B = foo(B, maxl)) %>% 
  unchop(cols = c(A, B)) %>% 
  select(-maxl)

# A tibble: 10 x 3
      ID     A     B
   <int> <dbl> <dbl>
 1     1     9     1
 2     1     8     2
 3     1     5    NA
 4     2     7     4
 5     2     6     5
 6     2    NA     6
 7     3     6     7
 8     3     9     8
 9     3    NA     9
10     3    NA     0

Using data.table:

library(data.table)
setDT(df_AB_2)
df_AB_2[, maxl := pmax(lengths(A), lengths(B))]
df_AB_2[, .(unlist(A)[seq_len(maxl)], unlist(B)[seq_len(maxl)]), by = ID]
s_baldur
  • 29,441
  • 4
  • 36
  • 69
0

I came across this while trying to unnest a dataframe where each column contains a dataframe with varying numer of rows and columns.

I modified the answer by @Sotos to accommodate for this case.

Maybe there is someone else out there, who applied lms over the columns of their dataframe and needs to unnest the result in this fashion.

I intended to post this as a comment on the original answer, but don't have enough reputation for that yet.

df %>%  pivot_longer(everything()) %>% 
  mutate(value = 
           lapply(value,
                  function(x) {
                    x[1 + (dim(x)[1]):max(
                      unlist(lapply(value, function(x) dim(x)[1]))
                      ),] <- NA
                    return(x)
                    }
                  )) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest() %>% 
  filter(if_any(everything(), ~!is.na(.)))