28

I have a data frame where the last column is a column of lists. Below is how it looks:

Col1 | Col2 | ListCol
--------------------------
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]
 na  |  na  | [obj1, obj2]

What I want is

Col1 | Col2 | Col3  | Col4
--------------------------
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2
 na  |  na  | obj1  | obj2

I know that all the lists have the same amount of elements.

Edit:

Every element in ListCol is a list with two elements.

Sam Firke
  • 21,571
  • 9
  • 87
  • 105
Santi
  • 381
  • 1
  • 3
  • 7
  • 1
    It depends a lot on how `ListCol` is structured. If it contains a data frame or named list for each row, just `tidyr::unnest` will work. If it's some other structure, you may need to rearrange first. To get a better answer, edit with the result of calling `dput` on your sample data so we can reproduce the exact structure. – alistaire Jun 15 '18 at 19:15
  • 2
    Hello. I've tried unnest but what it's been doing is getting the objects to separate, but in different rows rather than columns. Every row of ListCol is a list – Santi Jun 15 '18 at 19:23
  • The simplest way to get it to expand sideways instead of down is to make each list element a 1-row data frame, e.g. with `df$ListCol <- lapply(df$ListCol, function(x) as.data.frame(t(x)))` (with dplyr and purrr, if you prefer) and then calling `unnest`. – alistaire Jun 15 '18 at 21:43
  • here's alisatire's solution for a similar problem: https://stackoverflow.com/questions/49889246/how-to-unnest-column-list using invoke_map and tibble. and here are several other solutions: https://stackoverflow.com/questions/49689927/unnest-a-list-column-directly-into-several-columns – Arthur Yip Jul 07 '19 at 02:00

5 Answers5

23

Currently, the tidyverse answer would be:

library(dplyr)
library(tidyr)
data %>% unnest_wider(ListCol)
iago
  • 2,990
  • 4
  • 21
  • 27
  • 1
    You can also add the `names_sep` parameter if you need to keep the name of the nested column (e.g. `data %>% unnest_wider(ListCol, names_sep="_")` would lead to `ListCol_Col3`, which is handy when unnesting several columns at once). – Mario Reutter Jul 14 '22 at 11:45
11

Here is one approach, using unnest and tidyr::spread...

library(dplyr)
library(tidyr)

#example df
df <- tibble(a=c(1, 2, 3), b=list(c(2, 3), c(4, 5), c(6, 7)))

df %>% unnest(b) %>% 
       group_by(a) %>% 
       mutate(col=seq_along(a)) %>% #add a column indicator
       spread(key=col, value=b)

      a   `1`   `2`
  <dbl> <dbl> <dbl>
1    1.    2.    3.
2    2.    4.    5.
3    3.    6.    7.
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32
  • 6
    in your example you are just doing `cbind(df[1],do.call(rbind,df$b))` or even `cbind(df[1],t(data.frame(df$b)))` – Onyambu Jun 15 '18 at 20:12
  • @Onyambu, don't you want to write a complete answer to this post? It helped me and could help others but I missed it the first time. – JMarcelino Jul 09 '20 at 09:19
  • Help doc for Spread says it's superseded by pivot_wider, which works nicely as well. – ARobertson Nov 27 '22 at 05:10
5

Comparison of two great answers

There are two great one liner suggestions in this thread:

(1) cbind(df[1], t(data.frame(df$b)))

This is from @Onyambu using base R. To get to this answer one needs to know that a dataframe is a list and needs a bit of creativity.

(2) df %>% unnest_wider(b)

This is from @iago using tidyverse. You need extra packages and to know all the nest verbs, but one can think that it is more readable.

Now let's compare performance

library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

N <- 100
df <- tibble(a = 1:N, b = map2(1:N, 1:N, c))

tidy_foo <- function() suppressMessages(df %>% unnest_wider(b))
base_foo <- function() cbind(df[1],t(data.frame(df$b))) %>% as_tibble # To be fair
  
microbenchmark(tidy_foo(), base_foo())

Unit: milliseconds
       expr      min        lq      mean    median       uq      max neval
 tidy_foo() 102.4388 108.27655 111.99571 109.39410 113.1377 194.2122   100
 base_foo()   4.5048   4.71365   5.41841   4.92275   5.2519  13.1042   100

Aouch!

base R solution is 20 times faster.

pietrodito
  • 1,783
  • 15
  • 24
  • 3
    Using my real-world data & problem, I see less of a difference. My data.frame has 100k rows, 65 cols, and I'm unnesting a single pair of variables. The tidyr solution takes 12.5 seconds, the base R solution takes 11 seconds, so the base R solution is 1.14x faster. Users might want to test on their own data. – Sam Firke Oct 22 '21 at 14:35
1

Here's an option with data.table and base::unlist.

library(data.table)

DT <- data.table(a = list(1, 2, 3),
                                 b = list(list(1, 2),
                                              list(2, 1),
                                              list(1, 1)))

for (i in 1:nrow(DT)) {
  set(
    DT,
    i = i,
    j = c('b1', 'b2'),
    value = unlist(DT[i][['b']], recursive = FALSE)
  )
}
DT

This requires a for loop on every row... Not ideal and very anti-data.table. I wonder if there's some way to avoid creating the list column in the first place...

Alec
  • 63
  • 1
  • 3
1

@Alec data.table offers tstrsplit function to split a column into multiple columns.

DT = data.table(x=c("A/B", "A", "B"), y=1:3)
DT[]

#     x y
#1: A/B 1
#2:   A 2
#3:   B 3
DT[, c("c1") := tstrsplit(x, "/", fixed=TRUE, keep=1L)][] # keep only first

#     x y c1
#1: A/B 1  A
#2:   A 2  A
#3:   B 3  B

DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]

#     x y c1   c2
#1: A/B 1  A    B
#2:   A 2  A <NA>
#3:   B 3  B <NA>
Matthew Son
  • 1,109
  • 8
  • 27