R: How to extract a list from a dataframe?

Question

Consider this simple example

> weird_df <- data_frame(col1 =c('hello', 'world', 'again'),
+                       col_weird = list(list(12,23), list(23,24), NA))
> 
> weird_df
# A tibble: 3 x 2
   col1  col_weird
  <chr>     <list>
1 hello <list [2]>
2 world <list [2]>
3 again  <lgl [1]>

I need to extract the values in the col_weird. How can I do that? I see how to do that in Python but not in R. Expected output is:

> good_df
# A tibble: 3 x 3
   col1   tic   toc
  <chr> <dbl> <dbl>
1 hello    12    23
2 world    23    24
3 again    NA    NA

`unlist(weird_df$col_weird[[1]])` `unlist(weird_df$col_weird[[2]])` — Enrique Pérez Herrero, Jun 29 '17 at 18:03
thanks @EnriquePérezHerrero but can you do that in a clean, dplyr way? I want to be sure the data is correctlyy processed — ℕʘʘḆḽḘ, Jun 29 '17 at 18:04
You can try `weird_df %>% rowwise() %>% mutate(col_weird = list(unlist(col_weird))) %>% ungroup %>% unnest %>% group_by(col1) %>% mutate(rn = c('tic', 'toc')[row_number()]) %>% spread(rn, col_weird)` — akrun, Jun 29 '17 at 18:18
yes, then don't use it. I didn't know that there were already some solutions posted — akrun, Jun 29 '17 at 18:45
@akrun the actual solutions do not work with the current example... — ℕʘʘḆḽḘ, Jun 29 '17 at 18:47
My solution was based on the example you posted. So, if the structure is different perhaps you may need to update the example — akrun, Jun 29 '17 at 18:48
@akrun the example is correct. I am saying the solutions below are not correct — ℕʘʘḆḽḘ, Jun 29 '17 at 18:55
Based on the updated question, my solution still works on that. Other option is `library(purrr);library(reshape2);weird_df$col_weird %>% map(unlist) %>% setNames(., weird_df$col1) %>% melt %>% group_by(L1) %>% mutate(rn = c('tic', 'toc')[row_number()]) %>% spread(rn, value)` — akrun, Jun 29 '17 at 18:56
If you want a compact option, then use `data.table` `dcast(setDT(weird_df)[,.(col1 = rep(col1, lengths(col_weird)), col_weird = unlist(col_weird)) ], col1 ~ c("tic", "toc")[rowid(col1)])` — akrun, Jun 29 '17 at 19:02
@akrun I would love to accept the `purr` solution, but can you please post that as an asnwer and explain what you are doing? also, can you write the code so that to obtain the final dataframe as shown (that is, keeping the other variables and creating the new ones). Thanks!! — ℕʘʘḆḽḘ, Jun 29 '17 at 19:06

aosmith · Accepted Answer · 2017-06-29T20:42:47.227

If you collapse the list column into a string you can use separate from tidyr. I used map from purrr to loop through the list column and create a string with toString.

library(tidyr)
library(purrr)

weird_df %>%
     mutate(col_weird = map(col_weird, toString ) ) %>%
     separate(col_weird, into = c("tic", "toc"), convert = TRUE)

# A tibble: 3 x 3
   col1   tic   toc
* <chr> <int> <int>
1 hello    12    23
2 world    23    24
3 again    NA    NA

You can actually use separate directly without the toString part but you end up with "list" as one of the values.

weird_df %>%
     separate(col_weird, into = c("list", "tic", "toc"), convert = TRUE) %>%
     select(-list)

This led me to tidyr::extract, which works fine with the right regular expression. If your list column was more complicated, though, writing out the regular expression might be a pain.

weird_df %>%
     extract(col_weird, into = c("tic", "toc"), regex = "([[:digit:]]+), ([[:digit:]]+)", convert = TRUE)

hummm very, veeeeery clean. im jelly – ℕʘʘḆḽḘ Jun 29 '17 at 20:25 — ℕʘʘḆḽḘ, Jun 29 '17 at 20:25

Enrique Pérez Herrero · Answer 2 · 2017-06-29T18:59:52.907

2

weird_df <- data_frame(col1 = c('hello', 'world'),
                   col_weird = list(list(12,23), list(23,24)))

library(dplyr)
weird_df %>%
  dplyr::mutate(tic = unlist(magrittr::extract2(col_weird, 1)),
                toc = unlist(magrittr::extract2(col_weird, 2)),
                col_weird = NULL)

With the last changes: Note that now col_weird contains list(NA, NA)

weird_df <- data_frame(col1 = c('hello', 'world', 'again'),
                  col_weird = list(list(12,23), list(23,24), list(NA, NA)))

library(dplyr)
weird_df %>%
 dplyr::mutate(col_weird = matrix(col_weird),
 tic = sapply(col_weird, function(x) magrittr::extract2(x, 1)),
 toc = sapply(col_weird, function(x) magrittr::extract2(x, 2)),
 col_weird = NULL)

edited Jun 29 '17 at 18:59

answered Jun 29 '17 at 18:10

Enrique Pérez Herrero

3,699
2
32
33

great! why do we have to use `magrittr` here? – ℕʘʘḆḽḘ Jun 29 '17 at 18:11
1

`magrittr::extract2` is an alternate to `[[`, it is used to make R pipes more readable. – Enrique Pérez Herrero Jun 29 '17 at 18:14
cool, never understood what is the `[[` for. extract elements from a list? – ℕʘʘḆḽḘ Jun 29 '17 at 18:15
1

When you code an R package you need to use NSE functions like `mutate_` and to @importFrom in roxygen2. Then it is useful to add @importFrom magrittr extract2. – Enrique Pérez Herrero Jun 29 '17 at 18:16
hold on one sec – ℕʘʘḆḽḘ Jun 29 '17 at 18:19
It seems this solution would fail with missing values. Do you have an idea? question updated.. sorry and thanks again!! – ℕʘʘḆḽḘ Jun 29 '17 at 18:24
1

Enrico are you still there? :D – ℕʘʘḆḽḘ Jun 29 '17 at 18:43

score 2 · Answer 3 · answered Jun 29 '17 at 18:13

2

You can do this with basic R, thanks to I():

weird_df <- data.frame(col1 =c('hello', 'world'), 
   col_weird = I(list(list(12,23),list(23,24))))

weird_df
>    col1 col_weird
  1 hello    12, 23
  2 world    23, 24

answered Jun 29 '17 at 18:13

psychOle

1,054
9
19

score 2 · Answer 4 · answered Jun 29 '17 at 19:13

Here is one option to do with purrr/tidyverse/reshape2. We unlist the 'col_weird' within map to get the output as list, set the names of the list with 'col1', melt to 'long' format, grouped by 'L1', create a 'rn' column and spread it back to 'wide'

library(tidyverse)
library(reshape2)
weird_df$col_weird %>%
     map(unlist) %>% 
     setNames(., weird_df$col1) %>%
     melt %>% 
     group_by(L1) %>%
     mutate(rn = c('tic', 'toc')[row_number()]) %>%
     spread(rn, value) %>%
     left_join(weird_df[-2], ., by = c(col1 = "L1"))

that's really nice man, but what do you think of my solution? It appear to be even simpler — ℕʘʘḆḽḘ, Jun 29 '17 at 19:18

ℕʘʘḆḽḘ · Answer 5 · 2017-06-29T19:29:34.193

2

well, I came up with a simple one

> weird_df %>% 
+   rowwise() %>%
+   mutate(tic = col_weird[[1]],
+          tac = ifelse(length(col_weird) == 2, col_weird[[2]], NA)) %>% 
+   select(-col_weird) %>% ungroup()
# A tibble: 3 x 3
   col1   tic   tac
  <chr> <dbl> <dbl>
1 hello    12    23
2 world    23    24
3 again    NA    NA

edited Jun 29 '17 at 19:29

answered Jun 29 '17 at 19:20

ℕʘʘḆḽḘ

18,566
34
128
235

R: How to extract a list from a dataframe?

5 Answers5

Linked