5

Consider this simple example

> weird_df <- data_frame(col1 =c('hello', 'world', 'again'),
+                       col_weird = list(list(12,23), list(23,24), NA))
> 
> weird_df
# A tibble: 3 x 2
   col1  col_weird
  <chr>     <list>
1 hello <list [2]>
2 world <list [2]>
3 again  <lgl [1]>

I need to extract the values in the col_weird. How can I do that? I see how to do that in Python but not in R. Expected output is:

> good_df
# A tibble: 3 x 3
   col1   tic   toc
  <chr> <dbl> <dbl>
1 hello    12    23
2 world    23    24
3 again    NA    NA
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
  • 2
    `unlist(weird_df$col_weird[[1]])` `unlist(weird_df$col_weird[[2]])` – Enrique Pérez Herrero Jun 29 '17 at 18:03
  • thanks @EnriquePérezHerrero but can you do that in a clean, dplyr way? I want to be sure the data is correctlyy processed – ℕʘʘḆḽḘ Jun 29 '17 at 18:04
  • 1
    You can try `weird_df %>% rowwise() %>% mutate(col_weird = list(unlist(col_weird))) %>% ungroup %>% unnest %>% group_by(col1) %>% mutate(rn = c('tic', 'toc')[row_number()]) %>% spread(rn, col_weird)` – akrun Jun 29 '17 at 18:18
  • @akrun thanks but that's a pretty convoluted solution! – ℕʘʘḆḽḘ Jun 29 '17 at 18:45
  • 1
    yes, then don't use it. I didn't know that there were already some solutions posted – akrun Jun 29 '17 at 18:45
  • @akrun the actual solutions do not work with the current example... – ℕʘʘḆḽḘ Jun 29 '17 at 18:47
  • 1
    My solution was based on the example you posted. So, if the structure is different perhaps you may need to update the example – akrun Jun 29 '17 at 18:48
  • @akrun the example is correct. I am saying the solutions below are not correct – ℕʘʘḆḽḘ Jun 29 '17 at 18:55
  • 1
    Based on the updated question, my solution still works on that. Other option is `library(purrr);library(reshape2);weird_df$col_weird %>% map(unlist) %>% setNames(., weird_df$col1) %>% melt %>% group_by(L1) %>% mutate(rn = c('tic', 'toc')[row_number()]) %>% spread(rn, value)` – akrun Jun 29 '17 at 18:56
  • 1
    If you want a compact option, then use `data.table` `dcast(setDT(weird_df)[,.(col1 = rep(col1, lengths(col_weird)), col_weird = unlist(col_weird)) ], col1 ~ c("tic", "toc")[rowid(col1)])` – akrun Jun 29 '17 at 19:02
  • @akrun I would love to accept the `purr` solution, but can you please post that as an asnwer and explain what you are doing? also, can you write the code so that to obtain the final dataframe as shown (that is, keeping the other variables and creating the new ones). Thanks!! – ℕʘʘḆḽḘ Jun 29 '17 at 19:06

5 Answers5

4

If you collapse the list column into a string you can use separate from tidyr. I used map from purrr to loop through the list column and create a string with toString.

library(tidyr)
library(purrr)

weird_df %>%
     mutate(col_weird = map(col_weird, toString ) ) %>%
     separate(col_weird, into = c("tic", "toc"), convert = TRUE)

# A tibble: 3 x 3
   col1   tic   toc
* <chr> <int> <int>
1 hello    12    23
2 world    23    24
3 again    NA    NA

You can actually use separate directly without the toString part but you end up with "list" as one of the values.

weird_df %>%
     separate(col_weird, into = c("list", "tic", "toc"), convert = TRUE) %>%
     select(-list)

This led me to tidyr::extract, which works fine with the right regular expression. If your list column was more complicated, though, writing out the regular expression might be a pain.

weird_df %>%
     extract(col_weird, into = c("tic", "toc"), regex = "([[:digit:]]+), ([[:digit:]]+)", convert = TRUE)
aosmith
  • 34,856
  • 9
  • 84
  • 118
2
weird_df <- data_frame(col1 = c('hello', 'world'),
                   col_weird = list(list(12,23), list(23,24)))

library(dplyr)
weird_df %>%
  dplyr::mutate(tic = unlist(magrittr::extract2(col_weird, 1)),
                toc = unlist(magrittr::extract2(col_weird, 2)),
                col_weird = NULL)

With the last changes: Note that now col_weird contains list(NA, NA)

weird_df <- data_frame(col1 = c('hello', 'world', 'again'),
                  col_weird = list(list(12,23), list(23,24), list(NA, NA)))

library(dplyr)
weird_df %>%
 dplyr::mutate(col_weird = matrix(col_weird),
 tic = sapply(col_weird, function(x) magrittr::extract2(x, 1)),
 toc = sapply(col_weird, function(x) magrittr::extract2(x, 2)),
 col_weird = NULL)
Enrique Pérez Herrero
  • 3,699
  • 2
  • 32
  • 33
2

You can do this with basic R, thanks to I():

weird_df <- data.frame(col1 =c('hello', 'world'), 
   col_weird = I(list(list(12,23),list(23,24))))

weird_df
>    col1 col_weird
  1 hello    12, 23
  2 world    23, 24
psychOle
  • 1,054
  • 9
  • 19
2

Here is one option to do with purrr/tidyverse/reshape2. We unlist the 'col_weird' within map to get the output as list, set the names of the list with 'col1', melt to 'long' format, grouped by 'L1', create a 'rn' column and spread it back to 'wide'

library(tidyverse)
library(reshape2)
weird_df$col_weird %>%
     map(unlist) %>% 
     setNames(., weird_df$col1) %>%
     melt %>% 
     group_by(L1) %>%
     mutate(rn = c('tic', 'toc')[row_number()]) %>%
     spread(rn, value) %>%
     left_join(weird_df[-2], ., by = c(col1 = "L1"))
akrun
  • 874,273
  • 37
  • 540
  • 662
2

well, I came up with a simple one

> weird_df %>% 
+   rowwise() %>%
+   mutate(tic = col_weird[[1]],
+          tac = ifelse(length(col_weird) == 2, col_weird[[2]], NA)) %>% 
+   select(-col_weird) %>% ungroup()
# A tibble: 3 x 3
   col1   tic   tac
  <chr> <dbl> <dbl>
1 hello    12    23
2 world    23    24
3 again    NA    NA
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235