I have a list which I am turning into a dataframe. The list comes back from an API, and it contains some NULL values. There are questions on SO on this topic here and here, but they either deal with dataframes, or in the case of the second link, the OP was encouraged to transform to a dataframe first. I want to keep the list structure.

I'm parsing it as follows; here is some example data:

example <- list(
  list(
    ID = "1",
    Name = "Joe",
    Middle_name = "Alan",
    Surname = "Smith"
  ),
  list(
    ID = "2",
    Name = "Sarah",
    Middle_name = NULL,
    Surname = "Jones"
  ),
  list(
    ID = "3",
    Name = "Robert",
    Middle_name = "Myles",
    Surname = "McDonnell"
  )
)

N <- NA_character_

df <- tibble::tibble(
  id = purrr::map_chr(example, .null = N, "ID"),
  name = purrr::map_chr(example, .null = N, "Name"),
  middle = purrr::map_chr(example, .null = N, "Middle_name"),
  surname = purrr::map_chr(example, .null = N, "Surname")
)


> df
# A tibble: 3 x 4
     id   name middle   surname
  <chr>  <chr>  <chr>     <chr>
1     1    Joe   <NA>     Smith
2     2  Sarah   <NA>     Jones
3     3 Robert   <NA> McDonnell

It appears this issue has some history in the purrr repo, but when I use purrr functions like is_empty() or compact(), I either get an error or it doesn't work.

Does anyone know how I could achieve this, preferably by keeping to the tibble & map_chr method I'm using above?

RobertMyles
  • Why do you say "I want to keep the list structure" when you really want to turn it into a data frame? You do want to turn it into a data frame, yes? – Spacedman Jun 23 '17 at 20:58
  • Eventually, but I'd like to do it by using purrr on the list in the manner above. – RobertMyles Jun 23 '17 at 21:02
  • What about `map_df(transpose(example), ~map_chr(.,~ifelse(is.null(.),NA,.)))`? – HubertL Jun 23 '17 at 21:33
  • That definitely keeps it within the purrr framework, and works nicely (thanks!), but it transposes the list first, so I wouldn't be able to use the tibble approach above. Best answer yet, though. – RobertMyles Jun 23 '17 at 22:08
  • I just updated to the development version of *purrr* and your example works just fine as is except for a small typo. In `example` you use "Middle_name" but in `map_chr` you refer to "Middle_Name". – aosmith Jun 23 '17 at 22:12
  • @aosmith Well, whaddya know!! I knew there was a purrr solution. Not quite the answer I was expecting, but if you write it up as answer, I'll happily accept it. Thanks, it works great, I just checked it. – RobertMyles Jun 23 '17 at 23:57
  • Doing it with N `map_chr` functions is iterating over the whole list N times. Doing it an element at a time, like with `lapply` and `rbind`, iterates over the list once. A quick benchmark test shows me the `map_chr` approach is ten times slower. That's the price you pay for being "tidy". – Spacedman Jun 24 '17 at 07:26
  • @Spacedman thanks :-) – RobertMyles Jun 24 '17 at 11:10
  • @Spacedman Using `map_chr()` N times is equivalent to using `vapply()` N times. There's nothing tidyverse-specific about it. I just benchmarked with `bind_rows()` and `flatten_chr()`, and the "tidy" version is actually faster than the proposed base version. More to the point, we've been thinking about taking a column-spec approach for that kind of problem. It would bring the `readr` API to the task of creating data frames from lists and might be handy for heterogeneous lists of lists. – Lionel Henry Nov 13 '17 at 17:28
  • @lionel, thanks for the info. I find this approach really useful for dealing with messy internet data, badly-written APIs etc. Often I need to work with the list for a while before I turn everything into a dataframe. – RobertMyles Nov 13 '17 at 17:50

1 Answer

Your example does work with the development version of purrr.

The NULL values cause problems for approaches, such as dplyr::bind_rows, that would otherwise collapse a list of lists into a tibble. A workaround that removes the NULL entries is to loop through and flatten each inner list. Looping via map_df then binds the rows and gives your desired result.

purrr::map_df(example, purrr::flatten)

# A tibble: 3 x 4
     ID   Name Middle_name   Surname
  <chr>  <chr>       <chr>     <chr>
1     1    Joe        Alan     Smith
2     2  Sarah        <NA>     Jones
3     3 Robert       Myles McDonnell
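
If you would rather keep the column-by-column shape from the question without depending on a particular purrr version, here is a minimal base R sketch. It assumes the `example` list from the question, and `pluck_chr` is a hypothetical helper name made up for illustration, not a purrr or base R function. As noted in the comments, calling `vapply()` once per column is equivalent to calling `map_chr()` once per column.

```r
# Same example data as in the question.
example <- list(
  list(ID = "1", Name = "Joe", Middle_name = "Alan", Surname = "Smith"),
  list(ID = "2", Name = "Sarah", Middle_name = NULL, Surname = "Jones"),
  list(ID = "3", Name = "Robert", Middle_name = "Myles", Surname = "McDonnell")
)

# Hypothetical helper: extract one field from every record,
# substituting NA when the field is NULL or missing altogether.
pluck_chr <- function(records, field) {
  vapply(
    records,
    function(rec) {
      value <- rec[[field]]
      if (is.null(value)) NA_character_ else as.character(value)
    },
    character(1)
  )
}

# Build the columns one at a time, as in the question.
df <- data.frame(
  id = pluck_chr(example, "ID"),
  name = pluck_chr(example, "Name"),
  middle = pluck_chr(example, "Middle_name"),
  surname = pluck_chr(example, "Surname"),
  stringsAsFactors = FALSE
)
```

With the example data, only the second row of `df$middle` should be `NA`. The list itself is left untouched, so you can keep working with it before converting.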
aosmith