1

I have a dataframe nested within a dataframe that I'm getting from Mongo. The number of rows match in each so that when viewed it looks like a typical dataframe. My question, how do I expand the nested dataframe into the parent so that I can run dplyr selects? See the layout below

'data.frame':   10 obs. of  2 variables:
 $ _id         : int  1551 1033 1061 1262 1032 1896 1080 1099 1679 1690
 $ personalInfo:'data.frame':   10 obs. of  2 variables:
  ..$ FirstName     :List of 10
  .. ..$ : chr "Jack"
  .. ..$ : chr "Yogesh"
  .. ..$ : chr "Steven"
  .. ..$ : chr "Richard"
  .. ..$ : chr "Thomas"
  .. ..$ : chr "Craig"
  .. ..$ : chr "David"
  .. ..$ : chr "Aman"
  .. ..$ : chr "Frank"
  .. ..$ : chr "Robert"
  ..$ MiddleName    :List of 10
  .. ..$ : chr "B"
  .. ..$ : NULL
  .. ..$ : chr "J"
  .. ..$ : chr "I"
  .. ..$ : chr "E"
  .. ..$ : chr "A"
  .. ..$ : chr "R"
  .. ..$ : NULL
  .. ..$ : chr "J"
  .. ..$ : chr "E"

As per suggestion, here's how you recreate the data

id <- c(1551, 1033, 1061, 1262, 1032, 1896, 1080, 1099, 1679, 1690)
fname <- list("Jack","Yogesh","Steven","Richard","Thomas","Craig","David","Aman","Frank","Robert")
mname <- list("B",NULL,"J","I","E","A","R",NULL,"J","E")

sub <- as.data.frame(cbind(fname, mname))
master <- as.data.frame(id)
master$personalInfo <- sub
Jaap
  • 81,064
  • 34
  • 182
  • 193
Couch
  • 197
  • 1
  • 2
  • 10
  • 1
    It's easier to help if you shrare your data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It's not easy to copy/paste the results of `str()` to test possible solutions. Also, what functions are you using to get this result? Are there no options in that function to create a data.frame? These lists with NULL values are going to be really messy to convert. – MrFlick Jun 15 '18 at 19:58
  • Not an exact duplicate, but this might help: https://stackoverflow.com/questions/47224831/using-tidyr-unnest-with-null-values – Cristian E. Nuno Jun 15 '18 at 20:03

2 Answers2

3

We could loop the 'personalInfo', change the NULL elements of the list to NA and convert it to a real dataset with 3 columns

library(tidyverse)
out <- master %>% 
        pull(personalInfo) %>% 
        map_df(~ map_chr(.x, ~ replace(.x, is.null(.x), NA))) %>% 
        bind_cols(master %>%
               select(id), .)
str(out)
#'data.frame':  10 obs. of  3 variables:
# $ id   : num  1551 1033 1061 1262 1032 ...
# $ fname: chr  "Jack" "Yogesh" "Steven" "Richard" ...
# $ mname: chr  "B" NA "J" "I" ...
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Both your answer and @Moody_Mudskipper 's worked great for me, thank you both for your help. I put the check on Moody's though b/c it was just a similar logic flow to how I think. Your's is much more concise and probably the better code, I just went more with thought process. You are free to curse my name. – Couch Jun 18 '18 at 17:27
1

While @akrun's answer is probably more practical and probably the way to tidy your data, I think this output is closer to what you describe.

I create a new environment where I put the data.frame's content, there I unlist to the said environment the content of your problematic column, and finally I wrap it all back into a data.frame.

I use a strange hack with cbind as as.data.frame is annoying with list columns. Using tibble::as_tibble works fine however.

new_env <- new.env()
list2env(master,new_env)
list2env(new_env$personalInfo,new_env)
rm(personalInfo,envir = new_env)
res <- as.data.frame(do.call(cbind,as.list(new_env))) # or as_tibble(as.list(new_env))
rm(new_env)
res
#      fname   id mname
# 1     Jack 1551     B
# 2   Yogesh 1033  NULL
# 3   Steven 1061     J
# 4  Richard 1262     I
# 5   Thomas 1032     E
# 6    Craig 1896     A
# 7    David 1080     R
# 8     Aman 1099  NULL
# 9    Frank 1679     J
# 10  Robert 1690     E

str(res)
# 'data.frame': 10 obs. of  3 variables:
#   $ fname:List of 10
# ..$ : chr "Jack"
# ..$ : chr "Yogesh"
# ..$ : chr "Steven"
# ..$ : chr "Richard"
# ..$ : chr "Thomas"
# ..$ : chr "Craig"
# ..$ : chr "David"
# ..$ : chr "Aman"
# ..$ : chr "Frank"
# ..$ : chr "Robert"
# $ id   :List of 10
# ..$ : num 1551
# ..$ : num 1033
# ..$ : num 1061
# ..$ : num 1262
# ..$ : num 1032
# ..$ : num 1896
# ..$ : num 1080
# ..$ : num 1099
# ..$ : num 1679
# ..$ : num 1690
# $ mname:List of 10
# ..$ : chr "B"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "I"
# ..$ : chr "E"
# ..$ : chr "A"
# ..$ : chr "R"
# ..$ : NULL
# ..$ : chr "J"
# ..$ : chr "E"
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167