I have a list called list that looks as follows:

Col1  Col2   Col3                    Col4        Col5   ...
1     Name1  <data.frame [1 × 3]>    <chr [0]>   <list [0]>
2     Name2  <data.frame [29 × 3]>   <chr [1]>   <data.frame [1 × 9]>
3     Name3  <data.frame [5 × 3]>    <chr [1]>   <NULL>
...

I want to clean up this list, turn it into a data frame, and make new columns out of the nested data frames, lists, and character vectors, but I am not sure what they contain or what they look like. The output should place the columns extracted from Col3 directly after Col3, and likewise for the other nested columns.

I am not sure how to accomplish this or how to build a reference of this list to attach here for people to try out. Grateful for any tips!

EDIT: Here is the dput for the first 2 rows (sanitized due to personal information)

structure(list(about = c(NA_character_, NA_character_), avatar = c("LINK", 
"LINK2"
), avatar_cached = c(NA_character_, NA_character_), certifications = list(
    structure(list(meta = "Issued Apr 2009", subtitle = "The Prince2 Academy", 
        title = "Prince2"), class = "data.frame", row.names = 1L), 
    structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    city = c("Ireland", "Greater Sydney Area"), country_code = c("IE", 
    "AU"), courses = list(structure(list(subtitle = "-", title = "TITLE"), class = "data.frame", row.names = 1L), 
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    current_company = structure(list(name = c("", "COMPANY1"
    ), company_id = c(NA, "company1"), industry = c(NA, "Capital Markets"
    ), link = c(NA, "COMPANYURL"
    )), row.names = 1:2, class = "data.frame"), `current_company:name` = c("", 
    "COMPANY1"), education = list(structure(list(), names = character(0), row.names = integer(0), class = "data.frame"), 
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    educations_details = list(character(0), character(0)), experience = list(
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame"), 
        structure(list(company = "COMPANY1", company_id = "company1", 
            industry = "Capital Markets", location = "", positions = list(
                structure(list(description = "", duration = "Jun 2007 - Present 14 years 4 months", 
                  duration_short = "14 years 4 months", end_date = "Present", 
                  start_date = "Jun 2007", subtitle = "Company1", 
                  title = "TITLE1"), class = "data.frame", row.names = 1L)), 
            url = "URL"), class = "data.frame", row.names = 1L)), 
    following = c(500L, 1L), groups = list(structure(list(), names = character(0), row.names = integer(0), class = "data.frame"), 
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    id = c("ID1", "ID2"
    ), languages = list(structure(list(subtitle = "-", title = "French"), class = "data.frame", row.names = 1L), 
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    name = c("Name1", "Name2"), people_also_viewed = list(
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame"), 
        structure(list(), names = character(0), row.names = integer(0), class = "data.frame")), 
    position = c(NA, "Position1"
    ), posts = list(structure(list(attribution = c("", "", ""
    ), title = c("", "", "")), class = "data.frame", row.names = c(NA, 
    3L)), structure(list(attribution = "Liked by x", 
        img = "URL", 
        link = "URL", 
        title = "TITLE"), class = "data.frame", row.names = 1L)), 
    recommendations = list("“Rec”", 
        list()), recommendations_count = c(1L, NA), region = c("EU", 
    "OC"), timestamp = c("2021-07-09", "2021-09-09"), url = c("URL1", 
    "URL2"
    ), volunteer_experience = list(list(), list()), changelog = list(
        NULL, list()), `current_company:company_id` = c(NA, "company1"
    ), `current_company:industry` = c(NA, "Capital Markets")), row.names = 1:2, class = "data.frame")
Soph2010
  • Can you share the code to create your data frame? You can use `dput(head(your_df))` – Maël Jan 24 '23 at 10:34
  • I added the dput for the first 2 rows (sanitized) - does that help? @Maël – Soph2010 Jan 24 '23 at 10:52
  • To improve your chances of getting an answer, you should provide a minimal example, namely one that reproduces your problem with the smallest possible data frame. I'd advise taking a look at this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. You could also show your expected output. – Maël Jan 24 '23 at 10:54
  • Thank you @Maël - is the dput I provided sufficient? – Soph2010 Jan 24 '23 at 11:55

1 Answer

The map_df() function from the purrr package may work here. It applies a function to every element of the list and then binds the results by row into a single data frame, something like this:

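library(purrr)  # map_df() also needs dplyr installed for the row-binding step

# Helper that keeps an element only if it is a data frame.
# Anything else returns NULL, which is dropped when the rows are bound.
extract_cols <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  NULL
}

# Apply the helper to each element of the list and row-bind the results;
# .id = "id" records which element each row came from.
df <- map_df(list, extract_cols, .id = "id")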
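
Because the dput output in the question mixes a data-frame column (current_company) with list columns of data frames (languages, courses and so on), a different route may be worth trying: the tidyr functions unpack() and unnest(). This is a swapped-in technique, not the map_df() approach above. The sketch below is only a minimal illustration on toy data; the object name `profiles` and its three columns are hypothetical stand-ins for the real data, and deeper nesting (such as positions inside experience) would need further steps.

```r
library(tidyr)

# Hypothetical toy data mirroring two of the shapes in the question:
# a data-frame column and a list column holding data frames.
profiles <- data.frame(id = c("ID1", "ID2"))
profiles$company <- data.frame(
  name = c("", "COMPANY1"),
  industry = c(NA, "Capital Markets")
)
profiles$languages <- list(
  data.frame(subtitle = "-", title = "French"),
  data.frame()  # empty, like the zero-column data frames in the dput
)

# Step 1: find out which columns are nested, and in which way.
is_df_col <- vapply(profiles, is.data.frame, logical(1))
is_list_col <- vapply(
  profiles,
  function(col) is.list(col) && !is.data.frame(col),
  logical(1)
)

# Step 2: spread the data-frame column into ordinary columns,
# prefixing the new names with the outer column name.
flat <- unpack(profiles, cols = company, names_sep = "_")

# Step 3: expand the list column of data frames; keep_empty = TRUE keeps
# rows whose nested data frame is empty and fills them with NA.
flat <- unnest(flat, cols = languages, names_sep = "_", keep_empty = TRUE)
```

Running names(profiles)[is_df_col] and names(profiles)[is_list_col] on the real data should show which columns need unpack() and which need unnest(). List columns that hold plain character vectors or bare lists (recommendations, for example) are not covered by this sketch.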
micahondiwa