I am trying to convert a nested list (video_details_t) into a data frame. Most of the information in the nested list shouldn't be in the final data frame, just "tags" (and ideally "id"). The nested list has 252 elements and each element is structured like so:

    $ :List of 4
    ..$ kind    : chr "youtube#videoListResponse"
    ..$ etag    : chr "\"Fznwjl6JEQdo1MGvHOGaz_YanRU/wjb97SA5L1u9pjKF_Wa4GYuJoks\""
    ..$ pageInfo:List of 2
    .. ..$ totalResults  : int 1
    .. ..$ resultsPerPage: int 1
    ..$ items   :List of 1
    .. ..$ :List of 4
    .. .. ..$ kind   : chr "youtube#video"
    .. .. ..$ etag   : chr "\"Fznwjl6JEQdo1MGvHOGaz_YanRU/fJEMmhh4c330M-HX-dZXcMUN_R0\""
    .. .. ..$ id     : chr "Dod4hirL4IU"
    .. .. ..$ snippet:List of 10
    .. .. .. ..$ publishedAt         : chr "2019-11-02T13:00:04.000Z"
    .. .. .. ..$ channelId           : chr "UCa92M881KJO0FqaOUb4xAqg"
    .. .. .. ..$ title               : chr "Making Hydrogen from Water (Ft: The DIY Science Guy)"
    .. .. .. ..$ description         : chr "In which JB attempts to make an electrolytic cell for making hydrogen gas after being inspired by The DIY Scien"| __truncated__
    .. .. .. ..$ thumbnails          :List of 5
    .. .. .. .. ..$ default :List of 3
    .. .. .. .. .. ..$ url   : chr "https://i.ytimg.com/vi/Dod4hirL4IU/default.jpg"
    .. .. .. .. .. ..$ width : int 120
    .. .. .. .. .. ..$ height: int 90
    .. .. .. .. ..$ medium  :List of 3
    .. .. .. .. .. ..$ url   : chr "https://i.ytimg.com/vi/Dod4hirL4IU/mqdefault.jpg"
    .. .. .. .. .. ..$ width : int 320
    .. .. .. .. .. ..$ height: int 180
    .. .. .. .. ..$ high    :List of 3
    .. .. .. .. .. ..$ url   : chr "https://i.ytimg.com/vi/Dod4hirL4IU/hqdefault.jpg"
    .. .. .. .. .. ..$ width : int 480
    .. .. .. .. .. ..$ height: int 360
    .. .. .. .. ..$ standard:List of 3
    .. .. .. .. .. ..$ url   : chr "https://i.ytimg.com/vi/Dod4hirL4IU/sddefault.jpg"
    .. .. .. .. .. ..$ width : int 640
    .. .. .. .. .. ..$ height: int 480
    .. .. .. .. ..$ maxres  :List of 3
    .. .. .. .. .. ..$ url   : chr "https://i.ytimg.com/vi/Dod4hirL4IU/maxresdefault.jpg"
    .. .. .. .. .. ..$ width : int 1280
    .. .. .. .. .. ..$ height: int 720
    .. .. .. ..$ channelTitle        : chr "Good and Basic"
    .. .. .. ..$ tags                :List of 8
    .. .. .. .. ..$ : chr "DIY"
    .. .. .. .. ..$ : chr "diyscienceguy"
    .. .. .. .. ..$ : chr "diy science guy"
    .. .. .. .. ..$ : chr "hydrogen electrolysis"
    .. .. .. .. ..$ : chr "water splitting"
    .. .. .. .. ..$ : chr "hydrogen generator"
    .. .. .. .. ..$ : chr "Good and basic"
    .. .. .. .. ..$ : chr "splitting molecules"
    .. .. .. ..$ categoryId          : chr "22"
    .. .. .. ..$ liveBroadcastContent: chr "none"
    .. .. .. ..$ localized           :List of 2
    .. .. .. .. ..$ title      : chr "Making Hydrogen from Water (Ft: The DIY Science Guy)"
    .. .. .. .. ..$ description: chr "In which JB attempts to make an electrolytic cell for making hydrogen gas after being inspired by The DIY Scien"| __truncated__

The final output should be a data frame with 252 rows (one for each of the 252 elements of video_details_t) and a column for each unique "tag" entry across all 252 elements. Here's what I've entered so far:

    just_tags <- map(map(map(video_details_t, "items") %>%
      flatten(), "snippet"), "tags")

This gets me a nested list with 252 elements and each element is a vector containing all the tags. So far so good. Next I use the following to convert it to a data frame:

    df<- rbind_all(lapply(just_tags, data.frame))

This gives me a data frame with 2165 columns, one for every tag, exactly what I want. But the data frame only has 238 rows when it should have 252 (one for every element of just_tags). What is going on here? Is it deleting duplicate rows during the conversion?

I also get the following output:

    Warning messages:
    1: 'rbind_all' is deprecated.
    Use 'bind_rows()' instead.
    See help("Deprecated") 
    2: In bind_rows_(x, id = id) :
      Unequal factor levels: coercing to character
    3: In bind_rows_(x, id = id) :
      binding character and factor vector, coercing into character vector
    4: In bind_rows_(x, id = id) :
      binding character and factor vector, coercing into character vector
    5: In bind_rows_(x, id = id) :
      binding character and factor vector, coercing into character vector

I'm assuming those don't matter for the output, since I think they're just converting the "tags" elements into characters instead of factors.
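As a rough illustration of where the factor warnings can come from: in R versions before 4.0, `data.frame()` converts strings to factors by default, and passing `stringsAsFactors = FALSE` keeps them as character. The sketch below uses a small made-up list (`toy_tags` is hypothetical, not the real `just_tags`).

```r
# Hypothetical stand-in for two elements of just_tags.
toy_tags <- list(
  list("DIY", "diyscienceguy"),
  list("DIY")
)

# Extra arguments to lapply() are passed on to data.frame(), so every
# per-video data frame is built with character columns instead of factors.
dfs <- lapply(toy_tags, data.frame, stringsAsFactors = FALSE)

# Check that every column in every data frame is character.
all_chr <- all(vapply(
  dfs,
  function(d) all(vapply(d, is.character, logical(1))),
  logical(1)
))
```

With character columns throughout, there should be no factor levels left to reconcile when the pieces are bound together.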

If the conversion is deleting duplicate rows, is there a way to preserve them, say, by identifying each row with the "id" element from the original list? Each of the 252 elements has exactly one "id" element and it's unique so it could be used to delineate each of the 252 final output rows in the data frame.

Thanks so much for your help and please let me know if I can make something clearer!

  • The first warning just tells you to use `bind_rows` instead of `rbind_all`; alternatively, you can use `map_dfr`: `map_dfr(just_tags, data.frame)`. The second warning is about the `factor` columns, which are coerced to `character` class when their `factor` levels are unequal. – akrun Jan 21 '20 at 19:26
  • It is very difficult to help with specifics without a reproducible example; see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Axeman Jan 21 '20 at 19:30
  • Probably replacing `df <- rbind_all(lapply(just_tags, data.frame))` with `df <- map_dfr(just_tags, as_tibble)` will get rid of those warnings. – Axeman Jan 21 '20 at 19:39
  • I doubt it is removing duplicate rows, but it may be that some of your 252 elements don't have that tag. In that case I'm guessing your `map(map(map` will return `NULL`, and you will have no row for those elements after combining the results. See the examples in `?map`, and specifically the `.default` argument, to see how you can change those `NULL` values to `NA` instead. – Axeman Jan 21 '20 at 19:43
  • @Axeman that was it. I thought they all had the "tags" element but they didn't. The resulting data.frame after implementing your suggestion about the '.default' argument has one more column than it should, but I'm assuming that's the NA entries (even though I can't find them). In any case, it's an acceptable level of precision. Thanks for your help, and the link to reproducible example guidelines. (As a side note, 'map_dfr' gave an error, but it doesn't matter because I got pretty much what I wanted.) – Joseph Fisher Jan 21 '20 at 20:43
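The behaviour discussed in these comments can be sketched on a small made-up list. The object `videos` and its contents are hypothetical; only the `snippet`/`tags` names mirror the structure in the question. The second toy video has no `tags` entry.

```r
library(purrr)

# Hypothetical toy data: the second video has no "tags" element.
videos <- list(
  list(id = "a1", snippet = list(tags = list("DIY", "diyscienceguy"))),
  list(id = "b2", snippet = list(title = "untagged video")),
  list(id = "c3", snippet = list(tags = list("DIY")))
)

# Without .default, a missing element is returned as NULL.
tags_null <- map(videos, list("snippet", "tags"))

# With .default = NA, the missing element becomes NA instead.
tags_na <- map(videos, list("snippet", "tags"), .default = NA)

# A NULL entry turns into a data frame with zero rows, so it adds
# nothing when the per-video data frames are bound together.
rows_from_null <- nrow(data.frame(tags_null[[2]]))
```

Here `tags_null[[2]]` is `NULL` while `tags_na[[2]]` is `NA`, which is why the `.default` version keeps a slot for every video.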

1 Answer

Axeman gave the correct solution, as far as I can tell.

    just_tags <- map(map(map(video_details_t, "items") %>%
          flatten(), "snippet"), "tags")

needed to be modified by adding a `.default = NA` argument at each stage of the `map()` chain, like so:

    just_tags <- map(map(map(video_details_t, "items", .default = NA) %>%
      flatten(), "snippet", .default = NA), "tags", .default = NA)

After this, running

    df <- rbind_all(lapply(just_tags, data.frame))

gave the desired data frame with 252 rows. It had an extra column (2166 instead of 2165) but I'm assuming that's the NA entries...? In any case, the problem is solved as far as I'm concerned.
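For the part of the question about identifying each row by its "id", one possible approach is sketched below. It is only an illustration on made-up data: `video_details_toy`, `ids`, `long`, and `wide_df` are hypothetical names, and the toy list only imitates the `items` / `id` / `snippet` / `tags` nesting shown in the question.

```r
library(purrr)

# Hypothetical toy data imitating the nesting of video_details_t.
video_details_toy <- list(
  list(items = list(list(id = "a1",
                         snippet = list(tags = list("DIY", "diyscienceguy"))))),
  list(items = list(list(id = "b2",
                         snippet = list(title = "untagged video")))),
  list(items = list(list(id = "c3",
                         snippet = list(tags = list("DIY")))))
)

# Pull out the id and the tags of the first (only) item of each response.
ids  <- map_chr(video_details_toy, list("items", 1, "id"))
tags <- map(video_details_toy, list("items", 1, "snippet", "tags"),
            .default = NA_character_)

# Long format: one row per (id, tag) pair; an untagged video keeps one NA row.
long <- data.frame(
  id  = rep(ids, lengths(tags)),
  tag = unlist(tags),
  stringsAsFactors = FALSE
)

# Wide format: one row per id, one 0/1 column per tag. Using a factor with
# all ids as levels keeps untagged videos as all-zero rows.
wide_df <- as.data.frame.matrix(
  table(id = factor(long$id, levels = ids), tag = long$tag)
)
```

In this sketch the row names of `wide_df` are the video ids, so untagged videos stay visible instead of silently dropping out.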

  • Glad it got solved. Again, please consider replacing your `rbind_all` code with either `bind_rows` directly, or a `map_dfr` variant as indicated in the comments. `rbind_all` is deprecated (as the warning is telling you) and will disappear at some point. In the future, please provide a reproducible example! – Axeman Jan 21 '20 at 21:00