googledrive4: How to filter a dribble (drive-tibble), applying criteria to nested list in a tibble?

Question

Using googledrive4 package, accessing files on google drive is wonderfully easy.

p_load(googledrive4, tidyverse)
dribble1 <- drive_ls()

> dribble1
# A dribble: 2000 × 3
  name           id                                drive_resource   
  <chr>          <drv_id>                          <list>           
1 somefile1.zip  1w76E2ze0p00jtxxxxxxxxxxxxxxxxxxx <named list [37]>
2 somefile2.zip  1Zau_jwYlDHFK4xxxxxxxxxxxxxxxxxxx <named list [37]>
...

however, I am struggling to filter the result, based on parameters nested inside the "drive_resource" named list. I wish for example, to filter by date-time to create a dribble that contains only files that are saved after certain date.

in desperation and after many try-and-error, I achieved what I wanted, in 2 steps like this:

p_load(tidyverse, lubridate)

# 1 - unnest to make a list of ID's that match my criteria
range <- interval(as_date("2021/1/1", now())

filtered_list <- dribble1 %>%
  unnest_longer(col = drive_resource) %>%
  filter(drive_resource_id == "modifiedTime") %>%
  unnest_longer(drive_resource, values_to = "modtime") %>%
  mutate(modtime = as_datetime(modtime)) %>%
  filter(modtime %within% range)

# 2 - filter the original dribble with filtered list of ID's
result_dribble <- dribble1 %>%
  filter(id %in% filtered_list$id)

This works, but I feel there must be a better way to handle nested lists more elegantly, without creating intermediate objects.

Could someone please shed some light on this?

(sorry for the lack of reprex. dribbles are constructed in a unique way I do not yet fully understand, and couldn't use datapasta to reconstruct a dribble containing nested named list)

above is a simplified example, the data I am processing is much larger, I hope the screenshot from RStudio makes sense to people not familiar with googledrive4.

score 1 · Accepted Answer · answered Aug 12 '21 at 01:55

After a night of thinking, I found a solution in hoist()

dribble1 %>% 
  hoist(drive_resource, "modifiedTime") %>%
  mutate(modifiedTime = as_datetime(modifiedTime)) %>%
  filter(modifiedTime %within% range)

Now my code is a lot shorter and executes instantly!

as a side node, at first I struggled trying to use unnnest_wider, but was met with an error.

> dribble1 %>% unnest_wider(col = drive_resource) 
Error: Names must be unique.
x These names are duplicated:
  * "name" at locations 1 and 6.
  * "id" at locations 2 and 5.
ℹ Use argument `names_repair` to specify repair strategy.

this was easily fixed by supplying names_repair argument.

> dribble1 %>% unnest_wider(col = drive_resource, names_repair = "unique")

This was preferable than unnest_longer() as it preserves rows.

Thanks, I used this with the googledrive package...I had a recursive search of a directory and this made it very simple to filter the dribble to spreadsheets (using drive_resource "mimeType" where it equaled "application/vnd.google-apps.spreadsheet") and then applied another filter where name matched a pattern to get files I was looking for! — S Olson, Jun 30 '22 at 22:09

googledrive4: How to filter a dribble (drive-tibble), applying criteria to nested list in a tibble?

1 Answers1