I am trying to implement the example given here: https://cran.r-project.org/web/packages/multidplyr/vignettes/multidplyr.html
However, I get the following error when I reach the step where the data is partitioned, using either method 1 or method 2. I have tried reinstalling the Rcpp package, but the error persists.
Error in qs::qsave(values, path, preset = "fast", check_hash = FALSE, : function 'Rcpp_precious_remove' not provided by package 'Rcpp'
Below is the code sample:
library(multidplyr)
library(dplyr, warn.conflicts = FALSE)
library(nycflights13)
###Creating a cluster
cluster <- new_cluster(2)
#### Method 1: partitioning the data with partition() is not working (still investigating why), so the direct method (Method 2) is used instead
# flights1 <- flights %>% group_by(dest) %>% partition(cluster)
# Method 2: first split flights up by month and save each month as a CSV file:
path <- tempfile()
dir.create(path)
flights %>%
  group_by(month) %>%
  group_walk(~ vroom::vroom_write(.x, sprintf("%s/month-%02i.csv", path, .y$month)))
# Now we find all the files in the directory, and divide them up so that each worker gets (approximately) the same number of pieces:
files <- dir(path, full.names = TRUE)
cluster_assign_partition(cluster, files = files)
# Then we read in the files on each worker and use party_df() to create a partitioned dataframe:
cluster_send(cluster, flights2 <- vroom::vroom(files))
flights2 <- party_df(cluster, "flights2")
###dplyr verbs.
# flights1 is commented out above, so summarise the partitioned flights2 instead
df <- flights2 %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  collect()
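
The error above names a symbol, Rcpp_precious_remove, that qs expects to find in Rcpp. As a rough diagnostic sketch, the snippet below prints the installed versions of the packages involved and checks whether Rcpp is at least 1.0.7. That threshold is an assumption on my part (I believe it is the release where this symbol first appeared), and the helper installed_version() is a hypothetical name introduced only for this check.

```r
# Assumption: Rcpp_precious_remove is only available from Rcpp 1.0.7 onwards.
min_rcpp <- package_version("1.0.7")

# Hypothetical helper: installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Versions of the packages that appear in the error and in the example.
versions <- vapply(c("Rcpp", "qs", "multidplyr"), installed_version, character(1))
print(versions)

# TRUE only if Rcpp is installed and at least as new as the assumed threshold.
rcpp_ok <- !is.na(versions[["Rcpp"]]) &&
  package_version(versions[["Rcpp"]]) >= min_rcpp

if (!rcpp_ok) {
  message("Rcpp is missing or older than ", as.character(min_rcpp),
          "; updating it and reinstalling qs in a fresh R session may help.")
}
```

If rcpp_ok comes back FALSE, that would suggest qs was built against a newer Rcpp than the one being loaded, which might explain why reinstalling Rcpp alone did not change anything.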