
I am using tidyr to complete a time series of balances and transactions. However, because of the number of individuals, the computation takes a significant amount of time. I have 16 cores and R is only using one. Is there any way to parallelize tidyr?

What I have is simply this (it is preceded by a group by, so that the time series for each individual is completed separately):

%>% tidyr::complete(Date = seq.Date(min(Date), max(Date), by = "day")) %>%
Dominic Naimool
  • Have you seen: https://www.business-science.io/code-tools/2016/12/18/multidplyr.html ? It looks promising – Annet Dec 13 '19 at 14:34
  • I have tried multidplyr, but I consistently get errors from the partition() function. The call df_mp <- data %>% multidplyr::partition(cluster = cl, id) # group by id fails with: Error in multidplyr::partition(., cluster = cl, id) : unused argument (id) – Dominic Naimool Dec 13 '19 at 15:05
  • I also receive this error: Error in UseMethod("complete_") : no applicable method for 'complete_' applied to an object of class "multidplyr_party_df" – Dominic Naimool Dec 13 '19 at 15:13
  • Aren't you having memory issues because of the Cartesian product you're doing here? Maybe the algorithm could be optimized – moodymudskipper Dec 13 '19 at 15:36
  • Processing power and memory aren't really an issue for me. The entire process takes 4 minutes, but because of R's single-core default I am only using 1 core when I have 16 – Dominic Naimool Dec 13 '19 at 15:58
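
Since the error above suggests complete() has no method for multidplyr's partitioned data frames, one possible workaround is to do the partitioning by hand with the base parallel package: split the rows into chunks so that each individual stays within one chunk, run the same grouped complete() on every chunk in a separate worker, and bind the results. The following is only a minimal sketch under assumptions: the toy data, the column names id and balance, the helper complete_chunk and the worker count of 2 are all hypothetical stand-ins for the real data and a 16-core machine.

```r
library(dplyr)
library(tidyr)
library(parallel)

# Hypothetical toy data: three individuals, two of them with gaps in their dates.
data <- tibble(
  id = c(1, 1, 2, 2, 3, 3),
  Date = as.Date(c("2019-01-01", "2019-01-04",
                   "2019-02-01", "2019-02-03",
                   "2019-03-01", "2019-03-02")),
  balance = c(10, 20, 5, 7, 1, 2)
)

# Same operation as in the question, applied to one chunk of rows.
# Functions are namespaced with :: so workers do not need library() calls.
complete_chunk <- function(df) {
  grouped <- dplyr::group_by(df, id)
  filled <- tidyr::complete(
    grouped,
    Date = seq.Date(min(Date), max(Date), by = "day")
  )
  dplyr::ungroup(filled)
}

n_workers <- 2  # assumption for the sketch; e.g. detectCores() - 1 in practice

# Assign every id to exactly one chunk so no individual is split across workers.
chunk_of_row <- as.integer(factor(data$id)) %% n_workers
pieces <- split(data, chunk_of_row)

cl <- makeCluster(n_workers)
completed <- parLapply(cl, pieces, complete_chunk)
stopCluster(cl)

# Recombine the chunks and restore a predictable row order.
result <- arrange(bind_rows(completed), id, Date)
```

Splitting into one chunk per worker rather than one piece per individual keeps the cost of shipping data to the workers low. Whether this beats the single-core run depends on how that transfer cost compares with the time spent in complete(), so it is worth timing on the real data.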

0 Answers