Questions tagged [multidplyr]

multidplyr is an R package by Hadley Wickham that enables parallel processing on partitioned data.frames. It is a complement to his popular dplyr package and part of the extended tidyverse ecosystem of packages. This tag should not be used for dplyr-only questions.
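As a rough illustration of what "parallel processing on partitioned data.frames" can look like, here is a minimal sketch, assuming a recent multidplyr release (the `new_cluster()` / `partition()` interface) and a local machine with at least two cores. The data frame `df`, its columns `g` and `x`, and the worker count are invented for the example.

```r
library(dplyr)
library(multidplyr)

# Start two local worker processes (the count is an arbitrary choice).
cl <- new_cluster(2)

# Make dplyr available on every worker, in case later steps need it there.
cluster_library(cl, "dplyr")

# Toy data, invented for illustration only.
df <- data.frame(
  g = rep(c("a", "b", "c", "d"), each = 3),
  x = 1:12
)

# Group first so that each group is sent whole to a single worker,
# run the summary on the workers, then bring the pieces back.
res <- df %>%
  group_by(g) %>%
  partition(cl) %>%
  summarise(total = sum(x)) %>%
  collect() %>%
  arrange(g)

print(res)
```

For data this small the cost of starting workers and shipping data will likely outweigh any gain; the pattern tends to pay off only when the per-group work is expensive.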

51 questions

1 answer

Parallel computing, which alternative to tidyr::complete in dplyr?

I am trying to parallelise a pipe. In the pipe there is a tidyr command ("tidyr::complete"). This breaks the code when it is run in parallel, as the object class is not recognised. Is there an alternative in dplyr to…

r dplyr parallel-processing multidplyr

asked Jun 24 '20 at 11:17

MCS

1 answer

Replacement for parallel plyr with doMC

Consider a standard grouped operation on a data.frame: library(plyr) library(doMC) library(MASS) # for example nc <- 12 registerDoMC(nc) d <- data.frame(x = c("data", "more data"), g = c("group1", "group2")) y <- "some global object" res <-…

r dplyr plyr tidyverse multidplyr

asked Dec 01 '17 at 16:42

Devin

1 answer

multidplyr and group_by () and filter()

I have the following dataframe and my intention is to find all the IDs that have different USAGE but the same TYPE. ID <- rep(1:4, each=3) USAGE <-…

r dplyr multidplyr

asked Jul 30 '17 at 11:35

Justas Mundeikis

1 answer

Calling a function with arguments within dplyr::do using multidplyr

I'm trying to use multidplyr to speed up getting residuals from a regression fit. I've created a function that fits the regression model to get the residuals, which in addition to the data, gets two more arguments. Here's the function: func <-…

r arguments dplyr multidplyr

asked Nov 08 '17 at 19:19

dan

1 answer

How to export custom functions to clusters in multidplyr?

Following on from questions here and here, I'm trying to get the latest version of multidplyr to work with a custom function. By way of reproducible example, I have tried: library(multidplyr) library(dplyr) cl <- new_cluster(3) df <- data.frame(Grp…

r dplyr multidplyr

asked Feb 01 '20 at 06:47

Will T-E

1 answer

R: What is a fast way to remove dominated rows from a table?

I'm looking for a fast way to remove all dominated rows from a table (preferably using parallel processing, to take advantage of multiple cores). By "dominated row", I mean a row that is less than or equal to another row in all columns. For example,…

r dplyr multidplyr

asked Jun 19 '18 at 21:04

kartik_subbarao

1 answer

how to split by multiple columns when using multidplyr

tl;dr: How do I make "partition" from multidplyr split on multiple columns? Motivation: I was unhappy with using 1 of 32 cores for a hard-working summarize, so I am trying to use multidplyr. I am operating on multiple columns. Example: The vignette…

r dplyr multidplyr

asked Dec 21 '17 at 16:21

EngrStudent

1 answer

multidplyr : assign functions to cluster

(see working solution below) I want to use multidplyr to parallelize a function : calculs.R f <- function(x){ return(x+1) } main.R library(dplyr) library(multidplyr) source("calculs.R") d <- data.frame(a=1:1000,b=sample(1:2,1000),replace=T) result…

r parallel-processing dplyr multidplyr

asked Oct 03 '17 at 21:27

Xavier Prudent

0 answers

Multiplyr and prophet for parallel grouped prediction: Error in checkForRemoteErrors(lapply(cl, recvResult))

I want to make parallel predictions using multidplyr and prophet. Consider the following data: library(tidyr) library(dplyr) library(multidplyr) library(prophet) ds = as.Date(c('2016-11-01', '2016-11-02', '2016-11-03', '2016-11-04', …

r parallel-processing dplyr multidplyr

asked Jul 22 '17 at 20:43

Eduardo

2 answers

Creating a frequency 2x2 table in R but replacing frequency data with numerical data from another variable

I am having trouble creating a table in the format required to run some analyses. Here is a simplified example of what my large dataset looks like: Sample <- c(1,2,2,3,3) Species <- c("sp1","sp2","sp3","sp1","sp1") Counts <-…

r dplyr aggregate multidplyr

asked Jul 04 '23 at 07:15

Sergio Nolazco

0 answers

parallelise group_walk operation with multidplyr

Is it possible to parallelise a dplyr::group_walk operation on grouped data using multidplyr? In this first attempt at a general question I won't provide a reprex, but if it helps I can. I have multiple time series for many individuals and I would…

r tidyverse multidplyr

asked Jun 18 '20 at 14:29

mjrolland

0 answers

Can you parallelize panel maneuvers in R?

In my R script, I'm using the pmdplyr functions mutate_cascade() and tlag() to mutate my data, which contains over 3 million records, so the code is extremely slow but it works. In order to speed things up, I tried adding the parallel processing…

r dplyr multidplyr

asked May 28 '20 at 16:55

Tess

1 answer

How to join, group and summarise large dataframes in R with multidplyr and parallel

This question is similar to other problems with very large data in R, but I can't find an example of how to merge/join and then perform calculations on two dfs (as opposed to reading in lots of dataframes and using mclapply to do the calculations).…

r parallel-processing left-join multidplyr

asked Mar 17 '20 at 03:53

leslie roberson

0 answers

Is there a way to parallelize tidyr?

I am using tidyr to complete a time series for balances and transactions; however, due to the number of individuals, the computation is taking a significant amount of time. I have 16 cores and R is only using one. Is there any way to parallelize tidyr? …

r parallel-processing dplyr tidyr multidplyr

asked Dec 13 '19 at 14:32

Dominic Naimool

1 answer

R: Why parallel is (much) slower? What is best strategy in using parallel for a (left) join a large collection of big files?

I've read some questions on the subject as well as some tutorials but failed to resolve my problem, so I decided to ask myself. I have a large collection of big files of types, say, A, B, and C, and I need to left join B and C with A on some conditions. I work…

r foreach parallel-processing multidplyr

asked Apr 10 '19 at 20:39

Evgeny
