tl;dr
How do I make "partition" from multidplyr split on multiple columns?

Motivation:
I was unhappy with using only 1 of 32 cores for a heavy summarise, so I am trying to use multidplyr. My grouping spans multiple columns.

Example:
The vignette shows grouping by a single column, but when I do that, my other grouping column is not considered.

Code:

library(dplyr)
library(multidplyr)
library(nycflights13)

flights1 <- partition(flights, flight)
flights2 <- summarise(flights1, dep_delay = mean(dep_delay, na.rm = TRUE))
flights3 <- collect(flights2)

So how about splitting on year, month, and day?

This doesn't work for me:

flights1 <- partition(flights, list(year, month, day))
flights2 <- summarise(flights1, dep_delay = mean(dep_delay, na.rm = TRUE))
flights3 <- collect(flights2)

I can't seem to make this work. Can you point to a proper or at least effective way to do this?

gsquaredxc
EngrStudent

1 Answer

According to ?partition, the usage for partition is

partition(.data, ..., cluster = get_default_cluster())

where ... are variables to partition by. Instead of passing in a list of variables, pass in each variable separately, i.e.

partition(flights, year, month, day)
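
For completeness, here is a sketch of the whole pipeline with all three columns, plus a single-core dplyr run to check the result against. One assumption to flag: as far as I know, multidplyr releases from 0.1.0 onward changed the interface so that partition() takes a cluster and picks up the grouping from group_by(), so the sketch branches on the installed version. The worker count of 2 and the names cluster and reference are arbitrary choices for illustration, not anything from the question.

```r
library(dplyr)
library(multidplyr)
library(nycflights13)

# Assumption: multidplyr >= 0.1.0 takes the grouping from group_by() and needs
# an explicit cluster; the development version used in the question takes the
# partitioning columns directly, one argument per column.
if (packageVersion("multidplyr") >= "0.1.0") {
  cluster <- new_cluster(2)          # 2 workers, chosen only for illustration
  cluster_library(cluster, "dplyr")  # make dplyr available on each worker
  flights1 <- flights %>%
    group_by(year, month, day) %>%
    partition(cluster)
} else {
  flights1 <- partition(flights, year, month, day)
}

flights2 <- summarise(flights1, dep_delay = mean(dep_delay, na.rm = TRUE))

# Rows come back in worker order, so sort them before comparing
flights3 <- collect(flights2) %>%
  ungroup() %>%
  arrange(year, month, day)

# Single-core reference computed with plain dplyr
reference <- flights %>%
  group_by(year, month, day) %>%
  summarise(dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(year, month, day)

# The partitioned result should have one row per (year, month, day) and
# agree with the reference up to floating-point tolerance
stopifnot(nrow(flights3) == nrow(reference))
stopifnot(isTRUE(all.equal(flights3$dep_delay, reference$dep_delay)))
```

If the two stopifnot() calls pass silently, the partitioned summary covers every year/month/day combination rather than just one column's groups.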
Weihuang Wong