38

I'm trying to find the dplyr function for cartesian product. I've two simple data.frame with no common variable:

x <- data.frame(x = c("a", "b", "c"))
y <- data.frame(y = c(1, 2, 3))

I would like to reproduce the result of

merge(x, y)

  x y
1 a 1
2 b 1
3 c 1
4 a 2
5 b 2
6 c 2
7 a 3
8 b 3
9 c 3

I've already looked for this (for example here or here) without finding anything useful.

zx8754
  • 52,746
  • 12
  • 114
  • 209
Luca Monno
  • 830
  • 1
  • 11
  • 25

7 Answers7

52

Use crossing from the tidyr package:

x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))

crossing(x, y)

Result:

   x y
 1 a 1
 2 a 2
 3 a 3
 4 b 1
 5 b 2
 6 b 3
 7 c 1
 8 c 2
 9 c 3
Gregor Sturm
  • 2,792
  • 1
  • 25
  • 34
10

When x and y are database tbls (tbl_dbi / tbl_sql) you can now also do:

full_join(x, y, by = character())

Added to dplyr at the end of 2017, and also gets translated to a CROSS JOIN in the DB world. Saves the nastiness of having to introduce the fake variables.

I'm seeing comments now (Nov2022) that this does also work on standard dataframes! Great news!

dsz
  • 4,542
  • 39
  • 35
  • 3
    I'm running with dplyr 0.7.6 and this gives the error `Error in full_join_impl(x, y, by_x, by_y, aux_x, aux_y, na_matches) : by must specify variables to join by` for the given example. Any ideas? – Dan Aug 29 '18 at 17:29
  • Same error as @Lyngbakr with dplyr 0.7.8. Did this ever appear in dplyr? Absent this, I use `crossing()`, as in the accepted answer. – banbh Feb 25 '19 at 14:39
  • @andyyy What was the underlying type when this error occur? I've seen odd and different behaviours especially on data.tables. – dsz Feb 26 '19 at 21:41
  • 1
    Works fine with dataframes. – zx8754 Nov 25 '22 at 09:13
6

If we need a tidyverse output, we can use expand from tidyr

library(tidyverse)
y %>% 
   expand(y, x= x$x) %>%
   select(x,y)
# A tibble: 9 × 2
#       x     y
#  <fctr> <dbl>
#1      a     1
#2      b     1
#3      c     1
#4      a     2
#5      b     2
#6      c     2
#7      a     3
#8      b     3
#9      c     3
akrun
  • 874,273
  • 37
  • 540
  • 662
5

When faced with this problem, I tend to do something like this:

x <- data.frame(x=c("a","b","c"))
y <- data.frame(y=c(1,2,3))
x %>% mutate(temp=1) %>% 
inner_join(y %>% mutate(temp=1),by="temp") %>%
dplyr::select(-temp) 

If x and y are multi-column data frames, but I want to do every combination of a row of x with a row of y, then this is neater than any expand.grid() option that I can come up with

andyyy
  • 986
  • 8
  • 8
4
expand.grid(x=c("a","b","c"),y=c(1,2,3))

Edit: Consider also this following elegant solution from "Y T" for n more complex data.frame :

https://stackoverflow.com/a/21911221/5350791

in short:

expand.grid.df <- function(...) Reduce(function(...) merge(..., by=NULL), list(...))
expand.grid.df(df1, df2, df3)
markus
  • 25,843
  • 5
  • 39
  • 58
  • There is also [expand_grid](https://tidyr.tidyverse.org/reference/expand_grid.html) in package tidyr. –  Dec 05 '20 at 11:51
4

This is a continuation of dsz's comment. Idea came from: http://jarrettmeyer.com/2018/07/10/cross-join-dplyr.

tbl_1$fake <- 1
tbl_2$fake <- 1
my_cross_join <- full_join(tbl_1, tbl_2, by = "fake") %>%
                 select(-fake)

I tested this on four columns of data ranging in size from 4 to 640 obs, and it took about 1.08 seconds.

Aaron C
  • 301
  • 1
  • 8
2

Using two answers above, using full_join() with by = character() seems to be faster:

library(tidyverse)
library(microbenchmark)

df <- data.frame(blah = 1:10)

microbenchmark(diamonds %>% crossing(df))
Unit: milliseconds
                      expr      min       lq     mean   median       uq     max neval
 diamonds %>% crossing(df) 21.70086 22.63943 23.72622 23.01447 24.25333 30.3367   100
microbenchmark(diamonds %>% full_join(df, by = character()))
Unit: milliseconds
                                         expr      min       lq     mean   median       uq      max neval
 diamonds %>% full_join(df, by = character()) 9.814783 10.23155 10.76592 10.44343 11.18464 15.71868   100
Doug Fir
  • 19,971
  • 47
  • 169
  • 299