Generate all possible combinations for Columns(cross join or Cartesian product)

Question

I have a Google SpreadSheets doc with three columns A, B and C.

I need to populate the Column C with all the possible combinations of the values in Columns A and B. Please take a look a the capture to see what I mean.

I found this to be done in Excel, here, but it doesn't work in google spreadsheets.

The formula should be useful even for more columns (e.g.: four instead of two)

Can I do this?

player0 · Answer 1 · 2022-10-22T19:42:05.023

44

in post-pandemic new world we can solve this with:

=INDEX(FLATTEN(A2:A3&" "&TRANSPOSE(B2:B4)))

to account for future expansion we can do:

=INDEX(FLATTEN(FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(FILTER(B2:B; B2:B<>""))))

for 3 columns:

=INDEX(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>""))))

4 columns:

=INDEX(FLATTEN(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D; D2:D<>""))))

for more see: https://stackoverflow.com/a/74160711/5632629

edited Oct 22 '22 at 19:42

answered Aug 13 '21 at 16:46

player0

124,011
12
67
124

2

Awesome! And to print them in separate columns instead of one follow the example here: =ArrayFormula(split(flatten(transpose(filter(B2:B; B2:B <> "")) & "," & (flatten(filter(C2:C; C2:C <> "") & "," & transpose(filter(D2:D; D2:D <> ""))))); ",")) https://support.google.com/docs/thread/61770165/create-automatically-all-combinations-of-multiple-columns-each-one-of-different-size?hl=en – NicolasZ Feb 24 '22 at 16:14

score 25 · Accepted Answer · edited Aug 13 '21 at 16:52

25

Update 201810

Original formula crashes for a big dataset. I described a way to make cross-join with any size of data here.

Try formula:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))
 &" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))

The result:

car red
train red
car yellow
train yellow
car blue
train blue

You may use it again to add another list:

The formula is in cells C2 and E2,

C2 is:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))) )

and E2 is:

=ArrayFormula(transpose(split(rept(concatenate(C2:C&char(9)),counta(D2:D)),char(9)))&" "&transpose(split(concatenate(rept(D2:D&char(9),counta(C2:C))),char(9))) )

edited Aug 13 '21 at 16:52

player0

124,011
12
67
124

answered Mar 15 '17 at 09:48

Max Makhrov

17,309
5
55
81

Should I insert this formula in a single cell? '=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&""&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))' This is what I did but I get an error message. Why did you split the formula in two parts? – JPashs Mar 15 '17 at 09:56
Sorry but I can't get it: here the spreadsheet (you can edit), can you paste your formula there: https://docs.google.com/spreadsheets/d/1Z80MNKML4m7YKXn2SfHVZwO0mPs889MX0DXfk6iKx5k/edit?usp=sharing – JPashs Mar 15 '17 at 10:01
2

You have to replace commas by semicolon `;` in formulas. It is regional settings. I changed file settings region to US and it worked. – Max Makhrov Mar 15 '17 at 10:25

score 4 · Answer 3 · answered Dec 24 '22 at 10:05

Now that we have LAMBDA and LHFs, we can solve this problem nicely using REDUCE.

2 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:B," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

3 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:C," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

4 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:D," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

And so on...

kishkin · Answer 4 · 2022-12-25T16:29:39.527

2

Here's another solution.

A named function CARTESIAN_PRODUCT:

=IF(COLUMNS(range) = 1, IFNA(FILTER(range, range <> "")), LAMBDA(sub_product, last_col, REDUCE(, SEQUENCE(ROWS(sub_product)), LAMBDA(acc, cur, LAMBDA(new_range, IF(cur = 1, new_range, {acc; new_range}))({ARRAYFORMULA(IF(SEQUENCE(ROWS(last_col)), INDEX(sub_product, cur,))), last_col}))))(CARTESIAN_PRODUCT(ARRAY_CONSTRAIN(range, ROWS(range), COLUMNS(range) - 1)), LAMBDA(r, IFNA(FILTER(r, r <> "")))(INDEX(range,, COLUMNS(range)))))

(in a little bit more readable formatting)

The only argument is range which specifies the columns with the values. Empty cells will be ignored.

It uses recursion and does not use query smashing or string splitting. It works with any number of columns.

edited Dec 25 '22 at 16:29

answered Dec 25 '22 at 16:20

kishkin

5,152
1
26
40

When trying to use that, I get a syntax error. For some reason in my case the first `range` param in `FILTER` isn't recognized as the named argument (and coloured orange), even though the second one is. Weird. If it's not a problem, could you check if the line that you pasted here still works in the current GSHeets for you? – JoannaFalkowska Jul 04 '23 at 11:29
@JoannaFalkowska checked, it works without any change. Maybe you have language settings other than US, so parameter delimiter in functions is `;` instead of `,`? – kishkin Jul 19 '23 at 08:41

score 1 · Answer 5 · answered Aug 29 '22 at 03:57

Just to expand on @nicolasZ's comment for those (like myself) not so familiar with sheets syntax:

If you want to create the combinations but split into unique columns you can follow a very similar patter to @player0's answer but wrapping everything in ARRAYFORMULA(SPLIT( ...)," ".

What this does is split the result in a single column by a space (" ") and then distribute this into the next N columns.

Explicitly, to combine 5 columns of data I used:

= ARRAYFORMULA(SPLIT( FLATTEN(FLATTEN(FLATTEN(FLATTEN(
 FILTER(B2:B, B2:B<>"")&" "&TRANSPOSE(
 FILTER(C2:C, C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D, D2:D<>"")))&" "&TRANSPOSE(
 FILTER(E2:E, E2:E<>"")))&" "&TRANSPOSE(
 FILTER(F2:F, F2:F<>"")))," "))

To add more rows you simply have to add another Flatten( to the beginning of the command and then insert &" "&TRANSPOSE( FILTER(F2:F, F2:F<>""))) at the end (but before the " ")).

Note that there must be enough empty column/rows to expand the formula or the command will fail to evaluate.

score 1 · Answer 6 · answered May 13 '23 at 12:10

Here's one more approach to add to the collection:

=let(range,A:D,data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))))

No change in formula needed even when the data is in just 3 columns (and non-adjacent too)

2 columns with still the same formula

An extension to this formula can be used to create unique pair combinations (as described in this question here) while excluding\limiting the duplicates (1,1 | 2,2 | 3,3.... and 1,2 | 2,1 | 1,3 | 3,1....)

=let(range,index(query({A:A,A:A},"Where Col1 is not null")),data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     combo_,if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))),
     Λ,byrow(combo_,lambda(Σ,if(index(Σ,,1)>=index(Σ,,2),,Σ))),
       filter(Λ,index(Λ,,1)<>""))

Impressive. With larger datasets, your `reduce()` appears to have slightly better performance than the iterative formula in my answer. — doubleunary, May 13 '23 at 15:37

score 1 · Answer 7 · answered May 16 '23 at 15:56

1

Ok, here's another one from me. No recursion this time, quite fast. Should be made a named function with a single parameter range:

=LET(
    ref,
    BYCOL(range, LAMBDA(c, IFNA(FILTER(c, c <> "")))),
    nums,
    BYCOL(ref, LAMBDA(c, MAX(1, COUNTA(c)))),
    total,
    PRODUCT(nums),
    divs,
    SCAN(total, nums, LAMBDA(acc, cur, acc / cur)),
    ARRAYFORMULA(
        VLOOKUP(
            MOD(INT(SEQUENCE(total, 1,) / divs), nums),
            {
                SEQUENCE(ROWS(ref), 1,),
                ref
            },
            SEQUENCE(1, COLUMNS(ref), 2),)
    )
)

answered May 16 '23 at 15:56

kishkin

5,152
1
26
40

1

Thanks for sharing that kishkin, it's simple and elegant. For some reason, the performance is not as good as I expected. With a larger dataset (36k result rows), it is about `2x` faster than a minified recursive formula, but rockinfreakshow's `reduce()` clocks in at about `6x`. A binary search `vlookup()` should be fast so this is a surprise. See [the benchmark](https://docs.google.com/spreadsheets/d/1hzuAqileCdrB1pVKAiAbWmstBL63FoXot4QBpmwWAjA/edit#gid=1908738223). – doubleunary May 16 '23 at 17:55
too many vlookups, i guess :) thanks for the benchmarks – kishkin May 17 '23 at 06:59

score 0 · Answer 8 · answered Apr 28 '23 at 06:46

The question specifies cross join and n-ary Cartesian product. These concepts are different from string concatenation ("car" + "red" → "car red").

Most of the existing answers use text string manipulation and split() which is undesirable because they may cause side effects such as converting the text string 1 2 3 to the date 2 January 2003. Some answers use recursion through a named function that calls itself. That works, but it is often undesirable as well, because named functions need to be recreated or imported in each spreadsheet where they are to be used.

One way to implement n-ary Cartesian product in a plain vanilla Google Sheet formula without those undesirable traits is to use a recursive lambda function. The formula below takes a range of any number of columns and gives all ordered n-tuples of their non-blank values, column-wise:

=let( 
  table, A2:D, 
  blank, iferror(1/0), 
  first_, lambda(array, tocol(choosecols(array, 1), true)), 
  rest_, lambda(n, choosecols(table, sequence(1, columns(table) - n, n + 1))), 
  wrap_, lambda(array, wrapCount, wraprows(tocol(array, true), wrapCount)), 

  cartesian_, lambda(a, b, wrap_( 
    byrow(a, lambda(row, 
      reduce(blank, sequence(rows(b)), lambda(acc, i, 
        { acc, row, chooserows(b, i) } 
      ) ) 
    ) ), 
    columns(a) + columns(b) 
  ) ), 

  iterate_, lambda( 
    self, a, b, if(iserror(b), a, 
      self(self, cartesian_(a, first_(b)), rest_(columns(a) + 1)) 
    ) 
  ), 

  iterate_(iterate_, first_(table), rest_(1)) 
)

The same can also be done in an iterative fashion. The formula below will perform about two times faster than split() based solutions with larger datasets:

=let( 
  table, A2:D, 
  numCols, columns(table), 
  colIndices, sequence(1, numCols), 
  column_, lambda(colIndex, tocol(choosecols(table, colIndex), 1)), 
  numColRows, bycol(colIndices, lambda(i, rows(column_(i)))), 
  numColRows_, lambda(i, index(numColRows, 0, i)), 
  colIndicesToRight_, lambda(i, sequence(1, numCols - i, i + 1)), 
  numColRowsToRight_, lambda(i, bycol(colIndicesToRight_(i), numColRows_)), 
  numCombos_, lambda(i, iferror(product(numColRowsToRight_(i)), 1)), 
  repeatCells_, lambda(colIndex, tocol( 
    map(column_(colIndex), lambda(cell, 
      bycol(sequence(1, numCombos_(colIndex)), lambda(_, cell)) 
    ) )  
  ) ), 
  repeatGroup_, lambda(g, n, tocol(bycol(sequence(1, n), lambda(_, g)), 0, 1)), 
  fillColumn_, lambda(colIndex, let( 
    group, repeatCells_(colIndex), 
    repeatGroup_(group, product(numColRows) / rows(group))
  ) ), 
  if( 
    min(numColRows) = 0, "All columns must have at least one item.", 
    if( 
      numCols * product(numColRows) > 10^6, "Too many results.", 
      bycol(colIndices, fillColumn_) 
    ) 
  ) 
)

The identifiers in the above formulas use a naming convention where a lambda function_ name definition has a trailing underscore.

Both formulas will work with any number of columns and give results like these:

source data
a	1	X
b	2	Y
c

× Cartesian
a	1	X
a	1	Y
a	2	X
a	2	Y
b	1	X
b	1	Y
b	2	X
b	2	Y
c	1	X
c	1	Y
c	2	X
c	2	Y

Generate all possible combinations for Columns(cross join or Cartesian product)

8 Answers8

Linked

Related