36

I have a Google SpreadSheets doc with three columns A, B and C.

I need to populate the Column C with all the possible combinations of the values in Columns A and B. Please take a look a the capture to see what I mean.

I found this to be done in Excel, here, but it doesn't work in google spreadsheets.

The formula should be useful even for more columns (e.g.: four instead of two)

Can I do this?

enter image description here

TheMaster
  • 45,448
  • 6
  • 62
  • 85
JPashs
  • 13,044
  • 10
  • 42
  • 65

8 Answers8

44

in post-pandemic new world we can solve this with:

=INDEX(FLATTEN(A2:A3&" "&TRANSPOSE(B2:B4)))

enter image description here

to account for future expansion we can do:

=INDEX(FLATTEN(FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(FILTER(B2:B; B2:B<>""))))

enter image description here

for 3 columns:

=INDEX(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>""))))

enter image description here

4 columns:

=INDEX(FLATTEN(FLATTEN(FLATTEN(
 FILTER(A2:A; A2:A<>"")&" "&TRANSPOSE(
 FILTER(B2:B; B2:B<>"")))&" "&TRANSPOSE(
 FILTER(C2:C; C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D; D2:D<>""))))

enter image description here



for more see: https://stackoverflow.com/a/74160711/5632629



player0
  • 124,011
  • 12
  • 67
  • 124
  • 2
    Awesome! And to print them in separate columns instead of one follow the example here: =ArrayFormula(split(flatten(transpose(filter(B2:B; B2:B <> "")) & "," & (flatten(filter(C2:C; C2:C <> "") & "," & transpose(filter(D2:D; D2:D <> ""))))); ",")) https://support.google.com/docs/thread/61770165/create-automatically-all-combinations-of-multiple-columns-each-one-of-different-size?hl=en – NicolasZ Feb 24 '22 at 16:14
25

Update 201810

Original formula crashes for a big dataset. I described a way to make cross-join with any size of data here.


Try formula:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))
 &" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))

The result:

car red
train red
car yellow
train yellow
car blue
train blue

You may use it again to add another list:

enter image description here

The formula is in cells C2 and E2,

C2 is:

=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&" "&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))) )

and E2 is:

=ArrayFormula(transpose(split(rept(concatenate(C2:C&char(9)),counta(D2:D)),char(9)))&" "&transpose(split(concatenate(rept(D2:D&char(9),counta(C2:C))),char(9))) )
player0
  • 124,011
  • 12
  • 67
  • 124
Max Makhrov
  • 17,309
  • 5
  • 55
  • 81
  • Should I insert this formula in a single cell? '=ArrayFormula(transpose(split(rept(concatenate(A2:A&char(9)),counta(B2:B)),char(9)))&""&transpose(split(concatenate(rept(B2:B&char(9),counta(A2:A))),char(9))))' This is what I did but I get an error message. Why did you split the formula in two parts? – JPashs Mar 15 '17 at 09:56
  • Sorry but I can't get it: here the spreadsheet (you can edit), can you paste your formula there: https://docs.google.com/spreadsheets/d/1Z80MNKML4m7YKXn2SfHVZwO0mPs889MX0DXfk6iKx5k/edit?usp=sharing – JPashs Mar 15 '17 at 10:01
  • 2
    You have to replace commas by semicolon `;` in formulas. It is regional settings. I changed file settings region to US and it worked. – Max Makhrov Mar 15 '17 at 10:25
4

Now that we have LAMBDA and LHFs, we can solve this problem nicely using REDUCE.

2 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:B," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

enter image description here

3 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:C," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

enter image description here

4 columns:

=ARRAYFORMULA(SPLIT(REDUCE(,SUBSTITUTE(SUBSTITUTE(TRIM(QUERY(
SUBSTITUTE(A2:D," ",CHAR(9)),,9^9))," ",CHAR(127)),CHAR(9)," "),
LAMBDA(a,c,FLATTEN(a&CHAR(127)&SPLIT(c,CHAR(127))))),CHAR(127)))

enter image description here

And so on...

z''
  • 4,527
  • 2
  • 3
  • 12
2

Here's another solution.

A named function CARTESIAN_PRODUCT:

=IF(COLUMNS(range) = 1, IFNA(FILTER(range, range <> "")), LAMBDA(sub_product, last_col, REDUCE(, SEQUENCE(ROWS(sub_product)), LAMBDA(acc, cur, LAMBDA(new_range, IF(cur = 1, new_range, {acc; new_range}))({ARRAYFORMULA(IF(SEQUENCE(ROWS(last_col)), INDEX(sub_product, cur,))), last_col}))))(CARTESIAN_PRODUCT(ARRAY_CONSTRAIN(range, ROWS(range), COLUMNS(range) - 1)), LAMBDA(r, IFNA(FILTER(r, r <> "")))(INDEX(range,, COLUMNS(range)))))

(in a little bit more readable formatting)

The only argument is range which specifies the columns with the values. Empty cells will be ignored.

It uses recursion and does not use query smashing or string splitting. It works with any number of columns.

enter image description here

kishkin
  • 5,152
  • 1
  • 26
  • 40
  • When trying to use that, I get a syntax error. For some reason in my case the first `range` param in `FILTER` isn't recognized as the named argument (and coloured orange), even though the second one is. Weird. If it's not a problem, could you check if the line that you pasted here still works in the current GSHeets for you? – JoannaFalkowska Jul 04 '23 at 11:29
  • @JoannaFalkowska checked, it works without any change. Maybe you have language settings other than US, so parameter delimiter in functions is `;` instead of `,`? – kishkin Jul 19 '23 at 08:41
1

Just to expand on @nicolasZ's comment for those (like myself) not so familiar with sheets syntax:

If you want to create the combinations but split into unique columns you can follow a very similar patter to @player0's answer but wrapping everything in ARRAYFORMULA(SPLIT( ...)," ".

What this does is split the result in a single column by a space (" ") and then distribute this into the next N columns.

Explicitly, to combine 5 columns of data I used:

= ARRAYFORMULA(SPLIT( FLATTEN(FLATTEN(FLATTEN(FLATTEN(
 FILTER(B2:B, B2:B<>"")&" "&TRANSPOSE(
 FILTER(C2:C, C2:C<>"")))&" "&TRANSPOSE(
 FILTER(D2:D, D2:D<>"")))&" "&TRANSPOSE(
 FILTER(E2:E, E2:E<>"")))&" "&TRANSPOSE(
 FILTER(F2:F, F2:F<>"")))," "))

To add more rows you simply have to add another Flatten( to the beginning of the command and then insert &" "&TRANSPOSE( FILTER(F2:F, F2:F<>""))) at the end (but before the " ")).

Note that there must be enough empty column/rows to expand the formula or the command will fail to evaluate.

zephyrus
  • 1,266
  • 1
  • 12
  • 29
1

Here's one more approach to add to the collection:

=let(range,A:D,data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))))

enter image description here

No change in formula needed even when the data is in just 3 columns (and non-adjacent too) enter image description here

2 columns with still the same formula enter image description here

An extension to this formula can be used to create unique pair combinations (as described in this question here) while excluding\limiting the duplicates (1,1 | 2,2 | 3,3.... and 1,2 | 2,1 | 1,3 | 3,1....)

=let(range,index(query({A:A,A:A},"Where Col1 is not null")),data,filter(range,bycol(range,lambda(Σ,counta(Σ)))<>0),
     count,bycol(data,lambda(Σ,counta(Σ))), column,sequence(1,columns(data),1),
     first,tocol(map(tocol(choosecols(data,1),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>1),1)),Σ)))),
     combo_,if(max(column)=1,first,reduce(first,sequence(1,max(column)-1,2,1),lambda(a,c,{a,
           tocol(map(tocol(map(tocol(choosecols(data,c),1),lambda(Σ,wraprows(Σ,product(ifna(filter(count,column>c),1)),Σ)))),lambda(Σ,wraprows(Σ,product(filter(count,column<c)),Σ))),,1)}))),
     Λ,byrow(combo_,lambda(Σ,if(index(Σ,,1)>=index(Σ,,2),,Σ))),
       filter(Λ,index(Λ,,1)<>""))

enter image description here

rockinfreakshow
  • 15,077
  • 3
  • 12
  • 19
  • 1
    Impressive. With larger datasets, your `reduce()` appears to have slightly better performance than the iterative formula in my answer. – doubleunary May 13 '23 at 15:37
1

Ok, here's another one from me. No recursion this time, quite fast. Should be made a named function with a single parameter range:

=LET(
    ref,
    BYCOL(range, LAMBDA(c, IFNA(FILTER(c, c <> "")))),
    nums,
    BYCOL(ref, LAMBDA(c, MAX(1, COUNTA(c)))),
    total,
    PRODUCT(nums),
    divs,
    SCAN(total, nums, LAMBDA(acc, cur, acc / cur)),
    ARRAYFORMULA(
        VLOOKUP(
            MOD(INT(SEQUENCE(total, 1,) / divs), nums),
            {
                SEQUENCE(ROWS(ref), 1,),
                ref
            },
            SEQUENCE(1, COLUMNS(ref), 2),)
    )
)

enter image description here

kishkin
  • 5,152
  • 1
  • 26
  • 40
  • 1
    Thanks for sharing that kishkin, it's simple and elegant. For some reason, the performance is not as good as I expected. With a larger dataset (36k result rows), it is about `2x` faster than a minified recursive formula, but rockinfreakshow's `reduce()` clocks in at about `6x`. A binary search `vlookup()` should be fast so this is a surprise. See [the benchmark](https://docs.google.com/spreadsheets/d/1hzuAqileCdrB1pVKAiAbWmstBL63FoXot4QBpmwWAjA/edit#gid=1908738223). – doubleunary May 16 '23 at 17:55
  • too many vlookups, i guess :) thanks for the benchmarks – kishkin May 17 '23 at 06:59
0

The question specifies cross join and n-ary Cartesian product. These concepts are different from string concatenation ("car" + "red" → "car red").

Most of the existing answers use text string manipulation and split() which is undesirable because they may cause side effects such as converting the text string 1 2 3 to the date 2 January 2003. Some answers use recursion through a named function that calls itself. That works, but it is often undesirable as well, because named functions need to be recreated or imported in each spreadsheet where they are to be used.

One way to implement n-ary Cartesian product in a plain vanilla Google Sheet formula without those undesirable traits is to use a recursive lambda function. The formula below takes a range of any number of columns and gives all ordered n-tuples of their non-blank values, column-wise:

=let( 
  table, A2:D, 
  blank, iferror(1/0), 
  first_, lambda(array, tocol(choosecols(array, 1), true)), 
  rest_, lambda(n, choosecols(table, sequence(1, columns(table) - n, n + 1))), 
  wrap_, lambda(array, wrapCount, wraprows(tocol(array, true), wrapCount)), 

  cartesian_, lambda(a, b, wrap_( 
    byrow(a, lambda(row, 
      reduce(blank, sequence(rows(b)), lambda(acc, i, 
        { acc, row, chooserows(b, i) } 
      ) ) 
    ) ), 
    columns(a) + columns(b) 
  ) ), 

  iterate_, lambda( 
    self, a, b, if(iserror(b), a, 
      self(self, cartesian_(a, first_(b)), rest_(columns(a) + 1)) 
    ) 
  ), 

  iterate_(iterate_, first_(table), rest_(1)) 
)

The same can also be done in an iterative fashion. The formula below will perform about two times faster than split() based solutions with larger datasets:

=let( 
  table, A2:D, 
  numCols, columns(table), 
  colIndices, sequence(1, numCols), 
  column_, lambda(colIndex, tocol(choosecols(table, colIndex), 1)), 
  numColRows, bycol(colIndices, lambda(i, rows(column_(i)))), 
  numColRows_, lambda(i, index(numColRows, 0, i)), 
  colIndicesToRight_, lambda(i, sequence(1, numCols - i, i + 1)), 
  numColRowsToRight_, lambda(i, bycol(colIndicesToRight_(i), numColRows_)), 
  numCombos_, lambda(i, iferror(product(numColRowsToRight_(i)), 1)), 
  repeatCells_, lambda(colIndex, tocol( 
    map(column_(colIndex), lambda(cell, 
      bycol(sequence(1, numCombos_(colIndex)), lambda(_, cell)) 
    ) )  
  ) ), 
  repeatGroup_, lambda(g, n, tocol(bycol(sequence(1, n), lambda(_, g)), 0, 1)), 
  fillColumn_, lambda(colIndex, let( 
    group, repeatCells_(colIndex), 
    repeatGroup_(group, product(numColRows) / rows(group))
  ) ), 
  if( 
    min(numColRows) = 0, "All columns must have at least one item.", 
    if( 
      numCols * product(numColRows) > 10^6, "Too many results.", 
      bycol(colIndices, fillColumn_) 
    ) 
  ) 
)

The identifiers in the above formulas use a naming convention where a lambda function_ name definition has a trailing underscore.

Both formulas will work with any number of columns and give results like these:

source data
a 1 X
b 2 Y
c
× Cartesian
a 1 X
a 1 Y
a 2 X
a 2 Y
b 1 X
b 1 Y
b 2 X
b 2 Y
c 1 X
c 1 Y
c 2 X
c 2 Y
doubleunary
  • 13,842
  • 3
  • 18
  • 51