13

This is a frequently asked and answered question: how can I generate all permutations in Excel:

2011 2016 2017 2017 superuser 2018 2021

and now in 2022 which did not get an answer before it was closed as a duplicate, which is unfortunate because LAMBDA really changes the way that this question would be answered.

I have infrequently had the same need and get frustrated by having to reinvent a complex wheel. So, I will re-pose the question and put my own answer below. I will not mark any submissions as the answer, but invite good ideas. I am sure that my own approach can be improved upon.

Restating the 2022 question

I am trying to create a loop in excel with formulas only. What I am trying to achieve is described below. Let's say I have 3 columns as inputs: (i) Country; (ii) variable; and (iii) year. I want to expand from these inputs to then assign values to these parameters.

Inputs:

Country Variable Year
GB GDP 2015
DE area 2016
CH area 2015

Outputs:

Country Variable Year
GB GDP 2015
GB GDP 2016
GB area 2015
GB area 2016
DE GDP 2015
DE GDP 2016
DE area 2015
DE area 2016

How can I do that efficiently using Excel?

Expanding the 2018 question

I have three columns, each of which has different kinds of master data as shown below:

enter image description here

Now, I want to have all possible combinations of these three cells - like

aa kk jj
aa kk ff
aa ll jj
aa ll ff
aa mm jj
...

Can this be done with a formula. I found one formula with 2 columns, but i'm not able to extend it to 3 columns correctly

Formula with 2 columns:

=IF(ROW()-ROW($G$1)+1>COUNTA($A$2:$A$15)*COUNTA($B$2:$B$4),"",
INDEX($A$2:$A$15,INT((ROW()-ROW($G$1))/COUNTA($B$2:$B$4)+1))&
INDEX($B$2:$B$4,MOD(ROW()-ROW($G$1),COUNTA($B$2:$B$4))+1))

where G1 is the cell to place the resulting value

Common Requirements

What these have in common is that they are both trying to create an ordered set of permutations from an ordered set of symbols. They both happen to need 3 levels of symbols, but the 2018 question was asking for help to go from 2 levels to 3 and the 2021 question was asking to go from 3 levels to 5. The 2022 question is just asking for 3 levels, but the output needs to be a table.

What if we go up to 6 levels like this?

L1 L2 L3 L4 L5 L6
A F K P U 1
B G L Q V 2
C H R W 3
D X 4
E

This would generate 1'440 permutations.

L1 L2 L3 L4 L5 L6
A F K P U 1
A F K P U 2
A F K P U 3
A F K P U 4
A F K P V 1
A F K P V 2
A F K P V 3
A F K P V 4
A F K P W 1
... ... ... ... ... ...

Making a general formula that takes in any number of levels (columns) is hard. Just go through the answers provided - they each required some rocket science and until now, all the solutions had hard-coded limits on the number of columns of symbols. So can LAMBDA give us a general solution?

mark fitzpatrick
  • 3,162
  • 2
  • 11
  • 23
  • 3
    incredible... i got a blue screen just as I was posting an answer - still recovering... – mark fitzpatrick Feb 19 '22 at 20:41
  • By the way, for those who want to convert the Table output into a table of concatenated strings, you can wrap the chosen formula from the answers below in `=BYROW( yourChosenPermutationFormula, LAMBDA(x,TEXTJOIN("-",1,x)))` and it will generate a table of strings as requested in a few related questions such as the [2021](https://stackoverflow.com/questions/67905393/how-to-generate-all-combinations-using-5-columns-using-a-formula-in-excel/67926177#67926177) question above. – mark fitzpatrick Feb 20 '22 at 08:33

5 Answers5

8

Cool question and brain teezer; I just puzzled on something which is using MAKEARRAY():


Option 1:

What you'd call as "super inefficient" is creating a list of permutations when you calculate rows^columns. I think the following is not that inefficient. Let's imagine the following:

enter image description here

Formula in E1:

=LET(A,A1:C3,B,ROWS(A),C,COLUMNS(A),D,B^C,E,UNIQUE(MAKEARRAY(D,C,LAMBDA(rw,cl,INDEX(IF(A="","",A),MOD(CEILING(rw/(D/(B^cl)),1)-1,B)+1,cl)))),FILTER(E,MMULT(--(E<>""),SEQUENCE(C,,,0))=C))

In short, what this does:

  • Variables A-D are all helpers.
  • The idea then is to just use simple INDEX()s to return all values. To do so we need the right indices for both rows and columns.
  • MAKEARRAY() will make calculations relatively easy due to the recursive functionality that lambda brings. Inside these functions its basic math to return the correct indices for both these rows and columns. In fact there is no calculation needed for the columns as we simply refer to 'cl' and all the calculation for all the row indices is done through MOD(CEILING(rw/(D/(B^cl)),1)-1,B)+1.
  • The result from the above is put into UNIQUE() to use very little resources to filter out any potential duplicates and limit potential empty rows to just a single empty row.
  • FILTER() and MMULT() work nicely together to filter out any unwanted results (read; empty).

That's as compact and quick as I think I could get this. The formula will now work on any contiguous range of cells. A single cell, a single row, a single column or any 2D-range.


Option 2:

OP rightfully mentioned that option 1 could create too many tuples at the start to only later discard them. This can be inefficient. To tackle this issue (and if this is not something you want), we can use a much larger formula. Let's imagine the following data:

A B C
a d f
b e h
e
c g
g

We see that there are empty cells and duplicate values. These are the reasons option 1 would create too many tuples. To counter that I came up with a much longer formula:

=LET(A,A1:C5,B,ROWS(A),C,COLUMNS(A),D,IF(A="",NA(),A),E,MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))),F,BYCOL(E,LAMBDA(cl,COUNTA(FILTER(cl,NOT(ISERROR(cl)))))),G,MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))),UNIQUE(G))

To break this down:

  • LET() - Work with variables;
  • A - Our initial full range of cells (contiguous);
  • B - The total amount of rows of A;
  • C - The total amount of columns of A;
  • D - The formula IF(A="",NA(),A) is meant to check each value in the matrix if it's empty (string). If so, make it an error (which will make sense in the next step).
  • E - In this step the formula MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))) is sorting each column so the values are on top and all errors pushed down:
A B C
a d f
b e g
c e g
#N/A #N/A h
#N/A #N/A #N/A
  • F - This variable's formula BYCOL(E,LAMBDA(cl,COUNTA(FILTER(cl,NOT(ISERROR(cl)))))) will now count the amount of items per columns. This is needed for later use and to count all permutations. The outcome in this specific case would be {3;3;4}.
  • G - The last variable (if one chooses to use it as such) using MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))). It's rather long but each step makes sense; Get the product (all possible permutations) to calculate the total amount of rows, the columns stay the same. In the LAMBDA() we refer to all columns after the current column-indice in the F variable. It's quite a big chunk to digest and unfortunately I'm not good enough in explaining =).
  • UNIQUE(G) - The very last step is to filter out all double permutations if one chooses too.

The result:

enter image description here

Now, even though option 1 would beat option 2 in readability, the 2nd option took (with very limited testing) just a third of the time it took the 1st option to calculate. So in terms of speed, this 2nd option would be preferred.


As an alternative to the 2nd option I first had:

=LET(A,A1:C5,B,ROWS(A),C,COLUMNS(A),D,MAKEARRAY(B,C,LAMBDA(rw,cl,IF(MATCH(INDEX(A,rw,cl),INDEX(A,0,cl),0)=rw,INDEX(A,rw,cl),NA()))),E,MAKEARRAY(B,C,LAMBDA(rw,cl,INDEX(SORT(INDEX(D,0,cl)),rw))),F,BYCOL(E,LAMBDA(cl,COUNTA(UNIQUE(FILTER(cl,NOT(ISERROR(cl))))))),G,MAKEARRAY(PRODUCT(F),C,LAMBDA(rw,cl,INDEX(E,MOD(CEILING(rw/IFERROR(PRODUCT(INDEX(F,SEQUENCE(C-cl,,cl+1))),1),1)-1,INDEX(F,cl))+1,cl))),G)

This would now change the D variable to a longer formula to remove duplicates in each column beforehand. Both variations would work just fine.

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • 2
    This is genius! `MOD(CEILING(rw/(D/(B^cl)),1)-1,B)+1` is a work of art. Your version integrates the ***stranded f*** instead of forcing an error. However, it is not efficient because it still generates a permutation for the whole matrix, so the 100X3 matrix will still generate 1m tuples. But there is a big gain in functionality by avoiding the strande f problem. One thought occurs to me: if you wrapped your final `FILTER` in `UNIQUE`, you would prevent duplications in the case where the user had the same symbol multiple times in a column of symbols. Brilliant work. – mark fitzpatrick Feb 20 '22 at 07:54
  • 1
    @markfitzpatrick. Thanks for the kind words and the edit. I can tell you tested it on other ranges since the first variable is now out of tune. It would work for any range I think, from a single cell, a single row, a single column to a 2d-array. Also, indeed, it would be a simple wrap in `UNIQUE()`. Good idea. – JvdV Feb 20 '22 at 07:56
  • 1
    oops - sorry - you're right. I forgot to bring it back to `A1:C3`. I was holding my breath that I did not damage it in the reformatting. On the 1m tuples, it seems it does generate them. If you put in A1:C100 as the input matrix (A), the variable E generated 100^3 tuples. btw - I not only tested it, I loaded it in my library with comments crediting you and I will be studying it for a while now - the MMULT is also a work of art. – mark fitzpatrick Feb 20 '22 at 08:01
  • Unfortunately, filtering the rows won't solve the problem in practice. A typical scenario is that at least one column is fully populated and the others are mostly empty. Imagine that column A has Chapters1:3, B has Sections 1:8 and C has Pgh's 1:100, we still end up with 1m tuples. Perhaps there is a way to forecast the tuples with `PRODUCT(countOfSymbolsByCol)` and then generate the indices using that method you taught me: `SCAN(1,countOfSymbolsByCol,LAMBDA(a,b,a*b))`. It is effectively what I did, but not nearly as cleanly as you did. This might bring back the *stranded f* problem though. – mark fitzpatrick Feb 20 '22 at 08:25
  • 1
    @markfitzpatrick, I added your idea of using `UNIQUE()`. Effectively making the question to find all *unique* permutations. It's not used at the final array since `MMULT()` is the only function in here that I feel could slow things down. Hence why it's done on the result of the `MAKEARRAY()`. That gives `FILTER()` and `MMULT()` to only do as much as is maximum required. I hope this update helped. I tried to explain the logic in my math, but it's hard to catch in words =) – JvdV Feb 20 '22 at 09:08
  • Hey - I just time tested it with a 100x3 matrix with symbols of 100, 3 and 2 (or 600 valid tuples) and the results surprised me. Your idea of wrapping the `MAKEARRAY` in `UNIQUE` makes sense, but it took 3:15 whereas wrapping the `FILTER` in `UNIQUE` came in at 2:30. I don't really have a proper test environment, so it could be dumb luck. – mark fitzpatrick Feb 20 '22 at 09:43
  • 1
    @markfitzpatrick, it would definitely suprise me. Using `UNIQUE()` where I did makes more common sense since it would limit the rows that need processed in the `MMULT()` whereas putting it after the filtering, it would have to process more rows at first and still has to do the same amount of processing in `UNIQUE()`. Strange results, but hey, excel sometimes works in mysterious ways. – JvdV Feb 20 '22 at 11:35
  • I'm going to run more tests just to see. As you said, common sense is that it would be faster to do UNIQUE first or, at least, that it will make no difference. To test, I need to shut down everything else on my PC. I may go with a smaller array too - 3 mins per run is time consuming to get a good sample size. – mark fitzpatrick Feb 20 '22 at 11:39
  • 1
    Hey - so I ran lots of tests with a clean machine, disconnected from the Internet. With a 70 X 3 matrix, nearly all of the runs came in at 35 seconds for either placement of UNIQUE. One of them jumped to 4 minutes, so this shows that the issue I saw before was purely a random situation. – mark fitzpatrick Feb 21 '22 at 07:03
  • @markfitzpatrick, Thank you for all the tests. That must have been time-consuming. I've also kept working on the issue of creating too many tuples. It's possible to avoid this but stuff gets lenghty. I edited the answer and tried to explain step-by-step what the thought-process was. In the end it would be interesting to see if we gained any speed compared to the 1st option. – JvdV Feb 21 '22 at 11:37
6

A Simple LET supported by LAMBDA

To the 2022 question, I used the following LET:

=LET( matrix, A2:E6,

   cC, COLUMNS( matrix ), cSeq, SEQUENCE( 1, cC ),
   rC, ROWS( matrix ), rSeq, SEQUENCE( rC ),
   eC, rC ^ cC, eSeq, SEQUENCE( eC,,0 ),
   unblank, IF( ISBLANK(matrix), "°|°", matrix ),
   m, UNIQUE( INDEX( unblank, MOD( INT( INT( SEQUENCE( eC, cC, 0 )/cC )/rC^SEQUENCE( 1, cC, cC-1, -1 ) ), rC ) + 1, cSeq ) ),
   FILTER( m, BYROW( IFERROR( FIND( "°|°", m ), 0 ), LAMBDA(x, SUM( x ) ) ) = 0 ) )

where the matrix that is used to generate the permutations in in A2:E6: enter image description here

This formula inserts "°|°" in place of blanks so as to avoid having the blanks recast as 0's. It eliminates duplicates by applying UNIQUE.

This is super inefficient because it generates every possible permutation, including repeats and then filters them out. For a small scale matrix, it's not a big deal, but imagine that a 100 row by 3 column matrix would generate 1'000'000 rows and then filter them down to a small subset.

There is one small benefit, however, notice the f in the yellow cell, stranded in the middle of the column and not contiguous with its peers. This formula handles that like a champ. It just integrates it into the outputs of valid permutations. That will be important in the efficient formula below.

An Efficient General LET supported by LAMBDA

As for a general LAMBDA formula, I used:

SYMBOLPERMUTATIONS =
LAMBDA( matrix,

LET(
       cC, COLUMNS( matrix ),
       cSeq, SEQUENCE( 1, cC ),
       symbolCounts, BYCOL( matrix, LAMBDA(x, SUM( --NOT( ISBLANK( x ) ) ) ) ),
       rSeq, SEQUENCE( MAX( symbolCounts )-1 ),
       permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 ),
       permMods, IFERROR( INDEX( permFactors,, IF( cSeq + 1  > cC, -1, cSeq+1 ) ), 1 ),
       idx, INT( MOD( SEQUENCE( INDEX(permFactors, 1, 1),,0 ), permFactors )/permMods ) + 1,
       answer, INDEX( matrix, idx, cSeq ),
       er, OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) ), // detect if there are stranded symbols
       IF( SUM(symbolCounts)=0, "no symbols",
             IF( er, "symbol columns must be contiguous",
                          answer ) ) )

)

enter image description here

This takes in matrix, just as above (the LET version is shown below). It delivers quite the same result, but there are significant differences:

  • It is efficient. In the example shown here, the Simple formula above would generate 5^6 = 15625 permutations just to arrive at 1440 valid ones. This generates only what is needed and does not require filtering.
  • However, it does not handle that stranded f at all. In fact, it would generate a 0 in the place of the f which is not something that a user with a ton of permutations might even notice.

For this reason, there is error detection and handling. The variable er detects if there are any stranded symbols in the matrix that are not column-wise contiguous with the ones above it using this LAMBDA Helper:

OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) )

But this creates a new challenge and perhaps a whole new question. In the case of the stranded f, can anyone come up with a way to incorporate it?

The other error trap is to detect if the column is completely empty.

LAMBDA Magic

The magic that LAMBDA brings is from this one line that makes both formulas extensible to any number of columns without having to hard code them:

permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 )

I got this from JvdV's answer to this question that was aimed specifically at trying to solve this question. Scott Craner & Bosco Yip had showed that it was basically unsolvable without LAMBDA and JvdV showed how it can be done with the LAMBDA helper SCAN.

LET Version of the Efficient Formula

=LET( matrix, A2:F6,
   cC, COLUMNS( matrix ),
   cSeq, SEQUENCE( 1, cC ),
   symbolCounts, BYCOL( matrix, LAMBDA(x, SUM( --NOT( ISBLANK( x ) ) ) ) ),
   rSeq, SEQUENCE( MAX( symbolCounts )-1 ),
   permFactors, INDEX( SCAN( 1, INDEX( symbolCounts, , cC-cSeq+1), LAMBDA(a,b, a*b ) ),, cC-cSeq+1 ),
   permMods, IFERROR( INDEX( permFactors,, IF( cSeq + 1  > cC, -1, cSeq+1 ) ), 1 ),
   idx, INT( MOD( SEQUENCE( INDEX(permFactors, 1, 1),,0 ), permFactors )/permMods ) + 1,
   answer, INDEX( matrix, idx, cSeq ),
   er, OR( BYCOL( --ISBLANK(matrix), LAMBDA(x, SUM(--(INDEX(x,rSeq+1)<INDEX(x,rSeq))) ) ) ),
   IF( SUM(symbolCounts)=0, "no symbols",
         IF( er, "symbol columns must be contiguous",
                      answer ) ) )
mark fitzpatrick
  • 3,162
  • 2
  • 11
  • 23
  • I wonder if there might be a simpler solution: a Lambda that processes 2 ranges, which can be nested any number of times. Call it `Perms`, use it like `=Perms(r1, r2)`. Nested like `=Perms(Perms(r1, r2), r3)` or as many levels as wanted. The input ranges could be Unique'd, Filter'd and or Sort'd as required – chris neilsen Feb 19 '22 at 21:36
  • @chrisneilsen Scott Craner suggested that a recursive LAMBDA might be the way to go. On the *stranded f* problem, I came so close to solving it, but ran out of insights as I wondered if one of the LAMBDA helpers could tackle that as well. Then there would be one less way that it could break. – mark fitzpatrick Feb 19 '22 at 21:39
  • 1
    I know this is a generic tool, but I had some specific Boolean logic which detects for and removes invalid permutations - so I've added the following to the solution perms, INDEX( matrix, idx, cSeq ), answer, FILTER(perms, NOT((INDEX(perms,,1)=invalidCombo1) * (INDEX(perms,,2)=invalidCombo2)) * NOT((INDEX(perms,,1)=invalidCombo1) * (INDEX(perms,,2)=invalidCombo3))), – blissapp Apr 24 '23 at 07:14
3

Perhaps an iterative LAMBDA:

MyLambda defined in Name Manager as:

    =LAMBDA(n,
    LET(
        Rng, Sheet1!$A$1:$D$3,
        α, COLUMNS(Rng),
        β, INDEX(Rng, , n),
        γ, FILTER(β, β <> ""),
        IF( n > α,
            "",
            LET(δ, γ & REPT(CHAR(32), 25) & TRANSPOSE(MyLambda(n + 1)),
                ε, COLUMNS(δ),
                ζ, SEQUENCE(ROWS(δ) * ε, , 0),
                INDEX(δ, 1 + QUOTIENT(ζ, ε), 1 + MOD(ζ, ε))
               )
          )
        )
    )

after which we can then call

=LET(Rng,Sheet1!$A$1:$D$3,α,COLUMNS(Rng),TRIM(MID(MyLambda(1),25*SEQUENCE(,α,0)+1,25)))

within the worksheet.

Note that each column within Rng must contain at least one entry

The logic behind the LAMBDA is to iteratively (via the call to MyLambda(n+1) within MyLambda) concatenate successive columns within the range, based on the fact that the concatenation of two orthogonal arrays generates a two-dimensional array comprising all permutations of elements of those two arrays.

={"Dave";"Mike"}&TRANSPOSE({"Bob";"Steve";"Paul"})

for example, generates

{"DaveBob","DaveSteve","DavePaul";"MikeBob","MikeSteve","MikePaul"}

which then needs to be redimensioned to a one-dimensional horizontal array, i.e.

{"DaveBob";"DaveSteve";"DavePaul";"MikeBob";"MikeSteve";"MikePaul"}

such that it can be iteratively concatenated with the next (vertical, orthogonal) column within the range. And so on.

We are effectively:

unpivoting:
  {"Bob","Steve","Paul";"Bob","Steve","Paul";"Bob","Steve","Paul"}
by:
  {"Dave";"Mike"}

So the input array n is:

A B
Dave Bob
Mike Steve
Paul

Column B gets transposed and replicated:

A B C D
Dave Bob Steve Paul
Mike Bob Steve Paul

and now we UNPIVOT (a kind of flatten) Columns B, C & D by Column A:

A B
Dave Bob
Dave Steve
Dave Paul
Mike Bob
Mike Steve
Mike Paul

NOTE: This approach:

  • efficiently generates all permutations without repetition
  • ignores blanks and does not require symbols to be on adjacent rows

The second function, called within the worksheet, then parses the concatenated array into the required number of rows/columns.

mark fitzpatrick
  • 3,162
  • 2
  • 11
  • 23
Jos Woolley
  • 8,564
  • 2
  • 4
  • 9
  • Hey Jos - I am going to have to try this tonight. It is an interesting idea to make a recursive approach. Your two rules match the rules of the "efficient" approaches. Normally, I would want to avoid a concatenated output as the concatenation can be applied to a table output whereas it is hard to go the other way around. – mark fitzpatrick Feb 21 '22 at 08:52
  • 1
    Hi Mark. The output itself (which is called by the second function) is not concatenated, merely the `LAMBDA`. – Jos Woolley Feb 21 '22 at 09:12
  • So, I got to take it for a spin a few times. It is such a creative approach. Do you think there is a way to pass the target range as an argument without going into the code? *(btw - I though MyLambda was boring, so I named it `WOOLLEFY` on my machine.)* – mark fitzpatrick Feb 21 '22 at 20:42
  • Sure, you could amend the second function so as to incorporate another `LAMBDA`: `=LAMBDA(Rng,TRIM(MID(MyLambda(1),25*SEQUENCE(,COLUMNS(Rng),0)+1,25)))(A1:D3)`. Love the rename, by the way! :-) – Jos Woolley Feb 22 '22 at 05:34
  • Hey Jos - now it all makes sense and I had a chance to drill into the details. I have a mod to propose that is too complex to put into comments or conversation. Would it be OK with you if I edit your post with an Addendum section to show it to you? You can take it over from there and edit it as you see fit. I am loaded down with work at the moment, so it will probably be the weekend before I can touch it next. – mark fitzpatrick Feb 23 '22 at 12:24
  • 1
    Hi Mark. Sure, no rush, add it whenever you get a chance. Will be interested to see what improvements you've come up with! Cheers. – Jos Woolley Feb 23 '22 at 12:35
  • Hey Jos - just a quick follow-up. I managed last weekend to get the LAMBDA function to work, however, it fails on certain inputs. One of the fails is easy to prevent, but will require a different design to trap and prevent the error while remaining efficient. However, I ran into two more error conditions that are simply strange. Recursion makes debugging so tedious. So... I am still a while away from being able to send you a solution to review. Just wanted to send you an update. – mark fitzpatrick Mar 02 '22 at 09:19
  • 1
    No worries, Mark. Thanks for the update. Busy at work myself this week, so catch up with you at some point in the not too distant. Agreed re debugging recursive set-ups - nightmare! – Jos Woolley Mar 02 '22 at 10:02
  • Hey Jos: I finally was able to get it to work. I just put in my proposed edits to your post to show the particular insight that you are bringing. Your approach makes the LAMBDA recursive by recognizing the repeating algorithm. This allows for efficient processing of all permutations without repetition. I also formatted your LAMBDA for readability. I will post a new solution based on your solution that will reference your solution. It was a real pain to create it not only because of debugging recursion, but because I had to learn some rules about recursion in LAMBDA, like no internal recursion. – mark fitzpatrick Mar 13 '22 at 10:46
3

There are problems with my previous answer(s):

  • The Simple formula ignores blanks and does not require symbols be on adjacent rows in the same column, but is inefficient. If given an array with 100 rows in column A, and 3 in column B and 2 in column C, the result should be 100 x 3 x 2 = 600, but it will instead generate 100 x 100 x 100 = 1'000'000 and then filter them. That takes more than 2 minutes to calculate! It also blows up if any column in the input array is completely blank.
  • The Efficient General LET is fast and does not waste computation, but it cannot handle inputs with symbols on non-adjacent rows (the stranded f) and will also blow up if any column is completely blank.

Insights from Jos Woolley's Solution

The solution by Jos Woolley offers two insights that allow for an efficient solution that does not have the problems of either of my previous approaches:

  • the objective is actually to break up each column and successively unpivot it by the next column
  • this idea can be implemented through LAMBDA recursion

Recursive LAMBDA Solution

With these insights, I am able to propose an efficient LAMBDA solution. The following is fully commented so that you can paste it directly into your Advanced Formula Editor:

PERMUTATEARRAY = 
 /* This function recursively generates all unique permutations of an ordered array of symbols.
    It is efficient - i.e., it does not generate duplicate rows that require filtering.
    The arguments are:
        symbolArray - ia a required array input (unless the recursion is done).
            Is an array of ordered symbols where the left most column contains symbols that will be permutated
            by the permutation of the symbols in the next columns to the right. e.g.,
                with symbolArray of:
                    A   1
                    B   2
                A and B will be permutated by 1 and 2:
                    A   1
                    A   2
                    B   1
                    B   2
        byArray - optional array that will be used to permutate the next column of the symbolArray.
            This is passed between recursions and it not intended for use by the user but it can be used.
        cleaned - optional argument to indicate that the symbolArray has already been cleaned.
            This prevents the function from repeatedly cleaning the symbolArray that would otherwise require
            a repetition for each column of the symbolArray. It is passed between recursions and it not
            intended for use by the user but can be used.
    Example - With a symbolArray of:
        A   C   1
        B   D   2
        The output would be the following array:
        A   C   1
        A   C   2
        A   D   1
        A   D   2
        B   C   1
        B   C   2
        B   D   1
        B   D   2
    NOTES:
     -  Blanks will be ignored.
     -  errors will be ignored. (see comments below to change this)
     -  all rows of the resulting array will be unique
     -  blank columns in the symbol array are removed
     -  this function has no dependencies on external LAMBDA functions even though that would make it more
        readable.
 ------------------------------------  */

LAMBDA( symbolArray, [byArray], [cleaned],
    IF( AND(ISOMITTED(symbolArray),ISOMITTED(byArray)), ERROR.TYPE(7), // DONE
        IF(ISOMITTED(symbolArray), byArray, // if there is no symbolArray, the function is DONE.
            LET(                
                clnSymArray, IF(ISOMITTED(cleaned), //If the symbol array has not been cleaned, then clean it.
                /* Only clean arrays can be permuated. They cannot contain blanks because those are interpreted as 0's.
                   The input also cannot contain entirely blank columns, so these are removed.
                   The input cells also cannot contain errors as this will cause the whole function to error. That,
                   however, is a design choice. They are filtered out inside of this function because the user cannot
                   easily filter them out before passing them as arguments - IFERROR(x,"") causes all blanks to become 0's.
                */
                            LET(
                                COMPRESSC, LAMBDA( a, 
                                        FILTER(a,BYCOL(a,LAMBDA(a,SUM(--(a<>""))>0)))
                                        ),
                                REPLBLANKS, LAMBDA(array, [with], 
                                        LET(w, IF(ISOMITTED(with), "", with),
                                            IF(array = "", w, array)
                                            )
                                        ),
                                REPLERRORS, LAMBDA(array, [with], 
                                        LET(w, IF(ISOMITTED(with), "", with),
                                            IFERROR(array, w)
                                            )
                                        ),
                                COMPRESSC( REPLERRORS( REPLBLANKS(symbolArray) ) )
                                // COMPRESSC( REPLBLANKS(symbolArray) )  //removes the REPLERRORS if the user wants errors to result in erroring the function.
                                    ),
                            symbolArray ), //otherwise, pass the symbolArray
                
                //Once cleaned, effectively execute PERMUTATEARRAY( clnSymArray, byArray, 1 )
                IF( AND(COLUMNS( clnSymArray ) = 1, ISOMITTED( byArray ) ),
                    UNIQUE(FILTER(clnSymArray,clnSymArray<>"")), /* if the user gives a single column, give it back clean even if it was already cleaned.
                                                                    there is no point in testing again whether Clean has been set. DONE */
                    /* Otherwise, we can recursively process the inputs in the following LET. */
                    LET(
                        // MUX is an internal LAMBDA function that permutates the left most column of the p array by the b (by) array.
                            MUX, LAMBDA( p, b, 
                                    LET(pR, ROWS( p ),
                                        byR, ROWS( b ),
                                        byC, COLUMNS( b ),
                                        byCseq, SEQUENCE(,byC+1), // forces this to look at only one column of p
                                        oRSeq, SEQUENCE( byR * pR,,0 ),
                                        IFERROR( INDEX( b, oRSeq/pR+1, byCseq),
                                                INDEX( p, MOD(oRSeq,pR )+1, byCseq-byC ) )
                                        )
                                    ),
                            pRSeq, SEQUENCE(ROWS(clnSymArray)),
                        // Decide when to apply MUX versus when to recurse. MUX is always the final output.
                            // if there are only two symbol columns with no byArray, filter & MUX the two columns - DONE
                            IF( AND(COLUMNS( clnSymArray ) = 2, ISOMITTED( ByArray ) ),
                                LET(pFin, INDEX( clnSymArray, pRSeq, 2),
                                    fpFin, UNIQUE(FILTER(pFin,pFin<>"")),
                                    bFin, INDEX( clnSymArray, pRSeq, 1),
                                    fbFin, UNIQUE(FILTER(bFin,bFin<>"")),
                                    MUX( fpFin, fbFin )
                                    ),
                                // if there are more than two symbol columns with no byArray, repartition the symbol and byArray and recurse 
                                IF( AND(COLUMNS( clnSymArray ) > 2, ISOMITTED( ByArray ) ), 
                                    LET(pC, COLUMNS(clnSymArray),
                                        pCSeq, SEQUENCE(,pC-2,3),
                                        pNext, INDEX( clnSymArray, pRSeq, pCSeq ),
                                        pFin, INDEX( clnSymArray, pRSeq, 2),
                                        fpFin, UNIQUE(FILTER(pFin,pFin<>"")),
                                        bFin, INDEX( clnSymArray, pRSeq, 1),
                                        fbFin, UNIQUE(FILTER(bFin,bFin<>"")),
                                        bNext, MUX( fpFin, fbFin ),
                                        PERMUTATEARRAY( pNext, bNext, 1 )
                                        ) ,
                                    // if there is more than one symbol column and a byArray, repartition the symbol and byArray and recurse
                                    IF( AND(COLUMNS( clnSymArray ) > 1, NOT( ISOMITTED( ByArray ) ) ),
                                        LET(pC, COLUMNS(clnSymArray),
                                            pCSeq, SEQUENCE(,pC-1,2),
                                            pNext, INDEX( clnSymArray, pRSeq, pCSeq ),
                                            pFin, INDEX( clnSymArray, pRSeq, 1),
                                            fpFin, UNIQUE(FILTER(pFin,pFin<>"")),
                                            bNext, MUX( fpFin, ByArray ),
                                            PERMUTATEARRAY( pNext, bNext, 1 )
                                                ),
                                        // if there is only one symbol column and a byArray, filter symbol column & MUX it with he byArray - DONE
                                        IF( AND(COLUMNS( clnSymArray ) = 1, NOT( ISOMITTED( ByArray ) ) ),
                                            LET(pFin, INDEX( clnSymArray, pRSeq, 1),
                                                fpFin, UNIQUE(FILTER(pFin,pFin<>"")),
                                                MUX( fpFin, ByArray )
                                                )
                                )
                            ) ) )
                        ) ) )
            )
        )

);

Even if you remove the comments, there is a lot of code in here, but it is designed that way in order to maximize speed while preventing errors. It is also a fully contained LAMBDA, meaning that it does not need any other LAMBDA functions to be loaded in order to work.

It is able to handle symbols in non-adjacent rows from the same column (the stranded f problem), completely blank columns, and errors in the input array. Here is an example that has all of those problems:

enter image description here

LAMBDA Magic --> Turing Complete

In my previous answer, I could see that LAMBDA allows us now to have an extensible solution to this problem thanks to the use of LAMBDA Helpers as JvdV has shown with a running permutation using SCAN(array,LAMBDA(a,b,a*b)).

Now, LAMBDA allows for recursion and looping that are even more powerful for this particular problem. Without Jos's insight, I would not have recognized that there is a repeating recursive pattern. This new solution may be lengthy, but it solves the problem in a more computationally efficient way which would prevent the spreadsheet from becoming unnecessarily laggy.

On the downside, debugging recursion is a huge pain!

mark fitzpatrick
  • 3,162
  • 2
  • 11
  • 23
  • Looks like a great piece of work, Mark! Will digest in more detail when I've got a bit more time. – Jos Woolley Mar 17 '22 at 09:48
  • Cheers @JosWoolley! I have the same problem. A lot of real work and this is pretty much a hobby. I suspect that the formula can be tightened up, but each endeavor is painful because debugging recursion is hard. I did however, do a speed test and it is comparatively fast. It does a 600 element 100 X 3 array instantly and a 1m element, 100 X 3 array in about 3 seconds compared to 35 seconds for Simple. – mark fitzpatrick Mar 17 '22 at 10:30
2

How about this approach:

=LET(a,  A1:E3,
     b,  UNIQUE(TAKE(a,,1)),
     r,  REDUCE(b,  SEQUENCE(1,COLUMNS(a)-1,2),
         LAMBDA(x,  y,
                TOCOL(x & "," & TOROW(UNIQUE(INDEX(a,,y)))))),
DROP(REDUCE("",r,
     LAMBDA(x,  y,
            VSTACK(x,
                   TEXTSPLIT(y,",")))),
     1))

It starts with the (unique) values from the first column an joins each of these values with each unique value from the second column. The result is stored and each of this stored values is joined with each unique value of the next column, and so on.. in the end this is textsplitted to get the comma separated cell values to separate columns again.

EDIT: To cope with blank column(s) in the permutation list and to handle upper and lower characters as uniques, this could be used:

=LET(a,  A1:C5,
     b,  TAKE(a,,1),
     c,  SEQUENCE(ROWS(b)),
     d,  FILTER(b,MMULT((EXACT(TOROW(b),b))*(TOROW(c)<=c),c^0)=1),
     r,  REDUCE(d,  SEQUENCE(1,COLUMNS(a)-1,2),
         LAMBDA(x,y,
                LET(e,  INDEX(a,,y),
                      f,  SEQUENCE(ROWS(e)),
                      g,  FILTER(e,MMULT((EXACT(TOROW(e),e))*(TOROW(f)<=f),f^0)=1),
                IFERROR(TOCOL(x & "," & TOROW(IF(1/(g<>""),g),3)),x)))),
DROP(REDUCE("",r,
     LAMBDA(x,y,
            VSTACK(x,
                   TEXTSPLIT(y,",")))),
     1))

This basically does the same as the previous version, however it first generates an array of values/strings that do not resemble eachother (upper/lower do not resemble in this case) and stacks the pivoted new array to the start/previous result-array, so each pivoted string in the new array is added to each string from the previous result separated by a comma delimiter; the TOCOL makes sure all results get flattened to a single column listing all comma separated permutations.

Finally, since TEXTSPLIT doesn't work correctly with range input, we need another lambda to split the previous result into columns.

We could also use TEXTSPLIT(TEXTJOIN(";",,r),",",";")) instead of reduce, but I think we will easily reach the character limit of textjoin if we have a lot of permutations.

P.b
  • 8,293
  • 2
  • 10
  • 25
  • 1
    Hey P.b.! I learned a lot from reading through this - TAKE & REDUCE are really useful. However, it doesn't meet the criteria. Let's take this as the input array: `={"A","","B";"1","","";"Y","","Z"}` placed in A1 and using range A1:C3 in your formula. Your solution generates 9 perms of 3 columns. The correct solution would be 6 perms of 2 columns: effectively: `={"A","B";"A","Z";"1","B";"1","Z";"Y","B";"Y","Z"}` I'm not sure how to modify your formula - it needs to eliminate blank columns and it also needs "deblanking" to eliminate all symbols from the array `a` that contain blanks. – mark fitzpatrick Jun 11 '23 at 07:46
  • By the way - in testing I realized that my formula has a flaw. Because it uses UNIQUE, it does not distinguish capital and miniscule letters, i.e., "A" and "a" are seen as the same. I can think of some solutions, but have not the time to work on this. Work and family have been eating all my time for the last 18 months and I have not been working on Excel as a result. :-/ – mark fitzpatrick Jun 11 '23 at 07:49
  • I'm going to look at both blank columns (I didn't anticipate on complete blank columns and on the upper lower distinction. – P.b Jun 12 '23 at 06:54
  • `=LET(a, A1:C3, b, UNIQUE(TAKE(a,,1)), r, REDUCE(b, SEQUENCE(1,COLUMNS(a)-1,2), LAMBDA(x,y, IFERROR(TOCOL(x & "," & UNIQUE(TOROW(IF(1/(INDEX(a,,y)<>""),INDEX(a,,y)),3))),x))), DROP(REDUCE("",r, LAMBDA(x,y, VSTACK(x, TEXTSPLIT(y,",")))), 1)) ` this works in case of truly blank cells also on blank value generated by a formula. – P.b Jun 12 '23 at 07:08
  • `TOROW(IF(1/(INDEX(a,,y)<>""),INDEX(a,,y)),3)` works as a filter. `1/(index(a,,y)` will produce an error (`1/0`) and the `3` in TOROW ignores errors/blanks. I needed to produce the error, since the formula generated blanks are not true blanks. – P.b Jun 12 '23 at 07:15
  • For the upper/lower I need to dive into it deeper. I think I need to involve EXACT/MMULT and makes it a bit more complicated. – P.b Jun 12 '23 at 07:17
  • Edited my answer to show the upper/lower uniques solution. – P.b Jun 12 '23 at 08:15