0

I have a table of data with many data repeating. I have to sort the rows by random, however, without having identical names next to each other, like shown here: enter image description here

How can I do that in Excel?

Nanno
  • 71
  • 1
  • 7
  • 1
    Take the left table, add a helper column with the following formula: `=COUNTIF(A$2:A2,A2)`, populate down then sort on that column ? – cybernetic.nomad Jan 26 '23 at 18:14
  • At first sight this is tricky by formula because you have to find a way of excluding a row once it has been added to the output. On further inspection it is very tricky because you could generate a perfectly reasonable sequence like mary,mike,mary,mike and then be unable to proceed because you are left with john,john,john which doesn't lead to a solution. So you would have to backtrack and explore a potentially large number of paths even for a small amount of data. How much data do you have and how 'random' does the output have to be? – Tom Sharpe Jan 26 '23 at 23:03
  • ^^^ and for some data sets there will be no solution – chris neilsen Jan 26 '23 at 23:19
  • @chrisneilsen , in theory, some data sets may have no solution. For my practical case, with a data set with 200 rows, one single name (john, mary, etc.) will repeat a maximum of 40 times, so always will have a solution. – Nanno Jan 27 '23 at 17:12
  • @TomSharpe, The real data set has about 200 rows and one single name (john, mary, etc.) will repeat a maximum of 40 times. Any level of randomness will be fine, provided identical names will not be next to each other. – Nanno Jan 27 '23 at 17:14
  • @Sukhoi that would be useful information to add to your question – chris neilsen Jan 27 '23 at 19:57

4 Answers4

3

Perfect case for a recursive LAMBDA.

In Name Manager, define RandomSort as

=LAMBDA(ζ,
    LET(
        ξ, SORTBY(ζ, RANDARRAY(ROWS(ζ))),
        λ, TAKE(ξ, , 1),
        κ, SUMPRODUCT(N(DROP(λ, -1) = DROP(λ, 1))),
        IF(κ = 0, ξ, RandomSort(ζ))
    )
)

then enter

=RandomSort(A2:B8)

within the worksheet somewhere. Replace A2:B8 - which should be your data excluding the headers - as required.

If no solution is possible then you will receive a #NUM! error. I didn't get round to adding a clause to determine whether a certain combination of names has a solution or not.

Jos Woolley
  • 8,564
  • 2
  • 4
  • 9
  • Your solution seems good, but seems lambda is only available for Excel 365 (I have Excel 2019) – Nanno Jan 27 '23 at 17:04
  • Great achievement to answer question with test data. However as I found here https://stackoverflow.com/questions/74359156/divide-a-value-in-excel-by-a-set-of-preset-values-to-find-out-how-many-of-each-a/74362385#74362385 these pretty soon blow up with a brute-force approach using recursive lambda so not really scalable beyond maybe 20 rows or so (depending on mix of data) – Tom Sharpe Jan 27 '23 at 18:01
  • @TomSharpe Of course, the number of permutations is obviously a factor. Though can I ask how you arrived at a limit of 20 rows? I tested on a 1000 rows' worth of data, varying the number of unique names within that range from 50 to 500, and received an instantaneous return in all cases. Could it be less to do with an issue with recursive `LAMBDA`s per se, and more to do with the way in which a given `LAMBDA` is designed? I notice that yours is calling `REDUCE`, for example, which might be a factor in its recursive efficiency? – Jos Woolley Jan 27 '23 at 19:33
  • I just copied OP's data a few times so a bit of a worst case scenario I guess as nearly half of them are dups of the same person – Tom Sharpe Jan 27 '23 at 20:20
  • 2
    @TomSharpe Interesting - [this link](https://www.sumproduct.com/news/article/lambda-formulaic-recursion-its-all-about-me) states that the recursion limit for `LAMBDA`s is 1024/x, where x is the number of parameters passed to `LAMBDA`. In my case I guess that would mean no more than 1024 recursions. One could probably work out the approximate probability of a given `LAMBDA` returning a solution within its maximum number of iterations. – Jos Woolley Jan 27 '23 at 20:52
2

This is just an attempt because the question might need clarification or more sample data to understand the actual scenario. The main idea is to generate a random list from the input, then distribute it evenly by names. This ensures no repetition of consecutive names, but this is not the only possible way of sorting (this problem may have multiple valid combinations), but this is a valid one. The solution is volatile (every time Excel recalculates, a new output is generated) because RANDARRAY is volatile function.

In cell D2, you can use the following formula:

=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m),
  idx, SORTBY(seq, RANDARRAY(m,,1,m, TRUE)), rRng, INDEX(rng, idx,{1,2}),
  names, INDEX(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
   (names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))

Here is the output: excel output

Update

Looking at @JosWoolley approach. The generation of the random sorting can be simplified so that the resulting formula could be:

=LET(rng, A2:B8, m, ROWS(rng), seq, SEQUENCE(m), rRng,SORTBY(rng, RANDARRAY(m)), 
 names, TAKE(rRng,,1), nCnts, MAP(seq, LAMBDA(s, ROWS(FILTER(names,
  (names=INDEX(names,s)) * (seq<=s))))), SORTBY(rRng, nCnts))

Explanation

LET function is used for easy reading and composition. The name idx represents a random sequence of the input index positions. The name rRng, represents the input rng, but sorted by random. This sorting doesn't ensure consecutive names are distinct.

In order to ensure consecutive names are not repeated, we enumerate (nCnts) repeated names. We use a MAP for that. This is a similar idea provided by @cybernetic.nomad in the comment section, but adapted for an array version (we cannot use COUNTIF because it requires a range). Finally, we use SORTBY with input argument by_array, the map result (nCnts), to ensure names are evenly distributed so no consecutive names will be the same. Every time Excel recalculate you will get an output with the names distributed evenly in a different way.

David Leal
  • 6,373
  • 4
  • 29
  • 56
1

Not sure if it's worth posting this, but I might as well share the results of my research such as it is. The problem is similar to that of re-arranging the characters in a string so that no same characters are adjacent The method is just to insert whichever one of the remaining characters (names) has the highest frequency at this point and is not the same as the previous character, then reduce its frequency once it has been used. It's fairly easy to implement this in Excel, even in Excel 2019. So if the initial frequencies are in D2:D8 for convenience using Countif:

=COUNTIF(A$2:A$8,A2)

You can use this formula in (say) F2 and pull it down:

=INDEX(A$2:A$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))

and similarly in G2 to get the ages:

=INDEX(B$2:B$8,MATCH(MAX((D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1)),(D$2:D$8-COUNTIF(F$1:F1,A$2:A$8))*(A$2:A$8<>F1),0))

enter image description here

I'm fairly sure this will always produce a correct result if one is possible.

HOWEVER there is no randomness built in to this method. You can see if I extend it to more data that in the first several rows the most common name simply alternates with the other two names:

enter image description here

Having said that, this is a bit of a worst case scenario (a lot of duplication) and it may not look too bad with real data, so it may be worth considering this approach along with the other two methods.

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
  • Tom, I tested, but in my computer could not get it work. The problem started with the equation in F2, that for me generates `#N/A` result in all F the column. – Nanno Jan 30 '23 at 17:45
  • You may have to array-enter it in Excel 2019, using Ctrl-Shift-Enter. – Tom Sharpe Jan 30 '23 at 17:47
0

Thank you for all the answers and suggestions.

Unfortunately I could not adapt any of the proposed solutions to work with my real data, and I had to develop a solution. The solution is in Python because the Excel file is used as a database for a Python application.

You can see and comment the developed solution here:

Python: sorting a matrix of data (list of lists) to have equal "names" at the biggest distance of each other

Nanno
  • 71
  • 1
  • 7