
I am looking for an efficient mechanism of randomizing the order of a list with the caveat that duplicate items must have at least 9 items between them, if the length of the list allows, otherwise they must be as far apart as the length of the list will allow.

Firstly, this is only for small data sets: there will never be more than 150 items in a list, and there could be as few as 1.

So the basic premise is this:

  1. Take a list of people's names that may contain a few duplicates (typically there will never be more than 2 instances of any given name, but there could be 3 in exceptional circumstances).
  2. Randomize the list (I have this part working already using a Fisher-Yates (or Knuth) shuffle).
  3. If the list contains duplicates, identify those that have fewer than 9 items between them and, where possible, adjust to make the gap at least 9 items.
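For reference, the shuffle in step 2 is the standard in-place Fisher-Yates algorithm; a minimal C# sketch (the class and method names here are my own) looks like this:

```csharp
using System;
using System.Collections.Generic;

static class ListShuffle
{
    private static readonly Random Rand = new Random();

    // In-place Fisher-Yates (Knuth) shuffle: every permutation of the
    // list is equally likely, assuming an unbiased Random.
    public static void Shuffle(IList<string> list)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = Rand.Next(i + 1); // 0 <= j <= i
            string tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}
```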

Part 3 is the tricky bit. I have no guidelines or business rules on how the system should ensure the spacing is correct, just that it should do so where possible. At its simplest, an iterative loop that checks for violations and then moves a list element as appropriate would seem the way to go; however, I can imagine several scenarios where an adjustment made for one pairing then causes an issue with another, and so on.

I am not looking for someone to write the code for this, I'm just really after some sensible ideas on a good, efficient way to approach this problem.

The source list will be an IList<string> in C#, for reference.

connectedsoftware
  • see this http://stackoverflow.com/questions/693880/create-random-number-sequence-with-no-repeats – therealprashant Apr 22 '15 at 12:11
  • That link is different from my question; I am looking for an algorithm to space elements out after the list has been randomized - obviously if it can be done as part of the randomizing operation (pseudo random) all the better – connectedsoftware Apr 22 '15 at 12:15
  • Of the 150 names in a typical instance of your list, how many will be duplicates of other names in the same list? – High Performance Mark Apr 22 '15 at 12:26
  • @HighPerformanceMark In a list of 150 typically around 20 names are duplicated so there would be 130 unique names. – connectedsoftware Apr 22 '15 at 12:29
  • So randomise the unduplicated names, then insert the duplicates at the intervals you want. You're trying to balance the competing objectives of randomness and spacing-out of duplicates, so this might be good enough. – High Performance Mark Apr 22 '15 at 12:32
  • @HighPerformanceMark 100% regarding competing objectives, I am just trying to make it as random as is feasibly possible. – connectedsoftware Apr 22 '15 at 12:35
  • So randomise the list of all unduplicated names plus one instance of each duplicated name. Then, at a random offset from the existing instances, insert each duplicate. If you get to a situation where this algorithm fails, which will happen rarely given what you have already told us, start again. – High Performance Mark Apr 22 '15 at 12:40
  • @HighPerformanceMark now that approach I really like, effective but pragmatic - thanks. – connectedsoftware Apr 22 '15 at 12:54
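A rough sketch of the approach suggested in the comments above (all names here are hypothetical, and the gap arithmetic and restart policy are my own assumptions, not part of the suggestion): shuffle the distinct names, then insert each extra copy at a random valid position, returning null so the caller can simply start again when no valid position exists.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SpacedShuffle
{
    private static readonly Random Rand = new Random();

    // Shuffle the distinct names, then insert each extra copy of a
    // duplicated name at a random position more than minGap slots from
    // its existing instances. Returns null when no valid position
    // exists, so the caller can restart.
    public static List<string> TryBuild(IList<string> names, int minGap)
    {
        var result = names.Distinct().ToList();
        // Fisher-Yates shuffle of the distinct names
        for (int i = result.Count - 1; i > 0; i--)
        {
            int j = Rand.Next(i + 1);
            (result[i], result[j]) = (result[j], result[i]);
        }
        // extras = every copy beyond the first of each name
        var extras = names.GroupBy(n => n).SelectMany(g => g.Skip(1)).ToList();
        foreach (var name in extras)
        {
            var positions = Enumerable.Range(0, result.Count)
                                      .Where(i => result[i] == name)
                                      .ToList();
            // insertion points more than minGap away from every instance
            var candidates = Enumerable.Range(0, result.Count + 1)
                .Where(p => positions.All(i => Math.Abs(p - i) > minGap))
                .ToList();
            if (candidates.Count == 0) return null; // caller restarts
            result.Insert(candidates[Rand.Next(candidates.Count)], name);
        }
        return result;
    }
}
```

A caller would loop until TryBuild returns non-null; with around 20 duplicates in 150 names a restart should be rare.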

3 Answers


Try the code below. I group the strings together using LINQ to get all the duplicates, then create a randomly ordered list of the unique strings, and finally add the remaining duplicates to the list, spreading each string out evenly. The result is random and the duplicates are evenly spread.

Note: I found a minor error: firstItem += spacing; needed to become firstItem += spacing + 1;

While debugging the code I found firstItem was occasionally going negative, so I added code to make sure firstItem was always positive. I then started wondering why I didn't get any overflows where firstItem was larger than the array size; that is when I realized I had to add 1 to spacing. The old code with an array A,B,C,D,E would give 1,1,1,1,1,A,B,C,D,E. The new code gives 1,A,1,B,1,C,1,D,1,E.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication20
{
    class Program
    {
        static void Main(string[] args)
        {
            Random rand = new Random();
            List<string> input = new List<string>();
            List<string> output = new List<string>();
            //create 150 random strings with duplicates
            for(int i = 0; i < 150; i++)
            {
                input.Add(rand.Next(0,25).ToString());
            }
            //create dictionary with two columns key, number of entries
            Dictionary<string, Value> dict = input.AsEnumerable()
                .GroupBy(x => x)
                .ToDictionary(x => x.Key, y => new Value { count = y.Count(), ranNumber = rand.Next() });

            dict = dict.OrderBy(x => x.Value.ranNumber).ToDictionary(x => x.Key, y => y.Value);
            //add 1 sorted numbers to output
            foreach(string key in dict.Keys)
            {
                output.Add(key);
            }
            //add rest of numbers
            foreach (string key in dict.Keys)
            {
                int numberOfItems = dict[key].count;
                if (dict[key].count > 1)
                {
                    int arraySize = output.Count;
                    int spacing = arraySize / numberOfItems;
                    int firstItem = 0;
                    //center around middle
                    if (numberOfItems % 2 == 0)
                    {
                        firstItem = (arraySize / 2) - (((numberOfItems / 2) * spacing) + (spacing / 2));
                    }
                    else
                    {
                        firstItem = (arraySize / 2) - (((numberOfItems - 1) / 2) * spacing);
                    }
                    if (firstItem < 0)
                    {
                        firstItem = 0;
                    }
                    //remove existing item
                    output.Remove(key);
                    //insert items
                    for (int i = 0; i < numberOfItems; i++)
                    {
                        output.Insert(firstItem,key);
                        firstItem += spacing + 1; //+1 so inserted items interleave instead of clustering
                    }
                }
            }

            
        }
        public class Value
        {
            public int count { get; set; }
            public int ranNumber { get; set; }
        }

    }
}
jdweng

Try this:

Pass #0: shuffle all names.

Pass #1: find out which names exist more than once.

Pass #2: for each multiply-occurring name, find its closest pairing and exchange one instance with the name 9 positions away, provided that name is not itself known to be a duplicate; in that case exchange with the name 10 positions away instead (repeat, increasing the distance, as needed).

Of course, you need to pay attention when a name is close to the start or end of the list, and handle those cases appropriately.
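A minimal sketch of these passes (the names are my own, and boundary handling plus checks against a third instance are deliberately simplified):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateSpacer
{
    // Pass 1: collect the names that occur more than once.
    // Pass 2: whenever two instances of such a name are minGap or fewer
    // positions apart, swap the later instance with an element beyond
    // the gap, skipping swap targets that are themselves duplicates.
    public static void SpaceOut(IList<string> list, int minGap)
    {
        var dups = new HashSet<string>(
            list.GroupBy(n => n).Where(g => g.Count() > 1).Select(g => g.Key));
        for (int i = 0; i < list.Count; i++)
        {
            if (!dups.Contains(list[i])) continue;
            for (int j = i + 1; j < list.Count && j <= i + minGap; j++)
            {
                if (list[j] != list[i]) continue;
                // too close: find a swap target leaving >= minGap items between
                for (int k = i + minGap + 1; k < list.Count; k++)
                {
                    if (dups.Contains(list[k])) continue; // skip other duplicates
                    (list[j], list[k]) = (list[k], list[j]);
                    break;
                }
            }
        }
    }
}
```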

DrKoch
  • @vib we start with a "randomized list", then apply the constraints. This leaves the list as "random" as possible – DrKoch Apr 22 '15 at 12:33
  • Maybe I don't understand your pass 2 but I suspect you will overweight cases where duplicated items are at distance between 9 and 18. – vib Apr 22 '15 at 12:36
  • This approach is certainly an option, I had initially thought about moving entries either one position "up" or one position "down" to create the required spacing, your approach would reduce the number of actions required. – connectedsoftware Apr 22 '15 at 12:36
  • If you do that you lose uniformity. Instances where duplicated objects are close will be overweighted because they will occur either as output of your first permutation (pass 0) or as output of an invalid pass 0 followed by the corrective step. – vib Apr 22 '15 at 12:39

Here's how I'd probably do it:

  1. Remove all duplicates, leaving one copy of each item, but keep track of how many of each you remove.
  2. Randomize the list of unique entries.
  3. For each duplicate removed in step 1:
    1. Locate the duplicated item(s) in the list.
    2. Split the list into parts at each such position ± 9, ignoring zero- or negative-length partitions.
    3. Determine the largest partition that doesn't contain a duplicate.
    4. Insert the duplicate at a random position in that partition.
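A sketch of steps 3.1-3.4 under my own assumptions about the gap arithmetic (names are hypothetical, and real code would need a fallback when no partition satisfies the gap):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartitionInsert
{
    private static readonly Random Rand = new Random();

    // Insert one extra copy of name, choosing a random spot in the
    // largest contiguous run of insertion points that leave at least
    // minGap items between the copies.
    public static void InsertSpaced(List<string> list, string name, int minGap)
    {
        var taken = Enumerable.Range(0, list.Count)
                              .Where(i => list[i] == name)
                              .ToList();
        // valid insertion points: at least minGap items from each instance
        var valid = Enumerable.Range(0, list.Count + 1)
            .Where(p => taken.All(i => p <= i ? i - p >= minGap : p - i > minGap))
            .ToList();
        if (valid.Count == 0) return; // list too short; real code needs a fallback
        // group consecutive points into partitions and take the largest
        var partitions = new List<List<int>> { new List<int> { valid[0] } };
        for (int k = 1; k < valid.Count; k++)
        {
            if (valid[k] == valid[k - 1] + 1)
                partitions[partitions.Count - 1].Add(valid[k]);
            else
                partitions.Add(new List<int> { valid[k] });
        }
        var largest = partitions.OrderByDescending(p => p.Count).First();
        list.Insert(largest[Rand.Next(largest.Count)], name);
    }
}
```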
RoadieRich