Shuffle string array without duplicates

Question

I am using the Knuth-Fisher-Yates algorithm to display a shuffled array of string items on a windows form. I do not get any duplicates, which is what I was trying to achieve, however, it only spits out 12 of the 13 elements of the array. How can I get it to display all 13 elements of the array? Here is my code:

private void FormBlue1_Load(object sender, EventArgs e)
{
// set the forms borderstyle
this.FormBorderStyle = FormBorderStyle.Fixed3D;

// create array of stationOneParts to display on form
string[] stationOneParts = new string[13];
stationOneParts[0] = "20-packing";
stationOneParts[1] = "5269-stempad";
stationOneParts[2] = "5112-freeze plug";
stationOneParts[3] = "2644-o'ring";
stationOneParts[4] = "5347-stem";
stationOneParts[5] = "4350-top packing";
stationOneParts[6] = "5084-3n1 body";
stationOneParts[7] = "4472-packing washer";
stationOneParts[8] = "3744-vr valve o'ring";
stationOneParts[9] = "2061-packing spring";
stationOneParts[10] = "2037-packing nut";
stationOneParts[11] = "2015-latch ring";
stationOneParts[12] = "stem assembly";

Random parts = new Random();

// loop through stationOneParts using a Swap method to shuffle
labelBlueOne.Text = "\n";
for (int i = stationOneParts.Length - 1; i > 0; i--)
{
int j = parts.Next(i + 1);
Swap(ref stationOneParts[i], ref stationOneParts[j]);

// display in a random order
labelBlueOne.Text += stationOneParts[i] + "\n";
}
}

private void Swap(ref string firstElement, ref string secondElement)
{
string temp = firstElement;
firstElement = secondElement;
secondElement = temp;
}

score 2 · Answer 1 · answered May 31 '11 at 13:35

2

You don't access the first element. for (int i = stationOneParts.Length - 1; i >= 0; i--).

answered May 31 '11 at 13:35

001

13,291
5
35
66

Thanks. That was so simple I never would have seen it. I am quite the novice at this stuff. – James May 31 '11 at 23:50

Guffa · Answer 2 · 2011-06-01T06:19:34.240

2

As you are showing the texts using the loop that swaps the items, you will not show the last item, because it's never swapped by itself.

Just show the last item after the loop:

labelBlueOne.Text += stationOneParts[0] + "\n";

Alternatively, you can display all the items outside the loop that shuffles them:

for (int i = stationOneParts.Length - 1; i > 0; i--) {
  Swap(ref stationOneParts[i], ref stationOneParts[parts.Next(i + 1)]);
}
labelBlueOne.Text = "\n" + String.Join("\n", stationOneParts);

edited Jun 01 '11 at 06:19

answered May 31 '11 at 13:36

Guffa

687,336
108
737
1,005

This works. but it was easier and cleaner to change my loop condition to (i >= 0). Thanks though. – James May 31 '11 at 23:51
@James: If you change the loop condition, you will do an extra swap. The last item will be swapped with the item at `parts.Next(1)`, i.e. a random number between 0 and 0, which incidentally is always the item itself. This works, but it's a bit kludgy. I think that the cleanest solution would be to display the items outside of the loop that shuffles them; I added that alternative above. – Guffa Jun 01 '11 at 06:18
Ok. I'll experiment with that as well. Thanks – James Jun 01 '11 at 22:12
Wow. That works. I did not know you could do something like that. This String.Join thing is new to me. I just started learning C# about 6 or 7 months ago. I am factory worker who just fiddles with this stuff as a hobby. – James Jun 01 '11 at 22:20
Wow. I looked up string.join method online. What a cool thing. I did something in java a while back and that would have helped a lot. Does java have a similar static method. – James Jun 01 '11 at 22:40
@James: Actually, to my surprise it seems that Java doesn't have a Join method. – Guffa Jun 01 '11 at 23:12

score 1 · Answer 3 · answered May 31 '11 at 13:36

1

Change your loop condition to i >= 0.

answered May 31 '11 at 13:36

Daniel Hilgarth

171,043
40
335
443

Thanks. That was so simple I never would have seen it. I am quite the novice at this stuff. – James May 31 '11 at 23:49

score 1 · Answer 4 · answered May 31 '11 at 13:38

1

Simpliest approach :

        Random rnd = new Random();
        var stationOneParts = new List<string>{
        "20-packing",
        "5269-stempad",
        "5112-freeze plug",
        "2644-o'ring",
        "5347-stem",
        "4350-top packing",
        "5084-3n1 body",
        "4472-packing washer",
        "3744-vr valve o'ring",
        "2061-packing spring",
        "2037-packing nut",
        "2015-latch ring",
        "stem assembly"}.OrderBy(s => rnd.Next());

        labelBlueOne.Text = string.Join(Environment.NewLine, stationOneParts);

answered May 31 '11 at 13:38

Steve B

36,818
21
101
174

That's random-ish, but not guaranteed to give a uniform distribution over the possible orderings. – recursive May 31 '11 at 13:45
@recursive: Could you explain (1) what the source of the bias is, and (2) why it is important in this application to get a non-biased ordering? – Eric Lippert May 31 '11 at 15:29
@Eric: I can't answer number 2 because there isn't enough information in the original post to know why James is doing this or what his requirements are. But it seemed reasonable to assume that because he mentioned Knuth-Fisher-Yates, he cared about distribution uniformity. The source of the bias would be the sort algorithm used by `OrderBy` – recursive May 31 '11 at 15:35
@recursive: There is a bias here, but it is not from the sort algorithm; how the sort algorithm chooses to sort floats into order is irrelevant. The actual bias source is far more fundamental than that. – Eric Lippert May 31 '11 at 15:52
1

@recursive: the actual source of bias is that there are only 2^32 = 4 x 10^9 possible seeds to the RNG, but there are 13! = 6 x 10^9 possible orderings of 13 elements. Therefore there are 2 billion possible final configurations that are *never* generated. If this is a problem then you need a stronger source of randomness. (Moreover: the default RNG seed is the current time, which introduces additional bias as the code is more likely to be run during business hours than not.) – Eric Lippert May 31 '11 at 15:54
@Eric: You're right of course. For some reason, I was thinking that new random numbers would be generated for each comparison, in which case the algorithm would matter. But I was still accidentally correct about the existence of a bias. The fact that an evenly distributing algorithm was mentioned suggests that it may be a concern. – recursive May 31 '11 at 18:40
If you want to sort by random you should pick one random value for each item, not create a new random value each time an item is compared. Some sorting algorithms act up when you feed them inconsistent values. – Guffa May 31 '11 at 18:48
1

@Guffa, @recursive: You are right to be concerned but in this case it is not a problem. The code given here is correct, modulo the other issues. When you do an "OrderBy" like this, the sorting code obtains the sort key *once* for each item, caches the key, and does not recompute the key. After all, the key computation might be expensive! The issue you are thinking of is *non-key-based comparison sorts with randomness in the comparator*. A comparison sort is required to have a consistent total ordering imposed by the comparator. – Eric Lippert May 31 '11 at 20:12
1

@Guffa, @recursive: That is to say, the algorithm that you are actually criticizing is this one: "myList.Sort((x, y) => random.Next(-1, 2));" -- that really is deeply wrong. There will be a new comparison done on every compared pair, and there could be n log n or even n^2 of those comparisons done, and the sort algorithm may require the comparator to be consistent. But in the algorithm that Steve gives, he generates n sort keys and that's it; we don't regenerate the keys every time a comparison needs to be done. – Eric Lippert May 31 '11 at 20:16
1

@Eric Lippert - In that case the order-by does introduce a bias, since it is a stable sort and will preserve the original order should any of the keys happen to be the same. That is, there is a rare bias toward maintaining the original order. – Jeffrey L Whitledge May 31 '11 at 20:22
@Jeffrey: Sure. By the Birthday Paradox if the list has ten thousand items in it then there is a 1% chance of there being at least one collision in the key set. But that source of bias is very, very small. The non-randomness of the pseudo-random number generator used in the first place gives much larger bias. Steve's suggestion works quite well in practice, particularly for small lists. But do not use it for anything where an actual unbiased shuffle is required; this would not pass muster for an online poker hand generator, for example. – Eric Lippert May 31 '11 at 20:28
You guys are way over my head at this point. I am just a novice doing a little work simulation of my factory. The code I gave was for station one blue cell assembly where I work. I want the part to be assembled spit out to the form in a different order every time you run the application, because right next to it are text boxes for the user to type in the answers in the correct order of assembly, and then the information is sent as arg to a class constructor and checked if they are in the correct order. There are 3 more stations in blue cell and that was my last loose end before I move on. – James Jun 01 '11 at 00:01
What do you mean by bias? I know it's a math term. I think? – James Jun 01 '11 at 00:07
@James: There are n-factorial (n!) ways to shuffle n items; a "biased" shuffle is one that produces some of those orderings more often than others. For example, suppose you had three items. The possible orders are ABC, ACB, BAC, BCA, CAB and CBA. If the shuffle produces ABC 20% of the time and the other five 16% of the time each then the shuffle is biased towards ABC. There are applications (like online poker) where it is extremely important to ensure that there is no bias; knowledge of how the shuffle algorithm is biased can give an edge to cheaters. – Eric Lippert Jun 01 '11 at 07:26
Ok. I get factorials. I see what you mean now. Thanks. – James Jun 01 '11 at 22:08

sehe · Answer 5 · 2011-06-01T07:57:34.663

0

Since you mention C# 4.0, why not write is C#-ish?

using System.Linq;

// ...

    var stationOneParts = new [] { "20-packing",
        "5269-stempad",
        "5112-freeze plug",
        "2644-o'ring",
        "5347-stem",
        "4350-top packing",
        "5084-3n1 body",
        "4472-packing washer",
        "3744-vr valve o'ring",
        "2061-packing spring",
        "2037-packing nut",
        "2015-latch ring",
        "stem assembly" };

    Random rand = new Random();
    stationOneParts = stationOneParts
        .Distinct() // see subject: '... without duplicates'
        .Select(i => new { i, key=rand.Next() })
        .OrderBy(p => p.key)
        .Select(p => p.i)
        .ToArray();

edited Jun 01 '11 at 07:57

answered Jun 01 '11 at 06:47

sehe

374,641
47
450
633

1

Wow, that is a lot of unnecessary parts. "parts.OrderBy(x=>rand.Next());", and you're done. Why do you feel it necessary to add a Distinct, two Selects and a ToArray to this? – Eric Lippert Jun 01 '11 at 07:30
@Eric: depending on the sort algorithm used, when using rand.Next() directly, it may not finish at all (since the strict weak ordering predicate is stateful, it is not deterministic, and the sort algorithm may assume it is; if it is quicksort, it will finish anyhow, but it may degrade to O(n^2) complexity). Is there a flaw in my reasoning? – sehe Jun 01 '11 at 07:48
@sehe: I don't really understand what you're saying, but I suspect that you have the same misapprehension that Guffa and recursive did in Steve's answer; see my comments to that answer for a lengthy discussion of this practice and why it is safe. (Moreover: if it were not safe then your extra two Selects would not make it safe. I still don't see what purpose they serve.) – Eric Lippert Jun 01 '11 at 13:50
@Eric: if a sort algorithm happens to compare two elements twice (bear with me), it will hurt performance and possibly correctness if the predicate returns a different value the second time. (_I agree that it is unlikely, but because of .OrderBy supporting deferred execution a non-standard sorting approach might have been taken. I don't think the specification of [Enumerable.OrderBy](http://msdn.microsoft.com/en-us/library/bb549422.aspx) includes guarantees either way..._) – sehe Jun 01 '11 at 14:01
But you're not supplying a predicate for the comparison, you're supplying a *key* for the comparison. If you were supplying a predicate, like stuff.Sort((x,y)=>-1 or 1 at random) then yes, that can cause the sort algorithm to do crazy things. But that's not what you are supplying; you're supplying a key, one key per item, not per comparison. The OrderBy doesn't re-fetch the key on every comparison; that would be both wasteful and dangerous. We have to assume that the key is expensive to produce. You are right to be thoughtful about this issue, but really, it's fine to order by a random key. – Eric Lippert Jun 01 '11 at 14:07
But now at least I understand why the extra selects; you are using the fact that anonymous types are immutable tuples to ensure that the random value does not change per comparison. Clever, but unnecessary. – Eric Lippert Jun 01 '11 at 14:09
@Eric: subtle but convincing point well taken. I'd really expect that the documentation spell that out (_'Per enumeration, the `keySelector` will be invoked at most once for each element from the `source` enumerable, and exactly once for each element if the full sequence is enumerated'_ would do :)) – sehe Jun 01 '11 at 14:13
@Eric Lippert - Since you are a student of user-misapprehentions, I also assumed that OrderBy would execute the key retrieval function multiple times for each item. It's pretty awesome that it stores the key associations instead. I feel like that knowledge will greatly increase the circumstances in which I will use that method. – Jeffrey L Whitledge Jun 01 '11 at 14:14
@sehe, @Jeffrey: Good point. Next time I see the documentation manager I'll mention it. – Eric Lippert Jun 01 '11 at 14:15
@all: for general info, the mono sources (on quick inspection) seems to honour this rule, like MS implementation seems to [according to Jon Skeet's blog](http://msmvps.com/blogs/jon_skeet/archive/2011/01/05/reimplementing-linq-to-objects-part-26b-orderby-descending-thenby-descending.aspx). However, Jon mentions: `I'm not advocating that way of shuffling, admittedly` (I think I'll ask him why) and `I believe the code will obey all the documented behaviour of LINQ to Objects: there's nothing in the documentation about how often the key selector will be called` – sehe Jun 01 '11 at 14:35

Shuffle string array without duplicates

5 Answers5

Linked