How can I make a random set generator with equal possibility to generate uniform sets and non-uniform sets?

Question

So let's say I have a set of items:

['a', 'b', 'c', 'd', 'e']

and I want to generate a random set (order does not matter) from those choices:

['a', 'e', 'd', 'c']

which is child's play, but instead of it being unlikely to generate a uniform result:

['c', 'c', 'c', 'c']

compared to something less uniform like:

['a', 'b', 'e', 'd']

I want to make it equally likely that a uniform set can be generated as it is likely that a non-uniform set can be generated.

Edit:

The result I'm trying to express is not just ['c', 'c', 'c', 'c', 'c', 'c'] or ['d', 'd', 'd', 'd', 'd', 'd'] but also the areas in between those uniformities like ['c', 'c', 'c', 'c', 'c', 'a'] or ['d', 'd', 'd', 'd', 'd', 'b'] or ['c', 'c', 'c', 'c', 'b', 'a']. Making all of those uniform sets and the areas in-between equally likely as non-uniform results is what I find challenging to create. I'm at a loss for where to even begin creating a set generator that does that.

Further clarification:

So if I generate a set of 1000 items, I want it to be equally likely that the set is 90% uniform or 100% uniform or 80% uniform or 20% uniform.

How can/should this be done?

I think you're misunderstanding something about the nature of randomness. `['c', 'c', 'c', 'c']` is in fact already _exactly_ as likely to be generated as `['a', 'b', 'e', 'd']` — Hamms, Mar 30 '17 at 22:24
Well, he does say "equally likely that a uniform vs non-uniform set is generated" which is precise enough. — Kenan Banks, Mar 30 '17 at 22:25
Just to mention a use-case for this in case anyone is confused as to why you would ever want to do this: I'm working on a neural network which mutates itself to evolve based on a set of mutations, but with a normal random set generator, the mutation list will be very unlikely to ever generate a an extreme set of mutations as shown above, especially if you're doing set sizes of 1000s, so if the network happens to need to undergo an extreme mutation, like 1000 x "add neuron" and no other mutations, that wont ever happen and it'll never evolve. — , Mar 30 '17 at 22:27
Although as stated about every set is as likely as another. If you want there to be for example, a 50/50 chance to produce an 'extreme' set as a non-extreme set, you could group each set of sets (extreme and nonextreme) and roll a random boolean to see which out of each set of sets you pick a random set from. — Barney Chambers, Mar 30 '17 at 22:28
@Hamms yeah I understand that, I'm not referring to those specific combinations, I'm referring to to the categories of "extreme" and "non extreme", and mixes between the two. — , Mar 30 '17 at 22:30
@BarneyChambers I'm having trouble visualizing how that would reach my goal of a dynamic possibility. Maybe Im not being specific enough, hold on. — , Mar 30 '17 at 22:31
The result I'm trying to express is not just `['c', 'c', 'c', 'c', 'c', 'c']` or `['d', 'd', 'd', 'd', 'd', 'd']` but also the areas in between those extremes like `['c', 'c', 'c', 'c', 'c', 'a']` or `['d', 'd', 'd', 'd', 'd', 'b']` or `['c', 'c', 'c', 'c', 'b', 'a']` - making all of those extremes and the areas in-between equally likely as non-extremes is what I find challenging to create. — , Mar 30 '17 at 22:34
Ah - we're back to underspecified then. You need a much more precise description of what "extreme" means. Is `['a', 'b', 'a', 'b', 'a']` half as extreme as all A's or not extreme at all? — Kenan Banks, Mar 30 '17 at 22:42
@Triptych I'm referring to a full range of "extreme": between not extreme (no uniformity) to absolutely extreme (totally uniform). I'll replace the word extreme with uniform in my question clarify. — , Mar 30 '17 at 22:51
How exactly are you defining these percentages of uniformity you're talking about? Is `aabb` 50% uniform? What about `aabc`? What about `abcc`? What does it mean for something to be 20% uniform? Is `abcd` 20% uniform, or is it 0% uniform? — Hamms, Mar 30 '17 at 23:24
@Hamms you've got it right. Except in larger sets depending on how many options there are 0% uniform might not be possible. For example in a set length of 5 with 4 options, the lowest possible uniformity is going to be 40% (at least 2/5 items in the set will always the be same). And to answer your question `abcd` is 0% uniform, since there are no repetitions. — , Mar 31 '17 at 00:36

score 1 · Answer 1 · edited May 23 '17 at 12:10

From what you're saying, you want to ignore the order of the elements in your random set, so if your original set was ab then the possible outcomes (ignoring order) would be aa, ab, bb, and you'd like to see each of those appearing with equal probability (of 1/3), no?

A brute-force solution to this would be:

generate all outcomes (see Finding All Combinations of JavaScript array values),
sort each of the results so the elements appear alphabetically,
remove duplicates (see Remove duplicates from an array of objects in javascript)
select one of the remaining results at random

So for example, with abc:

all combinations =   [`aaa`, `aab`, `aac`
                      `aba`, `abb`, `abc`
                      `aca`, `acb`, `acc`
                      `baa`, `bab`, `bac`
                      `bba`, `bbb`, `bbc`
                      `bca`, `bcb`, `bcc`
                      `caa`, `cab`, `cac`
                      `cba`, `cbb`, `cbc`
                      `cca`, `ccb`, `ccc`]

sorted combinations = [`aaa`, `aab`, `aac`
                       `aab`, `abb`, `abc`
                       `aac`, `abc`, `acc`
                       `aab`, `abb`, `abc`
                       `abb`, `bbb`, `bbc`
                       `abc`, `bbc`, `bcc`
                       `aac`, `abc`, `acc`
                       `abc`, `bbc`, `bcc`
                       `acc`, `bcc`, `ccc`]

remove duplicates  = [`aaa`, `aab`, `aac`,
                      `abb`, `abc`, `acc`,
                      `bbb`, `bbc`, `bcc`,
                      `ccc`]

then choose from these with equal probability of 1/10

EDIT The final output above gives a clue to a non-brute-force solution: for each item the letters that follow each letter are of equal or 'higher' value (alphabetically speaking). So 'b' will never be followed by 'a', and 'c' will never be followed by 'a' or 'b' and so on.

So we can recursively generate all the combinations like this (sorry it's in python, you'll have to translate to javascript):

r=['a','b','c','d']

def get_combos(bound, n):
  global r
  if n == 1:
    return r[bound:]
  result=[]
  for i in range(bound,len(r)):
    for combo in get_combos(i, n-1):
      result.append(r[i]+combo)
  return result

x = get_combos(0,len(r))
print(x) # ['aaaa', 'aaab', 'aaac', 'aaad', 'aabb', 'aabc', 'aabd', 'aacc', 'aacd', 'aadd', 'abbb', 'abbc', 'abbd', 'abcc', 'abcd', 'abdd', 'accc', 'accd', 'acdd', 'addd', 'bbbb', 'bbbc', 'bbbd', 'bbcc', 'bbcd', 'bbdd', 'bccc', 'bccd', 'bcdd', 'bddd', 'cccc', 'cccd', 'ccdd', 'cddd', 'dddd']

print(len(x)) # 35

Just checking: In the past few minutes I clarified / specified my target result further in the question. Did you notice the update? — , Mar 30 '17 at 22:38
@Viziionary This answers the question from the get go. And IMO, although it is brute force, I think it may be the only way to actually accomplish what you are asking for. — mhodges, Mar 30 '17 at 22:44
I remember once trying to generate all combinations of 50 from a 500 card deck and someone did the math and showed me the sun would explode before I got my result due to there being `2.3144228279843E+69` possible results. Wouldn't it be the same case here? — , Mar 30 '17 at 23:05
The number of combinations will rise pretty quickly: http://math.stackexchange.com/questions/2042845/how-many-packages-of-ice-cream-can-you-make-out-of-6-flavors?rq=1 — Richard Inglis, Mar 30 '17 at 23:11

JeremiahB · Answer 2 · 2017-03-30T22:37:59.240

0

This can in fact be done. Just get a random number, and if it is over 0.5 then generate a random set, otherwise generate an extreme set from a random index. See the following code:

function generateRandomOrExtremeSet(a) {
   var n = Math.random();
   var set = [];
   if (n > 0.5)
      for (var i = 0; i < a.length; ++i)
         set[i] = a[Math.round(Math.random()*(a.length-1))];
   else {
      var index  = Math.round(Math.random() * (a.length-1));
      for (var i = 0; i < a.length; ++i) {
         if (Math.random() > 0.8) // change to adjust extremeness
            set[i] = a[Math.round(Math.random()*(a.length-1))];
         else
            set[i] = a[index];
      }
   }
   return set;  
}

edited Mar 30 '17 at 22:37

answered Mar 30 '17 at 22:32

JeremiahB

896
8
15

Just checking: In the past few minutes I clarified / specified my target result further in the question. Did you notice the update? – Mar 30 '17 at 22:38
That's not equally as likely... because if the number is higher than .5 and you generate a random set, the random set can still be uniform - meaning there is a > 50% chance that a uniform set will be generated – mhodges Mar 30 '17 at 22:41
@mhodges There is also a chance of the uniform generator creating a random list. While I agree that the algorithm can not be guaranteed to be 50-50 it should be pretty close and for most applications involving random data that will be enough. – JeremiahB Mar 31 '17 at 04:33

Kenan Banks · Answer 3 · 2017-03-30T22:40:36.570

0

Here's a simple snippet that gets it done, using the strategy of first deciding whether to generate an extreme or non-extreme set.

const choices = ['a', 'b', 'c', 'd', 'e']

const repeatN = (times, x) => {
  const ret = [];
  while (times--) {
    ret.push(x);
  }
  return ret;
}

const chooseN = (n, choices) => {
  let list = choices.slice();
  let ret = [];
  while (n-- && list.length) {
    let i = Math.floor(Math.random() * list.length);
    ret.push(list[i]);
  }
  return ret;
};

const set = Math.random() > 0.5 ? 
  chooseN(5, choices) :
  repeatN(5, chooseN(1, choices)[0]);

console.log(set);

edited Mar 30 '17 at 22:40

answered Mar 30 '17 at 22:38

Kenan Banks

207,056
34
155
173

Just checking: In the past few minutes I clarified / specified my target result further in the question. Did you notice the update? – Mar 30 '17 at 22:39

score 0 · Answer 4 · answered Mar 31 '17 at 02:17

Your original question seems to be stated incorrectly, which is why you are getting so many incorrect responses. In fact, the simple approach will give you the result that you want. You should do this by choosing a random value from your original set, like this:

function randomSet(set, size) {
  let result = []
  for (let i = 0; i < size; i++) {
    // get a random index from the original set
    let index = Math.floor(Math.random() * set.length)
    result.push(set[index])
  }
  return result
}

console.log(randomSet(['a', 'b', 'c'], 3))

// These are all equally likely:
// ['a','a','a']
// ['a','b','b']
// ['a','b','c']
// ['c','b','a']
// ['a','b','a']
// ['a','a','b']
// ['b','a','a']
// ['b','a','b']
// ['b','b','b']
// etc.

Alternatively, it's possible that you are misunderstanding the definition of a set. Many of your examples like ['c', 'c', 'c', 'c', 'b', 'a'] are not sets, because they contain repeat characters. Proper sets cannot contain repeat characters and the order of their contents does not matter. If you want to generate a random set from your initial set (in other words, generate a subset), you can do that by picking a size less than or equal to your initial set size, and filling a new set of that size with random elements from the initial set:

function randomSet(set) {
  let result = []
  let size = Math.floor(Math.random() * set.length)
  while(result.length !== size) {
    // get a random index from the original set
    let index = Math.floor(Math.random() * set.length)
    // in this case, we construct the new set simply by removing items from the original set
    result.splice(index, 1)
  }
  return result
}

console.log(randomSet(['a', 'b', 'c']))

// These are all equally likely:
// ['a','b','c']
// ['a','b']
// ['b','c']
// ['a','c']
// ['a']
// ['b']
// ['c']
// no other sets are possible

I wasn't referring to any particular technical usage of the term "set". Rather I was referring to the generic definition of set. As in a "set of items". Am I wrong? Can "set" not be used to describe any group of items regardless of whether or not there are duplicates? Either way it looks like you based your answer on the (your?) technical definition of "set" when, with my elaboration, I made it very clear what my intended result was, and it's not what your answer does at all. So even if I incorrectly used the word set, that would not have confused the question completely, since I gave examples — , Mar 31 '17 at 02:57
When you use a term that has a defined meaning, you shouldn't be surprised when people interpret the term as what it means. I've read the comments and the edits and it's still very unclear why exactly my first example won't work for you. I'm working at the moment but when I get off I'll try my hand at another option that might be what you want. — suddjian, Mar 31 '17 at 17:17
Your first example doesnt involve any code that deals with uniformity percentage. In a group size of 5 with an option set size of 4, will it output a 100% uniform result just as often as an 40% uniform result? I don't see how. — , Mar 31 '17 at 20:05

How can I make a random set generator with equal possibility to generate uniform sets and non-uniform sets?

4 Answers4