-3

I have a problem

I need to get random numbers, but except numbers that I already have.

My Code:

List<int> current_numbers = repository.GetCurrentNumbers();            
Random rnd = new Random(42);
var new_values = Enumerable.Range(10000000,99999999)
    .Except(current_numbers)
    .OrderBy(o=> rnd.Next())
    .Take(amount)
    .ToList();

But this code is VERY SLOWLY

When I tried to use select instead OrderBy - I got DUPLICATES. In my case, its must be without duplicates.

UPDATED: With OrderBy -- I have problem with memory :)

Range must be like this 1M - 99M Thank you.

juharr
  • 31,741
  • 4
  • 58
  • 93
Max A
  • 21
  • 4
  • How many numbers do you need in the list? 42? – LarsTech Jul 24 '17 at 17:57
  • 2
    I don't know C#, but the typical way to do this is just put the available numbers into a list, then either shuffle the list and pop the first element, or just randomly draw from it. The later should be `O(1)` access if you're using a structure that supports that, so it should be fast. – Carcigenicate Jul 24 '17 at 17:57
  • What do you mean without duplicates? Computers can store finite values so there is no way to ensure true uniqueness. There are just close approximations like GUID. – Marko Gresak Jul 24 '17 at 17:58
  • I dont have available nubers. Its all numbers in this range. I need **amount** count. In random - just put 42. Random can be with empty constructior – Max A Jul 24 '17 at 17:59
  • https://stackoverflow.com/questions/273313/randomize-a-listt – Daniel A. White Jul 24 '17 at 17:59
  • try a different strategy. get your seed (10000000), and then for x times (42?), get a new random number, and add it to the seed. repeat x (42?) times. you won't have duplicates. – 2174714 Jul 24 '17 at 18:00
  • have a look at `HashSet`. – Daniel A. White Jul 24 '17 at 18:01
  • What is the purpose of this `Enumerable.Range(10000000,99999999)` in your code? – hatchet - done with SOverflow Jul 24 '17 at 18:09
  • i need big number . – Max A Jul 24 '17 at 18:10
  • Well, `Enumerable.Range` is what's killing you. – hatchet - done with SOverflow Jul 24 '17 at 18:15
  • 1
    @hatchet More specifically it's likely the `Except` call. `Enumerable.Range` doesn't use much memory by itself since. However, `Except` creates an underlying hash set that it adds all the elements into, so the memory usage there will explode. Of course that is only happening because the OP is asking for 100 million integers sans the duplicates. – Kyle Jul 24 '17 at 18:46
  • How many numbers are you going to want to get out of your candidate range? Whether you are getting a couple or whether you want to exhaust the list can make a big difference to what the most practical way of doing this is. Also how random do they need to be? – Chris Jul 24 '17 at 18:50
  • @Kyle - right...OP's code is doing things 90 million times to generate a handful of random numbers. `Enumerable.Range` is not what they want. They should be limiting the range of the random numbers by using the version of `Next` that takes a min and max value instead. – hatchet - done with SOverflow Jul 24 '17 at 18:57

3 Answers3

0

Use a HashSet<T> instead of a List and then test using Contains - if you take a peek at Reference Source, you'll note that Except will build those existing numbers into a Set.

Since OrderBy tries to sort the whole collection, you're losing the benefits of lazily executed enumerables by using OrderBy at all - instead, using a regular loop and generate random numbers

var random = new Random();  // Default constructor or you'll get the same sequence because of a constant seed
var result = new HashSet<int>();
var currentNumbers = new HashSet<int>(repository.GetCurrentNumbers());

while(result.Count < amount)
{
    var next = random.Next(1000000,99000000);
    if(currentNumbers.Contains(next)) continue;

    result.Add(next);
}

Or write you own generator

public static IEnumerable<int> GenerateRandom()
{
    var random = new Random();
    while(true) { yield return random.Next(1000000,99000000); }
}
// Later...
var newValues = MyClass.GenerateRandom()
    .Where(next => !currentNumbers.Contains(next))
    .Distinct()
    .Take(amount)
    .ToList();
David
  • 10,458
  • 1
  • 28
  • 40
0

Since you want numbers from such a large range, you probably want to use a HashSet to eliminate duplicates.

HashSet<int> current_numbers = new HashSet<int>(repository.GetCurrentNumbers());
HashSet<int> newValues = new HashSet<int>();
while (newValues.Count < amount)
{
    var next = rnd.Next(10000000,99999999);
    if (!current_numbers.Contains(next))
        newValues.Add(next);
}

Converting current_numbers to a HashSet will speed up the process, because a call to Contains would take O(n) time if current_numbers is not stored as a HashSet.

yinnonsanders
  • 1,831
  • 11
  • 28
  • 1
    This will result in a set of combined new AND current results, rather than a separate collection of new results – David Jul 24 '17 at 18:12
0

To avoid creating that huge list of numbers you can instead keep track of the numbers you have and randomly pick where the next number should come from. First you need an ordered list of the used number. Then add the lower and upper bounds to it. Then keep track of the index of the lower and upper bounds. Then iterate until you have enough number and each time randomly pick an index between the lower and upper bound indexes. Check that the difference between the number at that index and the next is 1 and if so increment the index until it isn't or you hit the upper bound. If the upper bound was hit then just walk the upper bound down and try again. When you do find a gap in the used numbers randomly pick a number in the gap and add it to your return list and to the used list at the proper index. Then make sure to walk the lower bound index up if needed.

var used = repository.GetCurrentNumbers().OrderBy(x => x).ToList();
used.InsertAt(0, 999999) // This is the lower bounds;
used.Add(100000000); // This is the upper bounds;
var random = new Random();

var values = new List<int>();
int lowerIndex = 0; 
int upperIndex = used.Length - 1;
while(values.Count < amount) {
    int ind = random.Next(lowerIndex, upperIndex); 
    while(ind < upperIndex && used[ind+1] - used[ind] == 1) ind++;
    if(ind == upperIndex){
        while(used[upperIndex] - used[upperIndex-1] == 1) upperIndex--;
        continue;
    }

    int val = random.Next(used[ind]+1, used[ind+1]);
    values.Add(val);
    used.InsertAt(ind+1, val);

    while(used[lowerIndx+1] - used[lowerIndex] == 1) lowerIndex++;
}

This works best if amount isn't a very large number and your overall range is large and the initial used numbers is also sparse.

juharr
  • 31,741
  • 4
  • 58
  • 93