0

I recently started in C# and am working on a program for which I need to drop n random values from a dictionary. I picked up some code from this previous q&a and added DropRandom to remove n values, but when I run DropRandom (say with keepN = 10) the dictionary doesn't drop directly to 10 values.

For example, from a total initial count of 100 with a desired final count of 10, the count might drop to 20. I can run DropRandom on the resulting Dictionary, and it will then drop to 12, and finally a third (or fourth or fifth...) iteration will take it down to a count of 10.

So my question is, what can I do to get DropRandom to immediately drop to the desired count?

The first part of things merely sets up the random keys (adapted from above mentioned q&a):

public static IEnumerable<TKey> RandomKeys<TKey, TValue>(IDictionary<TKey,TValue> dictionary)
{
    Random random = new Random();
    List<TKey> keys = Enumerable.ToList(dictionary.Keys);
    int size = dictionary.Count();

    while (true) //returns random order of keys from dictionary using infinite loop
    {
        yield return keys[random.Next(size)];
    }
}

The second part removes all but n values:

public static void DropRandom<TKey, TValue>(int keepN, ref Dictionary<TKey, TValue> dictionary)
{
    if(dictionary.Count() < keepN)
    {
        throw new Exception("Trying to keep more items than in dictionary");
    }

    Console.WriteLine("old dict size = " + dictionary.Count());
    Dictionary<TKey, TValue> tempDict = new Dictionary<TKey, TValue>();

    //inelegant way to get extra values...
    IEnumerable<TKey> randK = RandomKeys(dictionary).Take(keepN * 2);

    while (tempDict.Count() < keepN)
    {
        foreach(TKey key in randK)
        {
            if (!tempDict.ContainsKey(key))
            {
                tempDict.Add(key, dictionary[key]);
                Console.WriteLine("key = " + key.ToString());
            }
        }
    }
    dictionary = null;
    dictionary = tempDict;
    Console.WriteLine("New dict size = " + dictionary.Count());
}
desc
  • 1,190
  • 1
  • 12
  • 26
  • 1
    You're inner `if` also needs to do the check of the count against `keepN` otherwise you add all the keys from `randK`. I'd suggest using a [Fisher-Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) to get a random order of the keys instead of hoping that taking twice as many will not result in too many duplicates. – juharr Oct 18 '18 at 23:55
  • will KeepN be much smaller than N? – Mitch Wheat Oct 18 '18 at 23:57
  • @juharr, thanks for the suggesting, I'll look into the shuffle. Also, keepN will always be much smaller than N. – desc Oct 19 '18 at 00:19

2 Answers2

1

There are a couple of things you can do to make sure your keepN value is respected regarding end size.

First, I would get rid of the while (tempDict.Count() < keepN) loop. Then, I would get rid of the Take(keepN * 2) from the randk assignment, leaving you with something that looks like this:

IEnumerable<TKey> randK = RandomKeys(dictionary); // Lazy infinite sequence

foreach (TKey key in randK)
{
    if(tempDict.Count >= keepN) // Break from the loop once we hit our target count
        break;
    if (!tempDict.ContainsKey(key))
    {
        tempDict.Add(key, dictionary[key]);
        Console.WriteLine("key = " + key.ToString());
    }
}

In this case, we're okay with randK being an infinite sequence, as we'll break out of the loop as soon as the temporary dictionary's count has reached the desired target.

This should be reasonably quick when keepN is much smaller than dictionary.Count, but with a large N and a close keepN, you're going to be randomly selecting for one of very few remaining values.

Jonathon Chase
  • 9,396
  • 21
  • 39
  • Nice, this worked for me. Your point about the size of keepN compared to dictionary.Count is a good one as well that'll I'll have to keep in mind moving forward. Thanks! – desc Oct 19 '18 at 00:14
1

The issue is after first iteration of add , you could end up with keepN-1 item added to your temp dictionary. But for next iteration, there are another of keepN * 2 new keys being checked and added, which will end up more than KeepN number add to your temp collection on the second or third iteration.

Even on the first iteration, you could endup more than keepN item added, because you checked keepN * 2 Keys.

The fix is easy.

       foreach(TKey key in randK)
        {
            if (!tempDict.ContainsKey(key))
            {
                tempDict.Add(key, dictionary[key]);
                Console.WriteLine("key = " + key.ToString());
            }
            //------------------------------------------
            // the easy fix it here
            if(tempDict.Count() == keepN) break;
            //-----------------------------------
        }

But you can do better

1. Generating keepN random, unique values!
https://stackoverflow.com/questions/14473321/generating-random-unique-values-c-sharp
2. var sourceDicKeyArray = dictionary.Keys.ToArray();
3. foreach index in keepNRandomValues
   newDic.Add(sourceDicKeyArray[index], dictionary[sourceDicKeyArray[index]] )
LeY
  • 659
  • 7
  • 21