2

How can I randomly remove a key with value 0 efficiently ?

Dictionary<string, int> dict = new Dictionary<Edge, int>();
dict.add("a",0);
dict.add("b",0);
dict.add("c",0);
dict.add("d",1);

The size of dictionary is 10000.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
xili36
  • 17
  • 5
  • Roughly what percentage of the values have `0` in them? Like 10%, or 80%, or what? – mellamokb Dec 07 '12 at 23:59
  • Can you try using a `BiDictionary` instead, which at the cost of some performance for inserting, provides easy lookup by value as well as key because it maintains lookup tables in both directions: http://stackoverflow.com/questions/255341/getting-key-of-value-of-a-generic-dictionary. Then it would be trivial - get all the keys for value `0`, and pick a random entry. – mellamokb Dec 08 '12 at 00:01
  • Depends on how many other values you're likely to have too. – Lee Taylor Dec 08 '12 at 00:03
  • Working on Hot hitter Misra Gries algorithms on twitter data stream, these values are the counters of two twitter users tweeting each other. I approximate the percentage of 0 is more than 60% could be worse. – xili36 Dec 08 '12 at 00:08
  • What range of values can you expect to have? Why not make the key `int` and make the value `List`? – Shmiddty Dec 08 '12 at 00:30
  • @Schmiddty The key is an Object containing two users ID, and an override GetHashCode(). The value is a counter, to count how many same two users in the data streams. – xili36 Dec 08 '12 at 10:24

2 Answers2

1

Something like this should do it:

IEnumerable<string, int> pairsToRemove = dictionary.Where(pair => pair.Value == 0);

To generate a random index, you could use:

int indexToRemove = [RandomNumber] % pairsToRemove.Length() -1;

Find the indexToRemove th element from pairsToRemove and remove it from the dictionary.

As to efficiency: The complexity should be O(n)[get all items with value 0] + O(.6N)[finding ith value to remove] + O(log(n))[deletion] assuming the random number generation is constant time.

The problem is, there is no way to perform a value lookup on a dictionary in better than O(n) time. So that will be your bottleneck.

Jonathan Henson
  • 8,076
  • 3
  • 28
  • 52
  • 2
    From the OP: "How can I *randomly* remove **a key** with value 0 efficently?" – mellamokb Dec 08 '12 at 00:04
  • @mellamokb Sorry, I updated it. The problem is, the lookup will always be at best big-Oh(N) on a dictionary, then finding the random value in the enumerable will be bit-Oh(.6N) given the OP's worst case senario. So, I think that bit-Oh(N) may be the best you can do. – Jonathan Henson Dec 08 '12 at 00:15
0

This will remove the first item with a zero value. It's not precisely "random", but is non-deterministic.

Dictionary<string, int> dict = new Dictionary<string, int>();
string keyToRemove = null;
foreach (var kvp in dict)
{
    if (kvp.Value == 0)
    {
        keyToRemove = kvp.Key;
        break;
    }
}
if (keyToRemove != null)
{
    dict.Remove(keyToRemove);
}
Brian Reischl
  • 7,216
  • 2
  • 35
  • 46
  • if you have two entries with same string, and you want to remove the second key but your logic will remove the 1st key. – nipiv Dec 08 '12 at 00:21
  • There is no two entries with same string, Can you explain what you mean "not precisely random", how good these random is ? – xili36 Dec 08 '12 at 00:25
  • @nipiv - It's a Dictionary, it can't have the same Key in it twice. – Brian Reischl Dec 08 '12 at 00:26
  • @XiuqingLi - According to the documentation "The order of the values in the Dictionary.ValueCollection is unspecified". Since they're not in any particular order, and we're just taking the first one, we won't get any particular key. But it's not truly random in the same sense as you would get from a random number generator. The difference may not matter, depending on your situation. – Brian Reischl Dec 08 '12 at 00:29
  • @breischl: Unspecified may mean that they come back in key order, since it is entirely implementation-dependent. Any behavior that is unspecified should not be relied on in production code, IMHO. – mellamokb Dec 08 '12 at 00:35
  • 1
    @mellamokb - It's possible that the OP's code doesn't actually rely on the ordering. The OP may have meant "random" in the sense of "it doesn't matter which one I get", in which case an unspecified behavior would be acceptable. If a stronger guarantee is required, then a more expensive approach would be required. That's why I called out the difference in the first sentence of my answer. – Brian Reischl Dec 08 '12 at 14:18