1

I have a Dictionary<..., Statistics>, where Statistics is a custom structure with some fields. This struct implements the IComparer interface.

public struct Statistics : IComparer<Statistics>
{
    public UInt64 NumberOfOccurrences;
    public UInt64 TotalNumberOfWords;
    ...

    public int Compare(Statistics x, Statistics y)
    {
        // code for compare the statistics
        ...
    }
}

I chose the dictionary to have better performance during updates of existing statistics. However, I also want good performance in reading statistics in order from best to worst, so I should use a SortedDictionary in place of Dictionary.

Now I would like to limit the number of entries within the dictionary to a specified value, by removing the entries with worst statistics, in order to limit the memory usage. I have seen that SortedDictionary does not have constructors with a capacity argument, so I thought I'd store the maximum number of entries in the integer variable elementsCountLimit. Before adding a new entry in the SortedDictionary, I could perform the following check:

if (dict.Count == elementsCountLimit)
{
    // Remove the last entry, that is the
    // entry with worst statistics.
}
  1. But, If I don't know the key, how can I remove the last entry?
  2. How can I limit the memory usage?
enzom83
  • 8,080
  • 10
  • 68
  • 114
  • Keep in mind that `capacity` on `Dictionary` is not a limit, it's an initialization of size that increases when it is exceeded. – Marc Feb 04 '12 at 23:42
  • 1
    A `SortedDictionary` is sorted by the keys not the values. However from your description it sounds like you want to sort by the `Statistics` which are you values - how does that work? – ChrisWue Feb 04 '12 at 23:42
  • @ChrisWue Right now I'm using a `Dictionary`. I thought that the `SortedDictionary` could also order by the values (`Statistics`), so I implemented the `IComparer` interface for `Statistics`. – enzom83 Feb 04 '12 at 23:52
  • @ChrisWue, `how does that work?` It works that way http://msdn.microsoft.com/en-us/library/a045f865(v=vs.85).aspx – L.B Feb 05 '12 at 01:01
  • @L.B: That's what I said: It's sorted by **keys** not the values. He wants to sort by values (the `Statistics`) so a sorted dictionary seems inapropriate. – ChrisWue Feb 05 '12 at 08:18
  • "Represents a collection of key/value pairs that are sorted on the key." http://msdn.microsoft.com/en-us/library/f7fta44c(v=vs.85).aspx – Sam Greenhalgh Feb 06 '12 at 00:26
  • 1
    Looks a bit like this question might help http://stackoverflow.com/questions/289/how-do-you-sort-a-c-sharp-dictionary-by-value ? – Sam Greenhalgh Feb 06 '12 at 00:29

4 Answers4

2

if you don't mind iterating over the keys dict.Remove(dict.Keys.Last());

L.B
  • 114,136
  • 19
  • 178
  • 224
  • It could be a solution, but if there are many keys (thousands) in the dictionary...? – enzom83 Feb 04 '12 at 23:43
  • I know, therefore I said `if you don't mind`. In addition, your question is `how can I remove the last entry?` not about `any data structure alternative` – L.B Feb 04 '12 at 23:48
1

You could use binary search to handle this. Using something like:

// example variables for clarity of them in the method
private List<Statistics> _statistics;
private int _capacity;

public void AddStatistic(Statistics statistic)
{
    var index = _statistics.BinarySearch(statistic);

    // if it's not in the list, index will the negative representation
    // of where it should be if sorted.
    if(index < 0)
    {
       index = ~index;
    }

   // insert it into the correct location
    _statistics.Insert(index, statistic);

    if(_statistics.Length > capacity)
    {
        // assuming sorted best to worst, remove the last item
        _statistics.RemoveAt(_statistics.Count - 1);
    }
}
Marc
  • 9,254
  • 2
  • 29
  • 31
  • So I should use a `List` in place of a `Dictionary`? – enzom83 Feb 05 '12 at 00:12
  • 1
    I would, you sound like you're only interested in a sorted list of `Statistics`. That's not really `Dictionary`'s purpose. – Marc Feb 05 '12 at 00:18
  • 1
    @Marc, very nice solution. But you also have to make a BinarySearch with every `get` given that you don't have a key but only `statistic` – L.B Feb 05 '12 at 00:20
  • @L.B you're assuming there is a `get` operation. I'm not so sure given the original question but I could be wrong. – Marc Feb 05 '12 at 00:21
  • @Marc, from question `I chose the dictionary to have better performance during updates of existing statistics` – L.B Feb 05 '12 at 00:22
  • In my idea of the dictionary, each key was an object associated with a particular statistic. But given that the dictionary can not order entry (key value pair) based on the value, then perhaps I should opt for a `List` (of `KeyValuePair`), which also allows me to satisfy the constraint of memory consumption: I lose the performance of a `Dictionary`, but I have to satisfy the constraint on the memory consumption. – enzom83 Feb 05 '12 at 00:31
  • 1
    @enzom83, of course you can use this list too, but don't forget that the constructor of a SortedDictionary can take a IComparer where you can make a custom sort according to statistics.(you can also store the `min` value somewhere while comparing) – L.B Feb 05 '12 at 00:36
  • @Marc Should that be minus sign in "index = ~index"? Atleast I don't have any clue what that tilde is supposed to do :| I thought tilde is used only in finalizers (~NAME) in C#. –  Feb 05 '12 at 01:15
  • 1
    @JaakkoLipsanen No. `~-3 -> 2` but `-(-3) -> 3`. From MS `The zero-based index of item in the sorted List(Of T), if item is found; otherwise, a negative number that is the bitwise complement of the index of the next element that is larger than item or, if there is no larger element, the bitwise complement of Count.` – L.B Feb 05 '12 at 01:19
  • @L.B Really intresting! It is great to discover new features in C# :) –  Feb 05 '12 at 01:21
  • Well, using this solution, I have to make a binary search with every get, so given N statistics, at most `log_2 N` questions are required: it seems acceptable, given the constraint of used memory; after all, with 1000 statistics I perform a maximum of 10 comparisons, with 10K statistics I will need to make a maximum of 14 comparisons, ect.. But 10K statistics are still too many in the domain of my application. – enzom83 Feb 05 '12 at 22:02
1

Write a simple array implementation for Heap<T>. Then, use a Heap<TKey>, and a Dictionary<TKey, Statistics> together. Heap will serve as a priority queue. Careful: The top element of the heap should be the item to be deleted next. i.e. If you want to keep the "maximum" items, the heap should keep the minimum item at the root. When you will insert an item and the heap is full, also delete the root in Log(n) time. Whenever you delete an item from the heap delete it from the dictionary as well. Pseudocode:

Insert (TKey item, Heap<TKey> heap, Dictionary<TKey, TStatistics> dictionary)
    insert (item, heap)
    if heap capacity is exceeded:
        var deleteditem = deletemin(heap)
        if(deleteditem!=item):
            insert(item, dictionary)
            delete(deleteditem,dictionary)
    else:
        insert(item,dictionary)
Ali Ferhat
  • 2,511
  • 17
  • 24
  • I implemented an `Heap` class. Then I used `Heap` and a custom `IComparer>` in order to sort elements by `Statistics`. – enzom83 Feb 11 '12 at 23:00
1

A SortedDictionary is sorted by its keys and not the values. So if you want to remove entries based on some properties of the values you need to iterate over all entries and determine which one to remove. This makes it an O(n) operation which negates the performance benefits of the dictionary.

I would use a custom data structure to keep the data in a way you want and encapsulate any functionality of a maximum number of entries:

public class StatisticsMap<TKey>
{
    public StatisticsMap<TKey>(int maxEntries)
    {
         _MaxEntries = maxEntries;
    }

    private int _MaxEntries;
    private SortedList<Tuple<Statistics, TKey>> _SortedStatistics;
    private Dictionary<TKey, Statistics> _KeyedStatistics;

    public void Add(TKey key, Statistics stat)
    {
        _SortedStatistics.Add(Tuple.Create(stat, key));
        _KeyedStatistics[key] = stat;
        while (_SortedStats.Count > _MaxEntries)
        {
             int lastIndex = _SortedStatistics.Count - 1;
             var doomed = _SortedStatistics[lastIndex];
             _SortedStatistics.RemoveAt(lastIndex);
             _KeyedStatistics.Remove(doomed.Item2);
        }
    }

    public Statistics Get(TKey key)
    {
        return _KeyedStatistics[key];
    }
}

Depending on how you intend to use it you could implement the IDictionary interface for greater flexibility.

TradeOff: Uses double the memory.

Edit

You only need a dictionary if you need O(1) access to the Statistics objects by a given key. If you can get away with an index or if iterating over the list is good enough then you don't need a dictionary. A SortedList and a SortedDictionary do not have any built in capacity limits - none of the Framework data structures do (except an array which has a fixed size). You need to add that part yourself no matter what you use.

The code I suggested assumes that you need O(1) access to a Statistics object by a given key and that you need to remove Statistics object based on certain properties effectively.

ChrisWue
  • 18,612
  • 4
  • 58
  • 83
  • I read the documentation about `SortedList` and I noticed that is very similar to `SortedDictionary`. I would say that they are nearly equal: what are the differences between `SortedDictionary` and `SortedList`? Using the `SortedDictionary`, I can not have guarantees on the maximum memory requirements... Instead, is it possible to know the maximum amount of memory required to store N elements in a `SortedList`? – enzom83 Feb 05 '12 at 22:27
  • @enzom83: You only need a dictionary if you need `O(1)` access to the `Statistics` objects by a given key. If you can get away with an index or if iterating over the list is good enough then you don't need a dictionary. – ChrisWue Feb 05 '12 at 23:57
  • Summarizing, if the amount of used memory is the main constraint (for example in Windows Phone), then I should use a simple `List`, giving up the performance. Otherwise, if the amount of used memory is not a problem, then it is better to opt for the `Dictionary`. – enzom83 Feb 06 '12 at 00:27