8

Currently I'm using a binary search over a SortedList<T,U> for a specific number, and if it doesn't exist I get the closest lower-bound key-item instead.

I saw that it was rather slow in inserting unsorted data which I'm doing a lot of.

Is there a way to do something similar with the SortedDictionary, or should I just stick to my SortedList?

Community
  • 1
  • 1
  • 1
    This may help: http://stackoverflow.com/questions/1690929/what-net-dictionary-supports-a-find-nearest-key-operation – mad moe Jul 27 '12 at 16:25

1 Answers1

15

SortedList<K, V> is really slow when inserting data as it shifts <=N elements in internal array each time a new element is added. The complexity of addition is O(N). Nevertheless it supports binary search which allows to find exact element or its neighbors in O(log N).

Balanced binary tree is the best data structure to solve your problem. You'll be able to to do the following operations w/ logarithmic complexity:

  1. Add item in O(log N) vs. O(N) in SortedList<K, V>
  2. Remove item in O(log N)
  3. Search item or its nearest in O(log N)

Looking for element or its nearest lower-bound in binary tree is simple:

  1. Go vertically through the tree from root to child in order to find your key. If key < node, then go to left child, otherwise to the right one.
  2. If you found the key, return
  3. If key not found, nearest left parent will be the one you are looking for (nearest lower-bound)
  4. If there is no left parents, just take the last visited node, it is minimal node in the tree.

There are many articles describing how to implement binary tree. Nevertheless I'm going to reuse .NET Framework collection using a kind of hack :)

Now, I'm gonna present to you SortedSet<T> which itself is red-black tree. It has one drawback, it has no ability to find nearest nodes quickly. But we know the algorithm of search in tree (it's described in 1.) and it is implemented in SortedSet<T>.Contains method (decompiled at the bottom*). Now we can capture all nodes from root to the last visited node during traversal using our custom comparer. After that we can find nearest lower-bound node using algorithm above:

public class LowerBoundSortedSet<T> : SortedSet<T> {

    private ComparerDecorator<T> _comparerDecorator;

    private class ComparerDecorator<T> : IComparer<T> {

        private IComparer<T> _comparer;

        public T LowerBound { get; private set; }

        private bool _reset = true;

        public void Reset()
        {
            _reset = true;
        }

        public ComparerDecorator(IComparer<T> comparer)
        {
            _comparer = comparer;
        }

        public int Compare(T x, T y)
        {
            int num = _comparer.Compare(x, y);
            if (_reset)
            {
                LowerBound = y;
            }
            if (num >= 0)
            {
                LowerBound = y;
                _reset = false;
            }
            return num;
        }
    }

    public LowerBoundSortedSet()
        : this(Comparer<T>.Default) {}

    public LowerBoundSortedSet(IComparer<T> comparer)
        : base(new ComparerDecorator<T>(comparer)) {
        _comparerDecorator = (ComparerDecorator<T>)this.Comparer;
    }

    public T FindLowerBound(T key)
    {
        _comparerDecorator.Reset();
        this.Contains<T>(key);
        return _comparerDecorator.LowerBound;
    }
}

You see that finding nearest node takes no more than usual search, i.e. O(log N). So, this is the fastest solution for your problem. This collection is as fast as SortedList<K, V> in finding nearest and is as fast as SortedSet<T> in addition.

What about SortedDictionary<K, V>? It is almost the same as SortedSet<T> except one thing: each key has a value. I hope you will be able to do the same with SortedDictionary<K, V>.

*Decompiled SortedSet<T>.Contains method:

public virtual bool Contains(T item)
{
  return this.FindNode(item) != null;
}

internal virtual SortedSet<T>.Node FindNode(T item)
{
  for (SortedSet<T>.Node node = this.root; node != null; {
    int num;
    node = num < 0 ? node.Left : node.Right;
  }
  )
  {
    num = this.comparer.Compare(item, node.Item);
    if (num == 0)
      return node;
  }
  return (SortedSet<T>.Node) null;
}
nawfal
  • 70,104
  • 56
  • 326
  • 368
Raman Zhylich
  • 3,537
  • 25
  • 23
  • do you have a link for the complexity of an insert on a sorted list? – Joe Jul 27 '12 at 21:54
  • SortedDictionary has faster insertion and removal operations for unsorted data, O(log n) as opposed to O(n) for SortedList. http://msdn.microsoft.com/en-us/library/ms132319.aspx – Raman Zhylich Jul 27 '12 at 21:57
  • BTW you can decompile it and ensure that it just shifts elements during addition. So, it is smth like bubble sort. – Raman Zhylich Jul 27 '12 at 21:58
  • 1
    I wasn't aware of SortedSet, learn something new every day. This also works perfect(with an amazing speed boost in unsorted insertions compared to sortedlist), Thanks. – Rusty Shackleford Jul 28 '12 at 08:02
  • 2
    Brilliant! And I thought I would have to roll my own. Very clever solution. – Paul Chernoch Oct 30 '13 at 13:26
  • 2
    Very clever. Too bad it's implementation dependent and Microsoft can break this any moment. – SnakE Dec 03 '15 at 05:09