13

I have a class Foo which contains a list of objects: List<Bar>. Each Bar has a property which they can be ordered on (of type TimeSpan, representing a duration), and Bar is an immutable object - that is, the duration does not change over the running of the algorithm. At the moment, for each Foo I also maintain the Bar that would be first in the list if it were to be ordered (i.e. the Bar of shortest duration). Something like this:

public class Foo
{
    public List<Bar> AllBars { get; set; }

    public Bar FirstBar { get; set; }

    public Foo (Bar bar)
    {
        FirstBar = bar;

        AllBars = new List<Bar>() { bar };
    }

    public AddBar(Bar bar)
    {
        if(bar.Duration < FirstBar.Duration)
        {
            FirstBar = bar;
        }

        AllBars.Add(bar);
    }
}

This class Foo is used in an algorithm where processing performance (speed) is critical. Memory is important but not as much as speed. There is a list of n Foos, each of which has up to m Bars. This class has served me well up until this point. I now wish to offer the user several choices, meaning I will need to provide random access to the first few Bars in the list.

I would thus like to store my Bars in order so that I can access them by index in order. In my Bar class I implemented IComparable to allow Bars to be compared on duration but I am stuck at choosing an appropriate data type. I looked at System.Collections.SortedList but (unless I am wrong) this appears to reference elements by key as it implements IDictionary. What collection could I use that would maintain my objects such that they stay sorted, and such that they are traversable in order of index?

08Dc91wk
  • 4,254
  • 8
  • 34
  • 67
  • Can't you just sort a regular `List` using its `Sort` method? This will need calling after every insert, but allows you to also suppress sorting if you know you're about to add a batch of items. You could decorate `List` with your own implementation that does the sorting call for you so from the outside you're still using an `IList`. – Adam Houldsworth Jul 23 '15 at 13:40
  • 3
    Try [`SortedSet`](https://msdn.microsoft.com/en-us/library/dd412070.aspx) but note that it does not allow duplicates. – D Stanley Jul 23 '15 at 13:41
  • 1
    @AdamHouldsworth dont think it goes along with 'performance' – Alex K. Jul 23 '15 at 13:42
  • @AlexanderKozlov So far as I can tell from the provided code, that was the current solution. – Adam Houldsworth Jul 23 '15 at 13:43
  • I would go with `SortedList`. `SortedSet` doesnt allow you to get element by index – Alex K. Jul 23 '15 at 13:43
  • 1
    Why not use `SortedList` and use `Duration` as the key rather then implementing a comparator? – D Stanley Jul 23 '15 at 13:44
  • @DStanley That would assume knowing the durations by which to index up front, whereas it seems like a regular array index is desired, just in duration order. Seems neither `SortedList` nor `SortedSet` achieve that. I don't yet see why sorting a regular list isn't working, or where the performance degradation comes in. – Adam Houldsworth Jul 23 '15 at 13:45
  • 1
    Do you need them to be accessed _randomly_ by index or just _in order_? In other words - would you need to get `AllBars[2]` or do you just need to maintain order when iterating with `foreach`? – D Stanley Jul 23 '15 at 13:46
  • Using a `SortedList` would perform extremely poorly. It would make every addition to the list quite expensive. – Servy Jul 23 '15 at 13:52
  • @AdamHouldsworth Plain `List` wasnt intended to store elements in a sorted order. So each time you call `.Sort()` it sorts whole array. But `Sorted*` classes maintain elements in a sorted way resulting in faster insertions – Alex K. Jul 23 '15 at 13:53
  • 3
    I would personally just order it at the point of use in the algorithm. Maintain a flag to state if it needs ordering based on whether it had additions since the last sort, and then make it the caller's responsibility to sort prior to running the algorithm. At the end of the day, the implementation of the sort is specific to this algorithm so I see no issue moving responsibility across to it. – Adam Houldsworth Jul 23 '15 at 13:54
  • 2
    @AlexanderKozlov I know, but I want to understand why something as simple as `var sorted = list.OrderBy(_ => _).ToArray()` can't be used prior to this algorithm. We can't suggest an implementation without knowing how many insertions occur, how big the list gets, how often this algorithm is called between sorts or w/e. – Adam Houldsworth Jul 23 '15 at 13:56
  • 1
    Can `Duration` change over the life of an object? If so then re-sorting periodically is the only option (either explicitly or implicitly by the collection). – D Stanley Jul 23 '15 at 13:57
  • @AdamHouldsworth Got your point. Of course if we need to sort all elements once prior to an algorithm, List would be just fine – Alex K. Jul 23 '15 at 13:59
  • 1
    If you want the best X items and you never delete Items, you could maintain a separate sorted collection BestTimes of (TimeSpan, Item). When you insert an item, update the simple List of Item, and then update BestTimes as follows: check if the toBeInserted Item's TimeSpan is better than the worst (TimeSpan, Item) and if so, replace the worst and bubble the the new (TimeSpan, Item) up to its appropriate place. – Jerry Federspiel Jul 23 '15 at 14:03
  • @AdamHouldsworth I *could* but that would devastate my currently O(n) algorithm, which only ever inserts one `Bar` at a time. The provided code only compares the inserted item once, to see if FirstBar needs updating. Re: ordering it at point of use - I should have pointed out there are *m* objects of type `Foo`. – 08Dc91wk Jul 23 '15 at 14:04
  • @DStanley I need to access randomly by index. Each `Foo` (of about 100 000) ends up having 50 to 100 `Bar`s. I will use the first 10 or so that need to be accessed randomly (they are options for the user). And no, **`Bar`s are immutable - duration never changes.** I will update the question with this information. – 08Dc91wk Jul 23 '15 at 14:04
  • 2
    This is one of those things where you have to decided what the best compromise is. If you want a sorted list, with indexed access and the ability to insert (and maybe remove) items while maintaining the sorted order, then at least one of those operations is going to end up being slow in order to have the others be fast. You need to figure out what the usage is going to look like. Are you going to be inserting a lot? Reading a lot? Reading at random or in order? How big is the list? Etc. – Matt Burland Jul 23 '15 at 14:05
  • @Ivan Have you ever encountered the [Wintellect Power Collections](https://powercollections.codeplex.com/)? – Adam Houldsworth Jul 23 '15 at 14:27
  • Is Durantion (timspan) value unique in the collection? – paparazzo Jul 23 '15 at 14:52
  • @Blam No, it is not, however if durations are equal it doesn't matter which is first. – 08Dc91wk Jul 23 '15 at 14:55
  • @AdamHouldsworth No, I noticed you mentioned it in your answer, thanks I will take a look! – 08Dc91wk Jul 23 '15 at 14:56
  • If you can live with having "values" that mean nothing, just use a `SortedList` where you do not use the value part. Add with `yourSortedList.Add(yourBar, null)` in O(n) time (the list will have to move "up" all entries after the point where you insert). Retrieve the `i`th entry in O(1) time with `yourSortedList.Keys[i]`. ***EDIT:*** You will not be able to have duplicates in your `SortedList<,>`, so two members can not `CompareTo` each other with return value zero. – Jeppe Stig Nielsen Jul 23 '15 at 14:58
  • What is "traversable in order of index". Is that index the Duration order or the order of the insert? – paparazzo Jul 23 '15 at 15:26
  • Do `Bar`s get removed? – smartcaveman Jul 23 '15 at 16:16
  • What's your ratio of write operations that change the collection to read operations that just examine it? If that ratio is low then just doing a `BinarySearch` on inserts (and if adding a large number in one go, a plain `AddRange` followed by a `Sort`) would be the way to go as while it's expensive, the read-only accesses are as fast as with a list. – Jon Hanna Jul 23 '15 at 16:41
  • @JeppeStigNielsen That sounds like a good approach and should be an answer, not a comment. There are possibly some questions and comments I'd like to make on this approach without getting lost in the above extended discussion. – 08Dc91wk Jul 24 '15 at 07:00
  • @Blam Sorry, I should be more clear. I want the `Bar` items in order of duration and accessed in order of duration. Insertion order is unimportant. – 08Dc91wk Jul 24 '15 at 07:01
  • @smartcaveman No, `Bar`s never get removed, and they are also immutable. – 08Dc91wk Jul 24 '15 at 07:02
  • Then my answer works – paparazzo Jul 24 '15 at 08:32
  • @Blam I think the problem with your answer is it didn't maintain the insertion order it's my understanding the OP wants insertion order AND sort order. Your answer ensures they are the same thing, but in doing so, losses the original insertion order – smartcaveman Jul 24 '15 at 08:41
  • @smartcaveman OP clearly states differently? – paparazzo Jul 24 '15 at 08:45
  • @Blam Oh, I read "by index in order" as "by index and by order" the first time. Guess it's time to delete my answer :) . I agree yours works – smartcaveman Jul 24 '15 at 08:56

3 Answers3

5

I prefer to use SortedSet<T>, which is a binary tree where the key and value are the same object. This once again means that adding/removing/lookups are logarithmic - O(log n) - but you gain the ability to iterate over the items in order. For this collection to be effective, type T must implement IComparable<T> or you need to supply an external IComparer<T>.

S.N
  • 4,910
  • 5
  • 31
  • 51
  • How do you plan on getting the nth item out of a SortedSet? – Rawling Jul 23 '15 at 14:20
  • You can iterate over in its order. This is probably what the requester need. Now to make it more harder, nth element can't be picked up due to its backing store. However, you can use ( i haven't done any performance testing on this) Enumerable.ElementAt(). – S.N Jul 23 '15 at 14:28
  • I cannot see how `GetHashCode()` is relevant for a `SortedSet<>`. It uses only the `int` values returned by `CompareTo` (or more precisely the `Compare` method of the comparer). – Jeppe Stig Nielsen Jul 23 '15 at 14:29
  • @JeppeStigNielsen, The backing store for SortedSet SortedDictionary is a binary tree where the key and value are the same object. Correct me if I am wrong. – S.N Jul 23 '15 at 14:34
  • The binary tree there will *not* use `GetHashCode()`? – Jeppe Stig Nielsen Jul 23 '15 at 14:35
  • @Nair So thus retrieval complexity is O(n)? – Alex K. Jul 23 '15 at 14:36
  • @AlexanderKozlov, If you want to look at nth element then it is O(n). Otherwise it is O(lon n), due to unchanged binary tree. Did I answer your question? – S.N Jul 23 '15 at 14:39
  • @AlexanderKozlov `SortedDictionary` retrieval is O(log n), http://stackoverflow.com/questions/935621/whats-the-difference-between-sortedlist-and-sorteddictionary – smartcaveman Jul 23 '15 at 16:38
3

(promoted from a comment, as requested by the asker)

If you can live with having "values" that mean nothing, just use a SortedList<Bar, object> where you do not use the value part.

Add with yourSortedList.Add(yourBar, null) in O(n) time (the list will have to move "up" all entries after the point where you insert). Retrieve the ith entry in O(1) time with yourSortedList.Keys[i].

See the SortedList<,>.Keys property documentation for some "proof" that the above description is correct. Note that a SortedList<,> actually consists of a "list" (i.e. an array of length Capacity, subject to substitution by a larger array when necessary). This is different from SortedDictionary<,> which I believe is a binary search tree.

Note however: You will not be able to have duplicates in your SortedList<,>, so two members in the list are not allowed to CompareTo each other with return value zero.

Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
-1

Why not just use List.Insert ?
Insert is O(n) and lets you insert at a specific index.
n + n is still O(n)

public AddBar(Bar bar)
{
    int index = 0;    
    foreach (bar b in AllBar)
    {
       if(bar.Duration < b.Duration)
         break;
       index++;
    }
    AllBars.Insert(index, bar);
}

So you have sort and O(1) index
At a cost of O(n) Add
The current Add in also O(n)

A SortedList in NlogN and then you don't have index as the key is Duration and the key in not unique

A SortedSet insert is LogN but a ToList is O(n) so you are still O(n)

Calling the the Sort method on the list is NlogN

This answers the stated question of: What collection could I use that would maintain my objects such that they stay sorted, and such that they are traversable in order of index?

I don't think you are going to do that with better than O(n) Add.

miken32
  • 42,008
  • 16
  • 111
  • 154
paparazzo
  • 44,497
  • 23
  • 105
  • 176