Fast way to check for existence and then insert into a SortedList

Question

Whenever I want to insert into a SortedList, I check to see if the item exists, then I insert. Is this performing the same search twice? Once to see if the item is there and again to find where to insert the item? Is there a way to optimize this to speed it up or is this just the way to do it, no changes necessary?

if( sortedList.ContainsKey( foo ) == false ){
    sortedList.Add( foo, 0 );
}

Is there an `else` branch? You may get a better answer if you show it. — Sergey Kalinichenko, Nov 08 '12 at 16:16
`SortedList` isn't a particularly good data structure; it's rarely desirable. Why not use a `SortedDictionary` or just a `List`? Odds are one of the two would be preferable. — Servy, Nov 08 '12 at 16:17
Dasblinkenlight - I don't have an else present. If the key is already present I don't do anything. cirrus - I don't think that the ContainsKey is slow, I was wondering if the ContainsKey combined with the Add together are necessary. — mj_, Nov 08 '12 at 16:19
Actually, looking in ILSpy ContainsKey() does a BinarySearch so you'd likely be better off with a SortedDictionary — cirrus, Nov 08 '12 at 16:20
@cirrus Of course it uses a binary search. All sorted data structures will search using a binary search; that's the entire point of storing them as sorted data in the first place. — Servy, Nov 08 '12 at 16:21
@Servy while I agree a tree structure has its advantages here, the .NET `SortedDictionary<,>` is poorly implemented (or rather poorly exposed). It doesn't expose any of the useful tricks possible on a tree in O(log n) time. For example, `Min`/`Max` operations, or the nearest range operations. It's much easier to do on a `SortedList<,>` (for example implementing binary search on it is quite possible). — nawfal, Jun 12 '14 at 18:08
@nawfal Except for the fact that all mutations of a `SortedList` scale much more poorly than in a tree based structure. A tree based structure is the appropriate tool to use here, conceptually, it's just a shame that .NET doesn't provide one with enough tools exposed for some of the operations one might like to perform. Of course, whether or not the OP here is someone who actually needs to perform those operations is unclear. If you did, the best solution would likely be a 3rd party tree implementation that exposes a read only view of the actual tree based structure. — Servy, Jun 12 '14 at 18:53
@Servy I dont disagree. I had it in my mind. And `SortedDictionary` very well may be better for OP's use. I pointed it too. My comment was for `SortedList isn't a particularly good data structure;`. That sounded like it deserve a reply on where `SortedList<,>` makes sense. — nawfal, Jun 12 '14 at 18:59
@Servy I disagree. When all the user wants is the lacking feature of the appropriate data structure, its better to use structure that is not lacking. That's when it becomes appropriate, unfortunately "given the circumstances". I repeat I was only against your generalized statement that `SortedList` is a poor structure "given the circumstances". So here's a question: What if I need a lot of index based operations, binary searches, and min/max operations on a sorted map? It's not rare. Given the options in .NET, there's no better alternative to SortedList. SortedDictionary is a poor choice here. — nawfal, Jun 12 '14 at 19:17
@nawfal In such circumstances one would almost certainly be better off using a `List`, rather than a `SortedList`, and simply sorting the data as needed. The number of situations in which one need to do those types of things *while periodically mutating the collection in-between those types of operations* makes it much rarer. Adding N items to a sorted list is an O(n^2) operation, as opposed to adding N items to a list an then sorting, which is O(n*log(n)). My original comment is not that `SortedList` is *never* right, simply that it's *rarely* right. — Servy, Jun 12 '14 at 19:22
@Servy Ok, I think its time to stop the discussion. So I get your point. I should rather do some heavy lifting and make an `List` work like a `SortedList` right? And also thanks for repeating the same point of addition cost, when in my requirement it wasn't there. So here goes the verdict. In case I need binary search operations, the best structures are `List` > `SortedDictionary` > `SortedList`. Now all there is left to wonder is if `SortedList` is not even appropriate here, which are those rare cases its appropriate! — nawfal, Jun 12 '14 at 19:32
@Servy its not rare. Yesterday I opened a set of tabs on various questions related to binary operations on sorted maps. I havent still finished closing them all. Such operations are frequent on sorted collections. — nawfal, Jun 12 '14 at 19:33
@nawfal It's hard to comment without actually seeing the questions, so I'm not going to try to theorize about whether or not questions I haven't seen would be better solved using another data structure. All I can say is that out of all of my experiences in programming I've only ever come across a small handful of context in which I felt `SortedList` really was the data structure best suited to the task at hand and to which the program wouldn't have been better off using one of the other two mentioned approaches. — Servy, Jun 12 '14 at 19:37
@Servy And my original comment is not it's always right (or it's not *rarer*), simply it's far from a poor structure in the context of .NET. — nawfal, Jun 12 '14 at 19:43
@nawfal Just about every single use I've ever seen of the data structure was a context in which it appeared to be used improperly. When a type is primarily used improperly, and is very rarely actually used in a context in which it is the right tool for a job, I would say that it's a pretty poor data structure. Personally I'd rather see it in a MS library outside of the BCL, so that the people that actually need it can access it, but it's less likely to be used inappropriately. Of course, the breaking change to move it would never happen. — Servy, Jun 12 '14 at 19:47
@Servy ok thanks for the clarification on relation between frequency of use and poorness of data structure. We had a definition problem until now then :) — nawfal, Jun 12 '14 at 19:51

Peter · Accepted Answer · 2012-11-08T16:24:04.310

6

You can add the items to a HashSet and the List, searching in the hash set is the fastest way to see if you have to add the value to the list.

if( hashSet.Contains( foo ) == false ){
    sortedList.Add( foo, 0 );  
    hashSet.Add(foo);
}

edited Nov 08 '12 at 16:24

answered Nov 08 '12 at 16:13

Peter

27,590
8
64
84

Even better, with a HashSet you don't even need to check and see if it's there, you can just `Add` it again and if it already exists nothing will happen (very low performance hit). The only disadvantage is you don't get sorting. – ean5533 Nov 08 '12 at 16:17
2

Why even keep the `SortedList` around in this case? Just use either a `Dictionary` or `SortedDictionary` in the first place. – Servy Nov 08 '12 at 16:19
@ean5533: I think the idea here was to keep to parallel collections. The unsorted hash set for quick lookup of keys and a the sorted list for actual values. Of course, this requires more memory and I'd want to profile it before I'd believe it's actually faster. `sortedList.Add` still has to find the right place in the list for the insertion. – Matt Burland Nov 08 '12 at 16:28
1

@Servy: I assume the OP wanted the items sorted, for whatever reason. [MSDN](http://msdn.microsoft.com/en-us/library/f7fta44c.aspx) has some comments on the differences between `SortedList` and `SortedDictionary`. For example, `SortedList` uses less memory, but has slower insertions. Of course, the solution proposed by peer, undoubtedly blows any memory savings! – Matt Burland Nov 08 '12 at 16:32
@MattBurland Yes, I'm aware of the differences. The insertions really are quite slow for non trivial amounts of data, and the memory cost of the alternatives aren't *that* much higher while the performance benefits really are. I posted it as a comment rather than an answer simply because it's *possible* that there is good reason for using a `SortedList`, and the OP hasn't given enough context to be sure. I find it very unlikely that it's the case though. – Servy Nov 08 '12 at 16:36
The only way to know is to test if it works with the data the mj_ want to put in the list is to test it. We can't say anything about is without knowing the set (large/small and number of duplicates). – Peter Nov 08 '12 at 16:40
@Servy: Depends on how often you are doing insertions. But yes, the OP didn't give enough context to make that call and SortedDictionary is probably the better choice 9 times out of 10. – Matt Burland Nov 08 '12 at 16:41
+1. This is probably easiest approach if you need SortedList for some erason. As @Rawling suggested my previous comment(deleted) was wrong: I said that indexing (meaning "n-th item") would be O(1) for SortedList. It is wrong because such operation it is not present at all for SortedList. Somehow I thought SortedList implements IList, which it does not. So there is no easy way to say "give me third item" (short of iterating which is O(n)). – Alexei Levenkov Nov 08 '12 at 17:08
Thank you for the suggestion. The HashSet put me onto the HashTable which I ended up using. I've got great performance now. – mj_ Nov 08 '12 at 21:32

Olivier Jacot-Descombes · Answer 2 · 2012-11-08T16:52:46.010

1

You can use the indexer. The indexer does this in an optimal way internally by first looking for the index corresponding to the key using a binary search and then using this index to replace an existing item. Otherwise a new item is added by taking in account the index already calculated.

list["foo"] = value;

No exception is thrown whether the key already exists or not.

UPDATE:

If the new value is the same as the old value, replacing the old value will have the same effect than doing nothing.

Keep in mind that a binary search is done. This means that it takes about 10 steps to find an item among 1000 items! log2(1000) ~= 10. Therefore doing an extra search will not have a significant impact on speed. Searching among 1,000,000 items will only double this value (~ 20 steps).

But setting the value through the indexer will do only one search in any case. I looked at the code using Reflector and can confirm this.

edited Nov 08 '12 at 16:52

answered Nov 08 '12 at 16:22

Olivier Jacot-Descombes

104,806
13
138
188

3

This will add whether or not it already exists; he only wants to add if it doesn't already exist. – Rawling Nov 08 '12 at 16:25
I tested this and found it no faster than calling `ContainsKey()` and then adding. – Jon B Nov 08 '12 at 16:30
@Rawling: No, if the key already exists, the existing value will be replaced, which makes no difference, if the new value is the same. You will never get a duplicate entry. If, however, the value changed and you want to keep the old one, then you are right, you will have to perform a check before adding. – Olivier Jacot-Descombes Nov 08 '12 at 16:31
1

@JonB It _should_ be faster, it only does one search. Olivier: Yes it will be replaced if the key already exists. The asker specifically wants to _do nothing_ if the key already exists. – Rawling Nov 08 '12 at 16:33
1

@JonB: That might well be the case. This simply means that there is no more efficient way (in terms of speed) to do it; however, it simplifies your code a little bit. A binary search is very efficient therefore the extra search will not make a big difference unless the collection is huge. Make a test with 100,000,000 entries! – Olivier Jacot-Descombes Nov 08 '12 at 16:34
@Rawling: If the new and old values are the same, doing something and doing nothing leads exactly to the same result. – Olivier Jacot-Descombes Nov 08 '12 at 16:39
@Rawling While that's possible, it's unlikely. It's at least worth proposing. If the OP is in an edge case in which that's not an option, then that's fine. – Servy Nov 08 '12 at 16:42
@Servy, Olivier I read the question as "avoid adding a different value if the key already exists", but now I see it _could_ be read as "avoid re-adding a value if the key already exists because it throws an exception". – Rawling Nov 08 '12 at 16:46

score 1 · Answer 3 · edited May 23 '17 at 11:47

I'm sorry if this doesn't answer your question, but I have to say sometimes the default collection structures in .NET are unjustifiably limited in features. This could have been handled if Add method returned a boolean indicating success/failure very much like HashSet<T>.Add does. So everything goes in one step. In fact the whole of ICollection<T>.Add should have been a boolean so that implementation-wise it's forced, very much like Collection<T> in Java does.

You could either use a SortedDictionary<K, V> structure as pointed out by Servy or a combination of HashSet<K> and SortedList<K, V> as in peer's answer for better performance, but neither of them are really sticking to do it only once philosophy. I tried a couple of open source projects to see if there is a better implementation in this respect, but couldn't find.

Your options:

In vast majority of the cases it's ok to do two lookups, doesn't hurt much. Stick to one. There is no solution built in.
Write your own SortedList<K, V> class. It's not difficult at all.
If you'r desperate, you can use reflection. The Insert method is a private member in SortedList class. An example that does.. Kindly dont do it. It's a very very poor choice. Mentioned here for completeness.

Matt Burland · Answer 4 · 2012-11-08T16:42:45.013

0

ContainsKey does a binary search, which is O(log n), so unless you list is massive, I wouldn't worry about it too much. And, presumably, on insertion it does another binary search to find the location to insert at.

One option to avoid this (doing the search twice) is to use a the BinarySearch method of List. This will return a negative value if the item isn't found and that negative value is the bitwise compliment of the place where the item should be inserted. So you can look for an item, and if it's not already in the list, you know exactly where it should be inserted to keep the list sorted.

edited Nov 08 '12 at 16:42

answered Nov 08 '12 at 16:19

Matt Burland

44,552
18
99
171

2

It's already using a binary search behind the scenes. (Actually now I see what you're getting at, this would be neat if he wasn't already using a SortedList.) – Rawling Nov 08 '12 at 16:20
1

+1 In the OPs specific scenario he would actually be better off with a regular `List` - because `BinarySearch` can be used to perform a single search to check for existence and the insertion index, you can do an insert-if-doesn't-exist with a single search. `BinaryList` doesn't expose this functionality. – Rawling Nov 08 '12 at 16:29
"binary search, which is O(n log n)" - really? You have an extra n... Or you mean something different? – Alexei Levenkov Nov 08 '12 at 16:33
@AlexeiLevenkov: Whoops. Corrected. – Matt Burland Nov 08 '12 at 16:35
@Rawling: Yeah. Kind of odd that SortedList doesn't provide that already. – Matt Burland Nov 08 '12 at 16:36
SortedList<,> shouldn't expose any insert methods. It's odd it doesn't have a `bool Add(T)` method like a HashSet<>. In fact `Add` on `ICollection` should have had a boolean return type to be useful in scenarios like this :( – nawfal Jun 12 '14 at 18:28

score 0 · Answer 5 · edited May 23 '17 at 10:28

SortedList<Key,Value> is a slow data structure that you probably shouldn't use at all. You may have already considered using SortedDictionary<Key,Value> but found it inconvenient because the items don't have indexes (you can't write sortedDictionary[0]) and because you can write a find nearest key operation for SortedList but not SortedDictionary.

But if you're willing to switch to a third-party library, you can get better performance by changing to a different data structure.

The Loyc Core libraries include a data type that works the same way as SortedList<Key,Value> but is dramatically faster when the list is large. It's called BDictionary<Key,Value>.

Now, answering your original question: yes, the way you wrote the code, it performs two searches and one insert (the insert is the slowest part). If you switch to BDictionary, there is a method bdictionary.AddIfNotPresent(key, value) which combines those two operations into a single operation. It returns true if the specified item was added, or false if it was already present.

Fast way to check for existence and then insert into a SortedList

5 Answers5