4

UPDATE: Starting with .Net 4.7.2, HashSet.TryGetValue - docs is available.
HashSet.TryGetValue - SO post


I have a problem with HashSet because it does not provide any method similar to TryGetValue known from Dictionary. And I need such method -- passing element to find in the set, and set returning element from its collection (when found).

Sidenote -- "why do you need element from the set, you already have that element?". No, I don't, equality and identity are two different things.

HashSet is not sealed but all its fields are private, so deriving from it is pointless. I cannot use Dictionary instead because I need SetEquals method. I was thinking about grabbing a source for HashSet and adding desired method, but the license is not truly open source (I can look, but I cannot distribute/modify). I could use reflection but the arrays in HashSet are not readonly meaning I cannot bind to those fields once per instance lifetime.

And I don't want to use full blown library for just single class.

So far I am stuck with LINQ SingleOrDefault. So the question is how fix this -- have HashSet with TryGetValue?

ToolmakerSteve
  • 18,547
  • 14
  • 94
  • 196
greenoldman
  • 16,895
  • 26
  • 119
  • 185
  • It might help to provide a code example of what you have in the hash set and how you want `TryGetValue` to work! – Trevor Pilley Aug 07 '15 at 07:48
  • 3
    TryGetValue would both return true/false whether it found the item in the set *and* the item that was present in the set if it found it. – Lasse V. Karlsen Aug 07 '15 at 07:53
  • Define your own `ISet` facade around a `HashSet` private member. – Jodrell Aug 07 '15 at 08:13
  • Why do you want to use a hash set for keys where identity matters? Are you coding your own interning mechanism or something like that? Are you using `SetEquals` to compare to another set, or to another collection? – Luaan Aug 07 '15 at 08:49
  • @Luaan, identity does *not* matter per se, I am pointing out, that equality is not identity. In other words the facts I found element equal to the one in hand, does not mean those elements are identical. I need `SetEquals` to check if the two sets are equal. – greenoldman Aug 07 '15 at 08:58
  • @greenoldman I understand that. I'm asking about the part where identity *does* matter to you - why do you need to read the value from the hash set rather than using the one you already have? – Luaan Aug 07 '15 at 10:55
  • @Luaan, because each element not only holds data that are used to check equality, but also data which is not used for that and which are used for other purposes, so I cannot lose them, or swap them just because two items are equal. – greenoldman Aug 07 '15 at 11:57

5 Answers5

4

Probably you should switch from a HashSet to a SortedSet

There is a simple TryGetValue() for a SortedSet:

public bool TryGetValue(ref T element)
{
    var foundSet = sortedSet.GetViewBetween(element, element);
    if(foundSet.Count == 1)
    {
        element = foundSet.First();
        return true;
    }
    return false;       
}

when called, the element needs just all properties set which are used in the Comparer. It returns the element found in the Set.

DrKoch
  • 9,556
  • 2
  • 34
  • 43
3

I agree this is something which is basically missing. While it's only useful in rare cases, I think they're significant rare cases - most notable, key canonicalization.

I can only think of one suggestion at the moment, and it's truly foul.

You can specify your own IEqualityComparer<T> when creating a HashSet<T> - so create one which remembers the arguments to the last positive (i.e. true-returning) Equals comparison it has performed. You can then call Contains, and see what the equality comparer was asked to compare.

Caveats:

  • This holds on to references unnecessarily, so could end up preventing objects being garbage collected
  • You'd potentially want to do this on a per-thread basis (if you've got a set that isn't modified after initialization, but is then read by multiple threads, for example)
  • It assumes that HashSet<T> doesn't use any optimization such as "if the references are equal, don't bother consulting the equality comparer"
  • It's fundamentally a horrible abuse

I've been trying to think of other alternatives in terms of finding intersections, but I haven't got anywhere yet...

As noted in comments, it would be worth encapsulating this as far as possible - I suspect you only need a very limited set of operations, so I'd wrap a HashSet<T> in your own class and only expose the operations you really need - that way you get to clear the "cache" after each operation, removing my first objection above.

It still feels like a horrible abuse to me, but...

As others have suggested, an alternative would be to use a Dictionary<TKey, TValue> and implement SetEquals yourself. That would be simple enough to do - and again, you'd want to encapsulate this in your own type. Either way, you should probably design the type itself first, and then implement it using either a HashSet<> or a Dictionary<,> as an implementation detail.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Thank you!!! Luckily I don't have the need to exposing this to the others, and I have single thread application (so far). The third point is easily solved -- if there is a match, and the comparer was not triggered, it means the element is found which was passed to the method. I will keep my question open for a while, I hope you don't mind. – greenoldman Aug 07 '15 at 07:52
  • 1
    If you go this route I would consider encapsulating this in a new hashset-type class so that you expose the methods that makes sense. This would make it easier in the future to discover that your current assumptions impose a limitation as well, like not being thread-safe. You could also mitigate the "hold on to references unncessary" problem by just always asking the comparer to clear its cache after use. In other words, don't spread the ask-hashset-check-comparer code all over the place, encapsulate it out of sight. – Lasse V. Karlsen Aug 07 '15 at 08:02
  • @LasseV.Karlsen: That's a very good point, yes. I'm still not sure I'd go down the route of really proposing the use of this anyway, but at least encapsulating it is a good start. Will edit my answer. – Jon Skeet Aug 07 '15 at 08:08
  • @greenoldman: Certainly keep the question open - and see my edit and Lasse's comment for improvements. It's still a really ugly hack :( – Jon Skeet Aug 07 '15 at 08:10
  • @WouterHuysentruit: Well, the OP specifically said "I cannot use Dictionary instead because I need SetEquals method." but they could reimplement that themselves, of course. – Jon Skeet Aug 07 '15 at 08:25
  • If memory and CPU performance is critical, this is a decent solution. I'd stay away from it otherwise. Although to be fair, the `HashSet` isn't thread-safe on its own either, and if you encapsulate it in your own type to hide the abuse, it *is* safe-ish. Just don't forget to also fix all those methods that have different behaviour for an argument that is a `HashSet`, or you're going to lose performance anyway. Combining inheritance and composition is always such a pain :/ – Luaan Aug 07 '15 at 08:32
  • As the old saying goes "linearity killed the cat" :-), so horrible, not horrible, I implemented this idea and managed to shave off 1:20 min from 7:30 (not bad if you ask me -- of course I knew in advance linear searching through the set was the bottleneck). So thank you once again! – greenoldman Aug 07 '15 at 13:04
3

Sounds like you trying to use the wrong tool. True, you can save some memory using a HashSet but it seems to me that you are trying to acheeve a different goal: Get the actual element that is just equal to a representation. So in reality they are two different elements. Just the memento (a unique representation) is equal.

Therefore you'd be better of using a Dictionary where you add your elements as Key and Value. So you're able to get it back (the identical) but you miss your SetEquals....

I suppose SetEquals in it's implementation does nothing much different than sequencially compare two HashSets in it's bucket order and fails on first non-equality.

So you should be equally good off using a simple SequenceEqual() (LINQ) comparing the two Keys collections.

So this extension method could do

public static SetEqual<T,G>(this IDictionary<T,G> d, IDictionary<T,G> e)
{
    return d.Keys.SequenceEqual(e.Keys);
}

This should work, because a Dictionary basically is a HashSet with an associated value. And more appropriate to your problem. (OK, to be correct, the code should go for Dictionary<> instead of IDictionary<> because Key order matters)

If you need an IEnumerable<> on the second parameter try sorting to get a defined order (not so efficient).

Robetto
  • 739
  • 7
  • 20
  • The implementation is different when you're comparing a set to another set - in that case, it can use a more direct and higher performing comparison. Otherwise, `SequenceEqual` is more or less equivalent. – Luaan Aug 07 '15 at 08:02
  • 4
    Won't SequenceEqual also impose ordering? Two dictionaries that contains the same number of elements and have the same capacity might have the same key order but if they have different capacities that is definitely not the case. The ordering of dictionary keys is undocumented behavior. – Lasse V. Karlsen Aug 07 '15 at 08:03
  • You are just trading one problem to another -- and the end I still have problem. How to implement **efficiently** `SetEquals` on `Dictionary` keys (or values, does not matter, because in this case it would be a twin pair). – greenoldman Aug 07 '15 at 08:13
  • As said, order of keys or elements in a set is not guaranteed nor documented. True, the capacity influences it (haven't thought about that). So to be safe the only valid check would be ORDER + SequenceEquals. Ordering by Hash value might be a hint, but without it is not quite possible. I bet, even if you implement your own HashSet it will be a problem of the same compexity to verify Set-Identity. Due to the N x N combinations of equality. So CS has that solution of introducing an order and check then. => O(N) – Robetto Aug 10 '15 at 11:48
  • Per your last comment, It would be good to *edit your answer* to show that some sort order needs to be imposed on the two sets of keys: not correct to directly compare the two key sequences. – ToolmakerSteve Nov 24 '18 at 14:24
2

Finally added in .NET 4.7.2:

HashSet.TryGetValue(T, T) Method

An SO post with more details

ToolmakerSteve
  • 18,547
  • 14
  • 94
  • 196
1

hopefully not blind but I haven't seen this answer anywhere. If you want dictionary's TryGetValue, you can just steal it.

theHashset.ToDictionary(item => item.ID).TryGetValue(key, out value)

All you need is a quick lambda for determining unique keys.

LuFFy
  • 8,799
  • 10
  • 41
  • 59
  • This whole topic is sprinkled with `Dictionary` -- Ctrl+F is your friend. – greenoldman Dec 12 '17 at 16:51
  • Obviously. Did I say I haven't seen anyone use Dictionary? No. I said I haven't seen this ^^^ answer. Ahhh CS people, as arrogant and snooty as ever. Gotta love'm. –  Dec 13 '17 at 21:30
  • Maybe you didn't see **this particular** answer because it is incorrect? `Dictionary` does not have `SetEquals` and I wrote abut it right in my question. – greenoldman Dec 14 '17 at 10:11
  • Incorrect how? Hashsets and dictionaries are easily interchangeable. just convert the result back to hashset and use hashset's setequals... –  Jan 11 '18 at 23:24
  • "Just convert" means extra memory allocation. – greenoldman Jan 12 '18 at 19:31
  • And? It's still a correct solution as far as I can tell. Maybe not the most efficient, but it's clean and easy, and I doubt whatever you're doing would necessitate something more efficient. Just nitpicking at this point to cover yourself. –  Jan 18 '18 at 04:48
  • Can you suggest any situation where this answer might be a useful alternative to what OP already said is his "fallback" solution, LINQ's `SingleOrDefault`? EDIT: I just thought of one: if one wanted to make a significant number of TryGetValue tests, during a time when the set is not being altered, then it might make sense to create this dictionary, as the cost of creating it would be amortized over all those uses. – ToolmakerSteve Nov 24 '18 at 14:31
  • @greenoldman - IMHO, the other answers use dictionary quite differently. Here, the proposal is to *temporarily* create a dictionary in order to perform the TryGetValue. So that dictionary doesn't need SetEquals. This may or may not be a useful suggestion (I'm skeptical that anyone will actually do this), but it is indeed a new suggestion. [A related approach would be to simultaneously maintain both the HashSet and the dictionary.] – ToolmakerSteve Nov 24 '18 at 14:39
  • @ToolmakerSteve, why you edited the answer instead of commenting it (with the added content)? The comments are exactly made for this purpose. Or simply answer the question. You cannot be upvoted, asked, edits serve another purpose -- to fix typos, clarify the question, etc. – greenoldman Nov 25 '18 at 06:17
  • @greenoldman - I'm confused by your question in the comment above. I didn't edit this answer. Are you referring to my edit on your original question? While I agree that is an unusual action to take, in this situation it was *vital* that anyone coming to this page be aware that .NET had been updated, so that they might not need to struggle with these old approaches; the benefit to future readers is the primary purpose of SO. Are we in agreement on the goal? Have I misunderstood your concern? – ToolmakerSteve Nov 25 '18 at 11:56
  • @ToolmakerSteve, I am sorry, I meant "question". Sure thing, it can stay this way, but since it is in-place edit it is not possible to comment it. – greenoldman Nov 26 '18 at 15:23