I am looking for a pre-existing .NET 'hash-set type' implementation suitable for atomizing a general type T. We have a large number of identical objects coming in from serialized sources that need to be atomized to conserve memory.

A Dictionary<T,T> with value == key works perfectly. However, the objects in these collections can run into the millions across the app, so it seems very wasteful to store 2 references to every object.

HashSet cannot be used, as it only has Contains; as far as I can tell, there is no way to get at the actual member instance.

Obviously I could roll my own, but I wanted to check whether anything pre-existing was available. A scan of C5 didn't turn up anything obvious, but their 250+ page documentation does make me wonder if I've missed something.

EDIT The fundamental idea is that I need to be able to GET THE UNIQUE OBJECT BACK, i.e. HashSet has Contains(T obj) but not Get(T obj). /EDIT

The collection at worst only needs to implement:

T GetOrAdd(T candidate)
void Clear()

It should also take an arbitrary IComparer. GetOrAdd should be ~O(1) and would ideally be atomic, i.e. it shouldn't waste time hashing twice.
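For reference, the Dictionary<T,T> baseline described above can be sketched roughly as follows. The class name `DictionaryAtomizer` is hypothetical, and the sketch assumes an `IEqualityComparer<T>` rather than an `IComparer`, since that is what `Dictionary` accepts. It meets the two-method contract, but it stores two references per object and hashes the candidate twice on a miss, which is exactly the redundancy in question.

```csharp
using System.Collections.Generic;

// Hypothetical baseline: a thin wrapper over Dictionary<T,T> with key == value.
public sealed class DictionaryAtomizer<T>
{
    private readonly Dictionary<T, T> _items;

    public DictionaryAtomizer(IEqualityComparer<T> comparer = null)
    {
        // A null comparer makes Dictionary fall back to EqualityComparer<T>.Default.
        _items = new Dictionary<T, T>(comparer);
    }

    public T GetOrAdd(T candidate)
    {
        T existing;
        // First hash computation: the lookup.
        if (_items.TryGetValue(candidate, out existing))
            return existing;

        // Second hash computation: the insertion on a miss.
        _items.Add(candidate, candidate);
        return candidate;
    }

    public void Clear()
    {
        _items.Clear();
    }
}
```

Two equal but distinct instances passed through `GetOrAdd` come back as the same reference, so the duplicate can be collected.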

EDIT Failing an existing implementation, any recommendations on sources for the basic hashing / bucketing mechanics would be appreciated. The Mono HashSet source has been pointed out for this, so this part is answered. /EDIT

tolanj
  • I normally use a `Dictionary` :-) You are probably only wasting 64 bits (8 bytes) of memory for each element. – xanatos Feb 27 '15 at 09:43
  • This might be what you are looking for: http://stackoverflow.com/questions/18922985/concurrent-hashsett-in-net-framework. Let me know if it isn't and I will reopen. – Patrick Hofman Feb 27 '15 at 10:11
  • @PatrickHofman How does a concurrent hashset (with all the examples that use HashSet to implement it) resolve this question? – xanatos Feb 27 '15 at 10:16
  • @xanatos: OP wants to add only one instance to the collection, and do this atomically (I guess for multi-threading). This solves both issues. – Patrick Hofman Feb 27 '15 at 10:18
  • @tolanj: Okay. No problem. – Patrick Hofman Feb 27 '15 at 10:20
  • So you want to get `T` if `T` is in the list? So in fact the method should return `T` if it `Contains` `T`. That shouldn't be hard... You can implement your own comparer class, right? – Patrick Hofman Feb 27 '15 at 10:25
  • @Patrick yes, I already have totally working solutions using a Dictionary with key == value; I am JUST looking to remove the redundancy. xanatos has suggested the Mono source of HashSet, and that would do it if I am rolling my own, so the question really is whether there is a built-in that works. – tolanj Feb 27 '15 at 10:27
  • Why not just supply a comparer to the constructor of `HashSet`? Then you will be fine. – Patrick Hofman Feb 27 '15 at 10:28
  • For strings it should use the default string comparer as far as I know. Else you can write your own. – Patrick Hofman Feb 27 '15 at 10:44
  • @Patrick this question has nothing to do with the comparer. – tolanj Feb 27 '15 at 11:22

1 Answer

You can take the source code of HashSet<T> from the Reference Source and write your own GetOrAdd method.
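As a rough illustration of what such a method could look like, here is a minimal sketch of a chained hash set with a single-hash `GetOrAdd`. It is not taken from the Reference Source or Mono code; the class name `AtomSet`, its fields, and the power-of-two growth policy are assumptions made for brevity (production implementations typically size buckets with primes). It is also not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: stores one reference per item and hashes once per GetOrAdd.
public sealed class AtomSet<T>
{
    private struct Slot
    {
        public int HashCode; // cached hash with the sign bit cleared
        public int Next;     // index of the next slot in the chain, or -1
        public T Value;
    }

    private readonly IEqualityComparer<T> _comparer;
    private int[] _buckets;  // holds slot index + 1; 0 means the bucket is empty
    private Slot[] _slots;
    private int _count;

    public AtomSet(IEqualityComparer<T> comparer = null, int capacity = 16)
    {
        _comparer = comparer ?? EqualityComparer<T>.Default;
        _buckets = new int[Math.Max(capacity, 4)];
        _slots = new Slot[_buckets.Length];
    }

    public int Count { get { return _count; } }

    public T GetOrAdd(T candidate)
    {
        // The hash is computed once and reused for both lookup and insertion.
        int hash = (candidate == null ? 0 : _comparer.GetHashCode(candidate)) & 0x7FFFFFFF;
        int bucket = hash % _buckets.Length;

        for (int i = _buckets[bucket] - 1; i >= 0; i = _slots[i].Next)
        {
            if (_slots[i].HashCode == hash && _comparer.Equals(_slots[i].Value, candidate))
                return _slots[i].Value; // return the stored instance, not the candidate
        }

        if (_count == _slots.Length)
        {
            Grow();
            bucket = hash % _buckets.Length;
        }

        int index = _count++;
        _slots[index].HashCode = hash;
        _slots[index].Value = candidate;
        _slots[index].Next = _buckets[bucket] - 1; // link to the previous chain head
        _buckets[bucket] = index + 1;
        return candidate;
    }

    public void Clear()
    {
        Array.Clear(_buckets, 0, _buckets.Length);
        Array.Clear(_slots, 0, _count); // drop references so items can be collected
        _count = 0;
    }

    private void Grow()
    {
        int newSize = checked(_slots.Length * 2);
        var newBuckets = new int[newSize];
        var newSlots = new Slot[newSize];
        Array.Copy(_slots, newSlots, _count);

        // Rebuild the chains using the cached hash codes; no rehashing of items.
        for (int i = 0; i < _count; i++)
        {
            int bucket = newSlots[i].HashCode % newSize;
            newSlots[i].Next = newBuckets[bucket] - 1;
            newBuckets[bucket] = i + 1;
        }

        _buckets = newBuckets;
        _slots = newSlots;
    }
}
```

Caching the hash code in each slot costs an extra int per entry but lets `Grow` rebuild the buckets without calling the comparer again.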

Yoh Deadfall
  • Unfortunately, for me to do that would seemingly be illegal. From the description of the Microsoft Reference Source License (Ms-RSL): "This is the most restrictive of the Microsoft Shared Source licenses. The source code is made available to view for reference purposes only, mainly to be able to view Microsoft classes source code while debugging. Developers may not distribute or modify the code for commercial or non-commercial purposes." – tolanj Feb 27 '15 at 10:14
  • @tolanj You can take `HashSet` from Mono :-) It's MIT/X11 licensed (the class libraries at least). – xanatos Feb 27 '15 at 10:17
  • @tolanj Note that in the latest source code on GitHub they removed the class, because they began using the Microsoft source code. Here https://github.com/mono/mono/blob/mono-3.12.0.76/mcs/class/System.Core/System.Collections.Generic/HashSet.cs there is the version that is probably used in the latest "packetized" mono. – xanatos Feb 27 '15 at 10:28
  • @tolanj: The latest version of the [reference source](https://github.com/Microsoft/referencesource) is released under the [MIT license](https://github.com/Microsoft/referencesource/blob/master/LICENSE.txt), so you *can* use the framework version of [`HashSet`](https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Collections/Generic/HashSet.cs) from there. – LukeH Feb 27 '15 at 14:09