
I need a data structure that has O(1) insert and remove operations and from which I can retrieve a random element in O(1).

I thought of using a HashSet in C#. Insert and remove operations are constant, and I would also be able to get a random element from it if I could access the internal array. I don't want to create an array from the set, since that would take O(n) every time.

So my question is: is there a way to access the internal array of a HashSet? Or is there a better data structure that meets the requirements?

Note: a Dictionary<TKey, TValue>, for example, allows access to the internal array for the keys and the values. It would be awesome if HashSet had this too. I could use a Dictionary, but I wouldn't really use its mapping feature, and it requires twice as much space as a HashSet.

niodev
    They [appear to be called](https://source.dot.net/#System.Private.CoreLib/HashSet.cs,2d265edc718b158b) `_buckets` and `_entries` and they are of course private so you may have to pry them out of there [using reflection](https://learn.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/reflection). Does not seem like a great idea, but knock yourself out. –  Mar 27 '21 at 23:18
  • What version of .NET? – aepot Mar 27 '21 at 23:36
  • The version is .NET 4 – niodev Mar 27 '21 at 23:42
    I would not access the internals that way. – Daniel A. White Mar 28 '21 at 00:10
    Why do you assume that there is an internal array at all? In fact, both `HashSet` and `Dictionary` are based on hash maps, not on arrays like `List`. The Keys and Values of a dictionary are most likely not really arrays, and the API promises nothing about this. – Alejandro Mar 28 '21 at 00:23
  • Related: [Picking any item from a HashSet is very slow](https://stackoverflow.com/questions/64186410/picking-any-item-from-a-hashset-is-very-slow-how-to-do-this-fast). It includes a solution that uses reflection. – Theodor Zoulias Mar 28 '21 at 01:01

1 Answer


You appear to be under a whole bunch of misconceptions that should have been corrected in any intro course to data structures.

I need a datastructure that has O(1) insert and remove operations and from which I can retrieve a random element at O(1).

There is no such data structure.

And the Dictionary<> implementation you assign this property to in fact stores the data twice: once in a hash table structure (same as HashSet<>), and a second time in an array for easy iteration, which makes deletion technically O(n) since it has to update the associated array. However, if you find this appealing, nothing's stopping you from storing your data in a second array next to your set.
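That "second array next to your set" idea can be sketched roughly like this. `RandomizedSet` and its members are hypothetical names (not BCL types), and the code assumes a modern C# compiler; the trick is to pair a `Dictionary<T, int>` (item → index) with a backing `List<T>`, and to remove by swapping the last list element into the vacated slot so nothing has to shift:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type (not part of the BCL): a set-like collection with
// average-case O(1) Add, Remove, Contains and GetRandom. A dictionary
// maps each item to its index in a backing list.
public class RandomizedSet<T>
{
    private readonly Dictionary<T, int> _indexOf = new Dictionary<T, int>();
    private readonly List<T> _items = new List<T>();
    private readonly Random _rng = new Random();

    public int Count => _items.Count;

    public bool Add(T item)
    {
        if (_indexOf.ContainsKey(item)) return false;
        _indexOf[item] = _items.Count;
        _items.Add(item);
        return true;
    }

    public bool Remove(T item)
    {
        if (!_indexOf.TryGetValue(item, out int index)) return false;
        // Move the last element into the removed slot, then drop the
        // tail, so the list removal is O(1) instead of O(n).
        int lastIndex = _items.Count - 1;
        T last = _items[lastIndex];
        _items[index] = last;
        _indexOf[last] = index;
        _items.RemoveAt(lastIndex);
        _indexOf.Remove(item);
        return true;
    }

    public bool Contains(T item) => _indexOf.ContainsKey(item);

    public T GetRandom() => _items[_rng.Next(_items.Count)];
}
```

All four operations are O(1) on average; the hash-collision worst cases discussed below still apply, because the dictionary is still a hash table.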

Insert and remove operations are constant

No, they're not, because of collisions. Inserting and deleting data that collides is a linear operation, and so the worst-case complexity for hash tables is actually O(n).
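For illustration, the collision worst case can be forced deliberately with a comparer whose `GetHashCode` always returns the same value (`ConstantHashComparer` is a name made up for this sketch):

```csharp
using System;
using System.Collections.Generic;

class ConstantHashComparer : IEqualityComparer<int>
{
    public bool Equals(int x, int y) => x == y;
    // Every item hashes to the same bucket, so the set degenerates
    // into a single collision chain.
    public int GetHashCode(int obj) => 0;
}

class Demo
{
    static void Main()
    {
        var degenerate = new HashSet<int>(new ConstantHashComparer());
        for (int i = 0; i < 1000; i++) degenerate.Add(i);

        // Contains and Remove now walk a 1000-element chain: O(n),
        // even though the same calls on a healthy set average O(1).
        Console.WriteLine(degenerate.Contains(999)); // prints True
        Console.WriteLine(degenerate.Remove(999));   // prints True
    }
}
```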

is there a way to access the internal array of a HashSet?

There is no such array in HashSet.

As to what you can do: you can iterate over a set. HashSet returns its contents in a "random, but stable" order, and SortedSet returns its data sorted in a user-defined way. If your algorithm can't work linearly, can't work through set lookups, and you don't want to maintain an array storing whatever order you want, I suggest implementing a better algorithm.
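For completeness, picking an element by iterating over the set looks like this; note that `Enumerable.ElementAt` has no indexer to jump to and instead advances the set's enumerator element by element, so each pick costs O(n), which is exactly the cost the question is trying to avoid:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        var set = new HashSet<int> { 10, 20, 30, 40 };
        var rng = new Random();

        // ElementAt walks the enumerator up to the chosen position,
        // so this "random pick" is a linear scan, not O(1).
        int pick = set.ElementAt(rng.Next(set.Count));
        Console.WriteLine(set.Contains(pick)); // prints True
    }
}
```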

Blindy
  • Inserts, access, and deletions all have a best case of `O(1)` and a worst case of `O(n)`. – Enigmativity Mar 28 '21 at 00:51
  • The amortized time complexity for access, deletions and insertions is O(1). It is amortized because on insertion the internal array might have to be resized, and only in that case does it take O(n). And remove is O(1) anyway. – niodev Mar 28 '21 at 20:33
  • Removal from an array isn't O(1), and whether you call it amortized, or worst-case, or whatever you wish, in any hash table that's used for some amount of items you will get collisions, and those collisions will make all your accesses linear. – Blindy Mar 28 '21 at 21:37
    @Blindy here is the documentation for hashset: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.remove?view=net-5.0#System_Collections_Generic_HashSet_1_Remove__0_ – niodev Mar 29 '21 at 09:21
  • And [here](https://referencesource.microsoft.com/#system.core/System/Collections/Generic/HashSet.cs,292) is the implementation, with a huge `for` loop in the middle of it. Can you please stop arguing and just learn data structures? They're all already solved. – Blindy Mar 29 '21 at 14:53
  • @Blindy niodev is right. The `HashSet.Remove` method is an O(1) operation, as stated in the [documentation](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.remove#remarks). The `for` loop in the [implementation](https://referencesource.microsoft.com/system.core/System/Collections/Generic/HashSet.cs.html#4e77ae9204a00be6) handles the case of hashcode collisions. In case the hashcode of an item does not collide with another item's hashcode (modulo `m_buckets.Length`), the `for` loop will exit after only one iteration. – Theodor Zoulias Mar 29 '21 at 20:31
  • @Blindy what data structure would you suggest for someone who wants O(1) add, O(1) search, O(1) remove, and O(1) get-random computational complexity? – Theodor Zoulias Mar 29 '21 at 20:36
  • No, he's not right; what are you talking about? There is no "oh, only if there are no collisions": a method is constant or linear, period, and removing (and adding) an item from a hash table is linear. As to what data structure I would recommend: none. The only structures that can give you that are limited ones like bit fields (trading memory for complexity). If your algorithm needs that, then it's a bad algorithm, and what I suggest is writing a better one, like the rest of us can do without problems. – Blindy Mar 30 '21 at 03:39
    @Blindy [constant time](https://en.wikipedia.org/wiki/Time_complexity#Constant_time) means that the running time does not depend on the size of the input, and [linear time](https://en.wikipedia.org/wiki/Time_complexity#Linear_time) means that the running time increases linearly with the size of the input. Do you have any experimental evidence that the behavior of the `HashSet.Remove` method is linear? AFAIK the only possibility for observing a linear behavior is if the `GetHashCode` implementation of the provided `IEqualityComparer` is totally broken (returns always the same value). – Theodor Zoulias Mar 30 '21 at 17:37
  • @Blindy it seems that you prefer to define the behavior of a data structure based on the worst-case scenario, instead of the average/normal scenario. Fair enough. I suggest opening a [GitHub issue](https://github.com/dotnet/runtime/labels/area-System.Collections) and requesting that the [documentation](https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1.remove#remarks) of `HashSet.Remove` be corrected according to your preferences. Also, instead of advising people to "learn data structures" in general, you should consider linking to specific learning material. – Theodor Zoulias Mar 30 '21 at 17:56