1

As we know, in Haskell, a set of elements is represented by a binary searching tree in Data.Set module, which is efficient. However, most operations require the elememt to be an instance of the Ord class.

However, a general set doesn't need its elememt to be an instance of the Ord. Since a set is where there is no duplicate elements, so its element being an instance of the Eq class is just enough.

In Haskell, I can only think of the implementation by a single linked list, just like the default [a], but the single linked list are not as efficient as a BST, while a BST needs the Ord class.

By the way, in Python, the set class doesn't needs its element to be orderable. Only __eq__ and __ne__ (which is silimar to Haskell's Eq class) defined is enough, e.g.:

class Fruit:
    def __init__(self, name):
        self.name = name.title()

    def __repr__(self):
        return self.name

    def __eq__(self, other):
        return self.name == other.name     # defines the equality operation

    def __ne__(self, other):
        return self.name != other.name     # defines the unequality operation

    def __hash__(self):
        return hash(self.name)     # so that Fruit instance can be an element of a set

?: x = Fruit('apple')

?: y = Fruit('apple')

?: z = Fruit('Apple')

?: {x,y,z}

{Apple}

?: x <= y

Traceback (most recent call last):

File "", line 1, in module

x <= y

TypeError: '<=' not supported between instances of 'Fruit' and 'Fruit'

So I wonder whether there are some efficient data structure in Haskell that could be used to represent the Set but does not need the Ord class.

Z-Y.L
  • 1,740
  • 1
  • 11
  • 15
  • 2
    I would guess that python uses the "Ord instance" of the hash code to implement the set – sara Mar 13 '19 at 06:49
  • 1
    There are other options than binary search trees. See `Data.IntSet` (a PATRICIA tree) and `Data.HashSet` (a hash array-mapped trie). – dfeuer Mar 13 '19 at 08:53
  • 1
    To elaborate on @dfeuer: `Data.HashSet` (from the `unordered-containers` library) is probably your best bet here; it uses `Hashable` instead of `Ord`, allowing more types to be stored. I don't think you can get rid of the constraint entirely though. – bradrn Mar 13 '19 at 08:58
  • 1
    @bradrn, probably not more types. But a slightly different set of types, and many more efficiently. – dfeuer Mar 13 '19 at 09:01
  • @dfeuer Thanks for correcting me! I always thought that `HashSet` (a) supported more types and (b) was less efficient than `Set`; it seems I was wrong on both counts. – bradrn Mar 13 '19 at 09:05
  • It should be possible to have a hash-set with O(1) acces for the case you don't have to change the content of the set, just like you have vectors which have O(1) access as well. Using the right monad it should be possible to have O(1) update as well. I never used something like this in Haskell so you have to do your own reserarch for this. – Elmex80s Mar 13 '19 at 11:51
  • 1
    Your English text says "Only `__eq__` and `__ne__`", but your code says "...and also `__hash__`". – Daniel Wagner Mar 13 '19 at 15:01
  • @DanielWagner I am a haskell and python hobbyist, so maybe I haven't understood the mechanism of the implementation. `__hash__ is defined since python needs it for a set, but actually I don't understand why the class has to be hashable. Are there any implicit information that I miss with the `__hash__` defined? – Z-Y.L Mar 14 '19 at 00:36
  • You could always try bloom filters. That still requires a hash implementation though... – MathematicalOrchid Mar 18 '19 at 15:03

1 Answers1

3

Python is cheating. All values in Python are hashable and set() is just a dictionary with no value.

Python can get away with this as a side effect of being dynamically typed language. The same machinery that lets Python keep track of the types of it variables also makes it easy to impose an psudo-ordering on them.

Haskell as a statically typed language espically a statically typed language with first class functions cannot invent an arbitrary ordering for a data type. Therefore Data.Set requires Ord to be efficient. Without ordering adding a new value to a set becomes O(n).

John F. Miller
  • 26,961
  • 10
  • 71
  • 121
  • 2
    There is `Data.HashSet` which requires `Hashable a, Eq a` - but it's a tree too. – phadej Mar 13 '19 at 10:05
  • I do not think it has anything to do with dynamically typed. In C# and Java you can have (hash)sets with the same access / update time complexity as in Python. – Elmex80s Mar 13 '19 at 11:55
  • 1
    Not all values in Python are hashable, and it is hardly "cheating" to take advantage of available mutable data structures. – chepner Mar 13 '19 at 14:41
  • @Elmex80s my claim was not about speed. Haskell sets are very fast as well, but require that the programmer (or standard library) need to provide an `Ord` instance. What dynamic typing has is a free psudo-ordering of all values based on the internals header of the value as stored in memory. I do not know enough about the low-level representation of data in Java or C# to say if they have similar value headers that could be used, but Haskell removes type information when converting from the STG representation. https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/GeneratedCode – John F. Miller Mar 13 '19 at 22:48
  • 1
    So, if I understand it right, what you mean is for the sake of speed, there is something related to the ordering explicitly or implicitly. Since haskell's `HashSet` needs its elements to be hashable, I understand that `HashSet` is equivalent to python's set roughly, am I right? – Z-Y.L Mar 14 '19 at 01:07