6

Has anyone written a generic function so that hash functions can be generated automatically for custom data types (using the deriving mechanism)? A few times, I've written the following kind of boilerplate,

data LeafExpr = Var Name | Star deriving (Eq, Show)
instance Hashable LeafExpr where
    hash (Var name) = 476743 * hash name
    hash Star = 152857

This could be generated automatically: the basic idea is that whenever adding data, you multiply by a prime, for example with lists,

hash (x:xs) = hash x + 193847 * hash xs

Essentially, what I'd like to write is

data LeafExpr = ... deriving (Hashable)

Edit 1

Thanks for all of the very helpful responses, everyone. I'll try to add a generic method as an exercise when I have time. For now (perhaps what sclv was referring to?), I realized I could write the slightly better code,

instance Hashable LeafExpr where
    hash (Var name) = hash ("Leaf-Var", name)
    hash Star = hash "Leaf-Star"

Edit 2

Using ghc, multiplying by random primes works much better than tupling in edit 1. Conflicts with Data.HashTable went from something like 95% (very bad) to 36%. Code is here: [ http://pastebin.com/WD0Xp0T1 ] [ http://pastebin.com/Nd6cBy6G ].

gatoatigrado
  • 16,580
  • 18
  • 81
  • 143
  • 2
    Watch out with that multiplication. The result of overflowing an Int in Haskell is unspecified. – augustss May 23 '11 at 08:53
  • 2
    Adding a `makeHashable` to Data.DeriveTH would be a good TH exercise for someone. If you're unaware, that would mean you could derive Hashable with `$( derive makeHashable ''LeafExpr)` – Thomas M. DuBuisson May 23 '11 at 16:49

1 Answers1

2

How much speed do you need? You could use one of the packages that use template haskell to generate serialization code to convert the value to binary and then hash the binary array with hashable.

Chris Kuklewicz
  • 8,123
  • 22
  • 33
  • If the `LeafExpr`s aren't too big, this would even work with `show`. – John L May 23 '11 at 09:59
  • Actually, Hashable on its own has sufficient instances, including for tuples and lists, to let you write an instance for any datatype fairly trivially. – sclv May 23 '11 at 13:50