2

I want to write a birthday attack program in Haskell for a variant of SHA1 which only produces only a 50 bit hash. To do this I need a hash table capable of storing approx. 2^25 entries.

The keys in this map will be Int64 and the values will be short length strings (~ 16 bytes).

Any suggestions for which hash implementation to use?

(Disregard that last update - I would need a bit array of 2^50 elements.)

ErikR
  • 51,541
  • 9
  • 73
  • 124
  • Well, SQL would be the simplest. But their are lots of variations of in-storage and "sorta in-storage" hash tables. A lot would depend on your operating environment and some of the characteristics of the data (such as how evenly it's distributed). – Hot Licks Apr 03 '12 at 21:54

2 Answers2

6

For 2^25 entries at 8 bytes a piece, you are looking at something like 768MB of storage for just the data, at most probably around 3 gigabytes with actual overhead for storing bytestrings -- guesstimating 80 bytes per bytestring, then you have the hashtable/map's internals to store, and the boxing for the key, etc.

This means you can store the entire thing resident in memory on a decent machine, which keeps the problem relatively sane, but your collection times will kind of suck.

I would recommend using a lot of smaller hash tables, by partitioning your keyspace, that way you can run lots of the updates in parallel regardless of the hash table you use.

As for implementation:

You can either wrap a bunch of immutable hash tables like the wide-fanout ones from unordered-containers in IORefs and use some kind of atomicModifyIORef or something like ryan newton's compare and swap primitive, or you can try to use the old Data.HashTable implementation in a straightforward imperative manner.

The latter will improve your asymptotics by a logarithmic factor over the hash-array mapped tries used by unordered-containers, but Data.HashTable has bad constants. At the scale of your problem these factors probably cancel out though.

Edward Kmett
  • 29,632
  • 7
  • 85
  • 107
  • Also see Gregory Collins' hashtables package (http://hackage.haskell.org/package/hashtables) for mutable hashtables with better constant factors than Data.HashTable. – rp123 Apr 04 '12 at 02:22
2

I also posted same sort of question. And from some suggestion, I am using Kyoto Cabinet. It is pretty cool and gives nice performance also. You can check my posts also because I have similar issues. EX. one, two and three. Perhaps this may helpful.

Community
  • 1
  • 1
Arpssss
  • 3,850
  • 6
  • 36
  • 80