For 2^25 entries at 8 bytes apiece, you are looking at something like 268 MB of storage for just the data, and probably around 3 GB once you account for the overhead of storing bytestrings (guesstimating 80 bytes per bytestring), plus the hash table/map's internals and the boxing of the keys.
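A quick back-of-the-envelope check, where the 2^25 entry count and the ~80 bytes per bytestring are just the guesses above, not measured numbers:

```haskell
-- Rough sizing only; both constants are guesses, not measurements.
rawValueBytes, withKeyOverhead :: Integer
rawValueBytes   = 2 ^ 25 * 8    -- 268,435,456   ~ 268 MB of raw 8-byte values
withKeyOverhead = 2 ^ 25 * 80   -- 2,684,354,560 ~ 2.7 GB once keys are counted
```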
This means you can keep the entire thing resident in memory on a decent machine, which keeps the problem relatively sane, but your garbage collection times are going to suck.
I would recommend partitioning your keyspace across a lot of smaller hash tables; that way you can run many of the updates in parallel regardless of which hash table implementation you use.
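Here is a minimal sketch of that partitioning, using one IORef'd HashMap from unordered-containers per shard; the shard count, the ByteString keys, and the Word64 values are assumptions for the sake of the example, not requirements:

```haskell
-- One IORef'd HashMap per shard, with the shard picked by hashing the key.
-- 64 shards, ByteString keys, and Word64 values are assumptions.
import           Data.Bits           ((.&.))
import           Data.ByteString     (ByteString)
import           Data.Hashable       (hash)
import qualified Data.HashMap.Strict as HM
import           Data.IORef
import qualified Data.Vector         as V
import           Data.Word           (Word64)

type Shard = IORef (HM.HashMap ByteString Word64)
type Table = V.Vector Shard

numShards :: Int
numShards = 64   -- a power of two, so shard selection is a mask rather than a mod

newTable :: IO Table
newTable = V.replicateM numShards (newIORef HM.empty)

shardFor :: Table -> ByteString -> Shard
shardFor table k = table V.! (hash k .&. (numShards - 1))

-- Updates touching different shards can run in parallel; contention is only
-- ever within a single shard.
insert :: Table -> ByteString -> Word64 -> IO ()
insert table k v =
  atomicModifyIORef' (shardFor table k) (\m -> (HM.insert k v m, ()))

lookupKey :: Table -> ByteString -> IO (Maybe Word64)
lookupKey table k = HM.lookup k <$> readIORef (shardFor table k)
```

Each shard here happens to be the immutable-map-in-an-IORef approach described below, but you could just as easily drop a mutable hash table into each slot instead.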
As for implementation:
You can either wrap a bunch of immutable hash maps (the wide-fanout HAMTs from unordered-containers) in IORefs and update them with atomicModifyIORef (or something like Ryan Newton's compare-and-swap primitive), or you can use the old Data.HashTable implementation in a straightforward imperative manner.
The latter will improve your asymptotics by a logarithmic factor over the hash array mapped tries used by unordered-containers, but Data.HashTable has bad constant factors. At the scale of your problem these effects probably cancel out, though.
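If you go the imperative route, a rough sketch along these lines is what I have in mind; I'm using the hashtables package's Data.HashTable.IO here as a stand-in for the old Data.HashTable, and the key/value types and the presized capacity are assumptions:

```haskell
-- Imperative, mutable hash table in IO; hashtables is used as a stand-in for
-- the old Data.HashTable. Types and sizing are assumptions, not requirements.
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.HashTable.IO     as H
import           Data.Word             (Word64)

type Table = H.BasicHashTable ByteString Word64

main :: IO ()
main = do
  -- presize to the expected entry count to avoid repeated resizing
  table <- H.newSized (2 ^ 25) :: IO Table
  H.insert table (B.pack "some-key") 42
  print =<< H.lookup table (B.pack "some-key")
```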