I am building a MongoDB collection that holds a specific set of data, and I want to run comparisons against that data by taking an MD5 (or maybe SHA-256) hash of it and comparing the hashes.

Is a fixed-length string of hex digits the right way to store the hash? Is there a better datatype, such as a "blob" or even a 64-bit long integer? (The latter might require a hash function that produces longs -- I don't know of one, except maybe overriding Java's .hashCode() method, which Eclipse can generate.)

If there is a better approach entirely, advice on best practice would be appreciated!

E.S.
  • Do I understand correctly that you are basically looking for a proper hash function for elements of a collection? – fishi0x01 Jan 27 '15 at 00:44
  • Yes -- and once that hash function is found, the best datatype to use in Mongo to hold the hash. Since a hash is made up of just 0-9A-F, I think a string would be excessive? – E.S. Jan 27 '15 at 00:45
  • It depends on what you want to use this for, really. Dumping binary content into MongoDB is very simple, and exact comparisons are not a problem either. If you did not expect the values to be completely random and needed some sort of ordering, you would be better off with a string. I presume the purpose is to match content in a "sub-set" of fields; otherwise there would be no point at all, since a unique hash of the whole document is just a "primary key", which already exists. – Neil Lunn Jan 27 '15 at 01:49

1 Answer

Storing MD5 Hashes in MongoDB

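If you decide to store an MD5 hash, use either a String or a Binary field (see here). An MD5 digest is 128 bits, so it takes 32 characters as a hex string but only 16 bytes as Binary, which is half the size. It is too large for a single 64-bit long.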
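To make the size difference concrete, here is a minimal sketch in Python (chosen only for illustration; the question does not fix a language). The helper names `md5_hex` and `md5_raw` are hypothetical. It produces the same MD5 digest in both forms and shows that they are interchangeable.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest as a 32-character lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def md5_raw(data: bytes) -> bytes:
    """Return the MD5 digest as 16 raw bytes."""
    return hashlib.md5(data).digest()


payload = b"hello world"
hex_form = md5_hex(payload)   # candidate value for a String field
raw_form = md5_raw(payload)   # candidate value for a Binary field

print(len(hex_form), len(raw_form))

# The two forms carry the same information and convert losslessly.
assert bytes.fromhex(hex_form) == raw_form
assert raw_form.hex() == hex_form
```

Either form supports exact-match comparisons. The hex string is easier to read in a shell, while the raw bytes halve the size of the stored value (and of any index on it). With a driver such as PyMongo, a Python `bytes` value is normally written as BSON binary data, but check your own driver's documentation.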

Best Hash Function

This is tough to answer, since it depends heavily on the kind of data in your collection. I personally think MD5 hashes are a good choice for comparing content, but again it depends on the use case. If you want to customize or optimize your hash, this post and this post might get you started; they cover some simple recipes for writing a custom hash function.
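As a rough illustration of the trade-off, the sketch below hashes only a chosen subset of a document's fields (the use case raised in the comments) and optionally squeezes the digest into a signed 64-bit integer, as the question asks. Everything here is an assumption for illustration: the function names, the field names, and the choice of canonical JSON plus SHA-256 are not taken from the linked posts.

```python
import hashlib
import json
from typing import Iterable


def content_digest(doc: dict, fields: Iterable[str]) -> bytes:
    """Hash only the named fields, independent of key order in the document."""
    subset = {name: doc.get(name) for name in fields}
    # sort_keys gives a stable serialization, so equal content hashes equally.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).digest()


def digest_to_int64(digest: bytes) -> int:
    """Truncate a digest to its first 8 bytes, read as a signed 64-bit integer."""
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


# Hypothetical documents: same "name" and "size", different "seen".
a = {"name": "report.pdf", "size": 1024, "seen": 1}
b = {"seen": 2, "size": 1024, "name": "report.pdf"}

digest_a = content_digest(a, ["name", "size"])
digest_b = content_digest(b, ["name", "size"])

print(digest_a == digest_b)
print(digest_to_int64(digest_a))
```

Truncating to 64 bits keeps the value small enough for a long, but it discards most of the digest, so collisions become far more likely than with the full hash. Treat a 64-bit match as a hint and confirm it by comparing the actual fields.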

fishi0x01