
I'm generating MD5 digests with Python like so:

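import hashlib

def md5_digest(data):
    # data must be bytes; encode text first, e.g. 'some_string'.encode('utf-8')
    m = hashlib.md5()
    m.update(data)  # update() returns None, so keep using m
    return m.digest()  # 16 raw bytes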

Now, looking at a similar SO question, I see that you should use the bytea type. I plan to use the above snippet to generate these digests, although I'm open to modifying it. Is there anything I should be aware of before sending this off to the DB, assuming that I have a field of type bytea? Would this be the proper way of storing digests? This might be a redundant question, but I wanted to be sure. I'm just storing MD5s of files and was planning on using the hash as a unique identifier; nothing mission critical.
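Since the digests are of files, here is a minimal sketch of what that could look like. It assumes a DB-API cursor with psycopg2-style `%s` placeholders and a hypothetical table `files (md5 bytea, path text)`; the helper names `file_md5` and `store_digest` are made up for illustration, and I have not confirmed how every driver adapts `bytes`, so treat that part as an assumption.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the raw 16-byte MD5 digest of the file at `path`."""
    m = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            m.update(chunk)
    return m.digest()


def store_digest(cursor, path):
    """Insert the digest as a query parameter (hypothetical table `files`)."""
    digest = file_md5(path)
    # Assumption: the driver maps Python bytes to bytea when passed as a
    # parameter; never build the SQL by string formatting.
    cursor.execute(
        "INSERT INTO files (md5, path) VALUES (%s, %s)",
        (digest, path),
    )
    return digest
```

Passing the digest as a parameter rather than interpolating it into the SQL string leaves any escaping of the binary value to the driver.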

  • A [bytea/binary field](http://www.postgresql.org/docs/9.1/static/datatype-binary.html) is sufficient. Some people prefer to store hex-encoded text. The type you pick determines when and where conversions need to be applied. I would avoid anything fancier than those two, such as Base64 encoding: it takes up more space than bytea and is harder to read than plain hex. – user2864740 Mar 16 '15 at 21:18
  • [MD5 is severely compromised](https://en.wikipedia.org/wiki/MD5#Security), and there is little reason to use it except for legacy compatibility. If you can use a better hash algorithm such as SHA1, do so. – Schwern Mar 17 '15 at 01:28
  • See implementations discussion at https://stackoverflow.com/a/67372074 – Peter Krauss Oct 31 '21 at 22:43
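To make the size comparison in the first comment concrete, here is a small sketch comparing the three encodings of one digest. The variable names are illustrative only, and the sizes are those of the Python values, not of PostgreSQL's on-disk storage.

```python
import base64
import hashlib

digest = hashlib.md5(b'some_string').digest()  # raw bytes, what bytea would hold
hex_text = digest.hex()                        # hex-encoded text
b64_text = base64.b64encode(digest).decode('ascii')  # Base64 text

# Length of each representation, in bytes or characters.
sizes = {'raw': len(digest), 'hex': len(hex_text), 'base64': len(b64_text)}

# Converting hex text back to the raw digest is a single call.
roundtrip = bytes.fromhex(hex_text)
```

For a 16-byte MD5 digest this gives 16 raw bytes, 32 hex characters, and 24 Base64 characters, so Base64 is larger than the raw form and less readable than hex.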
