
In a previous question, the asker was told to use hexdigest() instead, and that does work. But what is the format of the digest() output?

The following test code:

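import base64
import hashlib

# Open in binary mode so the hash sees the raw bytes; 'with' closes the file.
with open('foo.jpeg', 'rb') as f1:
    m = hashlib.sha512()
    m.update(f1.read())

sha = m.digest()             # raw digest as a bytes object
print(sha)
print(m.hexdigest())         # the same bytes written as hex digits
res = base64.b64encode(sha)  # the same bytes, base64-encoded
print(res)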

produces the following output:

>>> 
b'\xf3g\xd1S\xc4#OK\xb8\xb7\x1f~r\xf0\x19JE\xb0d\xb9\x11O\x08\x1c\xc66\x00\xb3i*\x87\x08\x92+\xd3)F\x02\t\x80\xf0m\x8b;\x9c\xcdq\xbd\xb9\x92k\x7f}d\t\xc65\x12\x0b\x17\xf9]5\x97'

f367d153c4234f4bb8b71f7e72f0194a45b064b9114f081cc63600b3692a8708922bd32946020980f06d8b3b9ccd71bdb9926b7f7d6409c635120b17f95d3597
>>> 

I don't get what the parts like "#OK", "~r", "i*" etc. mean in the first line of output above. Any light that can be shed on this would be greatly appreciated. The hexdigest() output, of course, makes perfect sense.

The previous question was: "hashlib.sha256 returned some weird characters in python". Many thanks.

DSM
B Brown

2 Answers

2

The output of a hash function like SHA-512 is a 512-bit string, i.e. 64 bytes. Thus the result of m.digest() is a bytes object of length 64. The output is pseudorandom, so the "#OK" in the hash is purely coincidental: those bytes just happen to fall in the printable ASCII range. The output of m.hexdigest() is the same 64 bytes encoded as hexadecimal digits.
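As a minimal sketch of the relationship described above (the input bytes here are a made-up placeholder, not the asker's foo.jpeg), the raw digest and the hex digest can be compared directly:

```python
import hashlib

# Hypothetical input for illustration; any bytes would do.
data = b"example data"

m = hashlib.sha512(data)
raw = m.digest()        # bytes object holding the raw digest
hexed = m.hexdigest()   # str holding the same bytes as hex digits

print(len(raw))                     # number of bytes in the digest
print(len(hexed))                   # number of hex characters
print(raw.hex() == hexed)           # hex-encoding the raw bytes gives hexdigest()
print(bytes.fromhex(hexed) == raw)  # and decoding the hex gives the raw bytes back
```

Both forms carry the same information; only the encoding differs.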

Perseids
  • Got it! Thanks. I assumed something was being done to make the output printable... and that is clearly not the case... or rather, when something is done to make it printable you end up with hexdigest(). – B Brown Aug 11 '13 at 17:05
  • @user54289: Right. For hashes it is customary to represent them as hex digits, if you have to print them, but if you need to save space, you can also base64 encode it like you did in the second to last line of your code. Please mark this answer as accepted if you found it helpful and it answered your question. – Perseids Aug 11 '13 at 17:35
  • But here he has an output of length 128, i.e. 1024 bits. This is not normal. – David 天宇 Wong Apr 29 '14 at 00:01
  • @David天宇Wong: The output is 128 hex characters. As each hex character represents 4 bits, you actually have only 512 bits. – Perseids Apr 29 '14 at 04:29
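The size arithmetic from the comments can be checked with a short sketch. The input bytes are again a hypothetical placeholder; the lengths do not depend on what is hashed:

```python
import base64
import hashlib

# Hypothetical input for illustration only.
digest = hashlib.sha512(b"example data").digest()

hex_len = len(digest.hex())              # characters in the hex form
b64_len = len(base64.b64encode(digest))  # characters in the base64 form
bits_from_hex = hex_len * 4              # each hex character encodes 4 bits

print(len(digest), hex_len, b64_len, bits_from_hex)
```

The hex form is twice as long as the raw digest, while base64 is more compact, which is why it is an option when space matters.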
1

As the Python reference states:

hash.digest()

Return the digest of the strings passed to the update() method so far. This is a string of digest_size bytes which may contain non-ASCII characters, including null bytes.

So what you see is the byte representation of your digest so far.

If you have a look at the ASCII chart, you see that some of the bytes can be represented as printable characters. For example, the second byte in your digest (hex 67) encodes the character g, whereas the first byte (hex f3) cannot be represented as a printable character and is thus printed out as \xf3.
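To illustrate, here is a small sketch using the first eight bytes of the digest from the question. The per-byte rule below is a simplification written for this example: the real bytes repr also has special cases such as \t, \n, \r, the backslash and the quote character.

```python
# First eight bytes of the digest shown in the question, taken from its hex form.
head = bytes.fromhex("f367d153c4234f4b")

# Simplified rule: printable ASCII bytes are shown as the character itself,
# everything else as a \xNN escape.
rendered = [chr(b) if 32 <= b < 127 else "\\x{:02x}".format(b) for b in head]

print(head)      # b'\xf3g\xd1S\xc4#OK'
print(rendered)  # one entry per byte
```

The bytes 0x23, 0x4f and 0x4b are the ASCII codes for "#", "O" and "K", which is where the "#OK" in the output comes from.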

Community
PoByBolek