When I use a hashlib hash object, it seems that the update method doesn't replace the string that was hashed before.

For example, given a string magazine:

import hashlib

hasher = hashlib.sha256()  # create the hasher

hasher.update(magazine.encode('utf-8'))
print(int(hasher.hexdigest(), 16) % 10**8)

hasher.update(magazine.encode('utf-8'))  # same hasher, updated a second time
print(int(hasher.hexdigest(), 16) % 10**8)

will print 73983538 and 65808855, whereas

hasher = hashlib.sha256()
hasher.update(magazine.encode('utf-8'))
print(int(hasher.hexdigest(), 16) % 10**8)

hasher = hashlib.sha256()  # reset by creating a new hasher
hasher.update(magazine.encode('utf-8'))
print(int(hasher.hexdigest(), 16) % 10**8)

will print 73983538 and 73983538.

What exactly does the update method do, and is there a way to reset the hasher without creating a new one?

Many thanks,

midawn98

1 Answer

Why don't you want to create a new hasher? One hasher represents the hash of one "thing". The update method exists so that you can hash large amounts of data incrementally (a chunk of bytes at a time). That is, both

hasher = hashlib.sha256()
hasher.update(b"foo")
hasher.update(b"bar")

and

hasher = hashlib.sha256()
hasher.update(b"foobar")

lead to the same hash.
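
To make this concrete, here is a small sketch that checks the claim and also reproduces what the question observed. The value assigned to `magazine` is a hypothetical stand-in, since the question doesn't show the real string.

```python
import hashlib

# Feeding the data in two pieces...
chunked = hashlib.sha256()
chunked.update(b"foo")
chunked.update(b"bar")

# ...gives the same digest as feeding it in one piece.
whole = hashlib.sha256()
whole.update(b"foobar")

same_digest = chunked.hexdigest() == whole.hexdigest()

# hexdigest() does not clear the hasher, so a second update() with the
# same text behaves like hashing the text written twice in a row.
magazine = "example text"  # hypothetical stand-in for the asker's string
twice = hashlib.sha256()
twice.update(magazine.encode("utf-8"))
twice.update(magazine.encode("utf-8"))

doubled = hashlib.sha256((magazine * 2).encode("utf-8"))
explains_question = twice.hexdigest() == doubled.hexdigest()

print(same_digest, explains_question)  # True True
```

So the second number in the question is presumably the hash of the string repeated twice, not a second hash of the same string.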

There is no method to reset a hash object, and its internal state isn't (directly) accessible from Python, as the implementation is written in C.
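
In practice, creating a new hasher is cheap, so one option is to wrap the whole thing in a function. The sketch below assumes that is acceptable for your use case. `short_hash` is a hypothetical helper name, not part of hashlib. It also shows `copy()`, which hash objects do provide: copying a hasher that has not been updated yet gives you a clean one each time.

```python
import hashlib


def short_hash(text: str) -> int:
    """Hypothetical helper: hash one string with a fresh SHA-256 object."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10**8


# Alternative: keep an untouched hasher around and copy it when needed.
blank = hashlib.sha256()

first = blank.copy()   # independent clone of the empty state
first.update(b"foo")

second = blank.copy()  # blank itself was never updated
second.update(b"foo")

print(first.hexdigest() == second.hexdigest())  # True
```

Calling `short_hash(magazine)` twice returns the same number, because each call starts from an empty hasher.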

L3viathan
  • Great! I'm also wondering about the difference between hasher.update(b"foo") and hasher.update("foo"). – midawn98 Dec 11 '18 at 12:23
  • You have to hand bytes to the hasher, not strings (because a string has no single representation in bytes; it depends on the encoding). I did that with the `b"...."` prefix, which makes the literals bytestrings. You did that by using `.encode("utf-8")`, which encodes a string (to bytes) using the UTF-8 encoding. In Python 2, bytes and strings were the same type, and there was a separate `unicode` type, which is what strings are in Python 3. – L3viathan Dec 11 '18 at 12:38
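
As a rough illustration of the point in the comment above (Python 3 assumed), passing a str to `update` raises a `TypeError`, while `b"foo"` and `"foo".encode("utf-8")` are the same bytes. The non-ASCII example character is my own choice, just to show that the encoding matters.

```python
import hashlib

hasher = hashlib.sha256()
try:
    hasher.update("foo")  # a str, not bytes
    raised = False
except TypeError:
    raised = True  # Python 3 refuses to guess an encoding

# For ASCII text, the bytes literal and the UTF-8 encoding are identical.
same_bytes = b"foo" == "foo".encode("utf-8")

# For non-ASCII text, different encodings give different bytes,
# and therefore different hashes.
utf8_hash = hashlib.sha256("é".encode("utf-8")).hexdigest()
latin1_hash = hashlib.sha256("é".encode("latin-1")).hexdigest()

print(raised, same_bytes, utf8_hash != latin1_hash)  # True True True
```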