Python 3 same text but different md5 hashes

Question

I have a relatively simple text processing algorithm that imports some words from a text file and produces a phrase. The algorithm has a second path it can take if a setting flag (a simple constant) is enabled. The second path is essentially an additional list comprehension that filters out some words.

In both cases the algorithm produces the same phrase (str1 and str2 below) but the md5 hash of each phrase is different. I confirmed this using the Python shell:

(NOTE: the phrase and hash values are not the actual values being used)

>>> import hashlib
>>> 
>>> str1 = "some phrase"
>>> str2 = "some phrase"
>>> str1 == str2
True
>>> 
>>> md5 = hashlib.md5()
>>> 
>>> md5.update(str1.encode('utf-8'))
>>> hash_1 = md5.hexdigest()
>>> 
>>> md5.update(str2.encode('utf-8'))
>>> hash_2 = md5.hexdigest()
>>> 
>>> print(hash_1)
34281bdd108d35dec09dd6599bc144gf
>>> print(hash_2)
0670d0df2506c7gf0d5ee27190g2d919

How is this possible?

Because you *updated* the md5 object with the second string. Even if the strings are the same, you now have two hash operations. If you created a new md5 object for str2, you would get the same hexdigest. — Daniel Roseman, Jan 15 '19 at 16:30
As a side note: this code will only work in python2 but the question is tagged with `python-3.x` and the title contains "Python 3". Which one is it? — Fynn Becker, Jan 15 '19 at 16:33
@FynnBecker, sorry about that. The actual code is written in Python 3 but I just used the Python 2 shell in my terminal for the example and copied it over. Will amend. — Trizzaye, Jan 15 '19 at 16:48
@Trizzaye in future when someone suggests an edit to your post and you don't want to keep those edits, reject that edit, do not approve and edit it back out — WhatsThePoint, Jan 15 '19 at 17:06

score 6 · Accepted Answer · answered Jan 15 '19 at 16:32

According to the documentation, `update` updates the current hash object with the given string; it does not create a new one. To hash a second string independently, you need to instantiate a new md5 object for it.

https://docs.python.org/2/library/hashlib.html#hashlib.hash.update

import hashlib

str1 = "some phrase"
str2 = "some phrase"
print(str1 == str2)

md51 = hashlib.md5()

md51.update(str1.encode('utf-8'))
hash_1 = md51.hexdigest()

md52 = hashlib.md5()
md52.update(str2.encode('utf-8'))
hash_2 = md52.hexdigest()

print(hash_1 == hash_2)  # True

Again according to the documentation, calling `update` repeatedly is equivalent to a single call with the concatenation of all the arguments. Here is a short snippet that shows it:

import hashlib

str1 = "some phrase"
str2 = "some phrase"
print(str1 == str2)

md51 = hashlib.md5()

md51.update((str1 + str2).encode('utf-8'))
hash_1 = md51.hexdigest()

md52 = hashlib.md5()
md52.update(str1.encode('utf-8'))
md52.update(str2.encode('utf-8'))
hash_2 = md52.hexdigest()

print(hash_1 == hash_2)  # True

To make it work in Python 2, just remove the `.encode('utf-8')` calls.
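
One way to avoid reusing a hash object by accident is to wrap the hashing in a small function that builds a new object on every call. The sketch below is only an illustration: `md5_hex` is a hypothetical helper name, not part of hashlib, and the strings are the placeholder values from the question. It also reproduces the behaviour from the question, where the second digest from a shared object covers both strings.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hex digest of text, using a fresh hash object per call."""
    # hashlib.md5() accepts initial data, so no separate update() call is needed.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


str1 = "some phrase"
str2 = "some phrase"

# Fresh object per string: equal inputs give equal digests.
hash_1 = md5_hex(str1)
hash_2 = md5_hex(str2)
print(hash_1 == hash_2)  # True

# Reusing one object accumulates data, so the second digest covers str1 + str2.
shared = hashlib.md5()
shared.update(str1.encode("utf-8"))
first = shared.hexdigest()
shared.update(str2.encode("utf-8"))
second = shared.hexdigest()
print(first == second)  # False
print(second == md5_hex(str1 + str2))  # True
```

Because the hash object never leaves the function, there is no state left over to carry from one string to the next.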
