I am comparing personal information about individuals, specifically their name, birthdate, gender, and race, by hashing a string that contains all of these fields and comparing the hash objects' hexdigests. Each hexdigest is a 32-digit hexadecimal string, which I am using as a primary key in a database. For example, with my own identifying string it works like this:
>> import hashlib
>> id_string = "BrianPeterson08041993MW"
>> byte_string = id_string.encode('utf-8')
>> hash_id = hashlib.md5(byte_string).hexdigest()
>> print(hash_id)
3b807ad8a8b3a3569f098a575091bc79
At this point, I am trying to ascertain collision risk. My understanding is that MD5 doesn't have significant collision risk, at least for strings that are relatively small, which mine are (about 20-40 characters in length). However, I am not using the 128-bit digest object, but the 32-digit hexdigest.
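To put a rough number on "collision risk", here is a minimal sketch of the standard birthday-bound approximation. It assumes the hash output behaves like a uniformly random 128-bit value for non-adversarial inputs, which is an assumption on my part and not something I have verified for MD5. The function name `birthday_collision_probability` and the record count of one billion are hypothetical, chosen only for illustration.

```python
import math

def birthday_collision_probability(n, bits=128):
    """Approximate chance that at least two of n uniformly random
    `bits`-bit values coincide (birthday bound: 1 - exp(-pairs / 2**bits))."""
    pairs = n * (n - 1) // 2          # number of distinct pairs of records
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-pairs / 2**bits)

# Hypothetical table size: one billion records
p = birthday_collision_probability(10**9)
print(p)
```

Under that assumption, the estimate depends only on the number of output bits and the number of records, not on how long the input strings are.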
Now, I believe the hexdigest is a compression of the digest (that is, it's stored in fewer characters), so isn't there an increased risk of collision when comparing hexdigests? Or am I off-base?
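One way I could test this assumption is to compare the two representations directly. The sketch below uses only `hashlib` calls from the standard library (`digest()` and `hexdigest()`) plus `bytes.fromhex`; the variable names `raw` and `hex_form` are my own. It checks how many bits each form carries and whether the hex string converts back to exactly the same bytes.

```python
import hashlib

id_string = "BrianPeterson08041993MW"
h = hashlib.md5(id_string.encode("utf-8"))

raw = h.digest()          # bytes object: 16 bytes
hex_form = h.hexdigest()  # str: 32 hexadecimal characters

print(len(raw) * 8)       # bits in the raw digest (8 per byte) -> 128
print(len(hex_form) * 4)  # bits in the hexdigest (4 per hex digit) -> 128

# If the hexdigest round-trips to the identical bytes, no information was lost
print(bytes.fromhex(hex_form) == raw)  # -> True
```

If the round trip succeeds, the hexdigest is just a different encoding of the same 128 bits, and comparing hexdigests would be equivalent to comparing digests.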