-1

I've created a program to take a users predetermined unique identifier, hash it, and store it in a dictionary mapping to the user's name. I later receive the unique identifier, rehash it, and can look up the user's name.

I've come to a problem where an individual's 9 digit unique ID hash()'s to the same number as somebody else. This has occurred after gathering data for about 40 users.

Is there a common work around to this? I believe this is different than just using a hashmap, because if I create a bucket for the hashed ID, I won't be able to tell who the user was (whether it be the first item in the bucket or second).

Edit:

id = raw_input()
hashed_id = hash(id)
if not dictionary.has_key(hashed_id):
    name = raw_input()
    dictionary[hashed_id] = name
 check_in_user(dictionary[hashed_id])
JamesRichardson
  • 131
  • 2
  • 13
  • 1
    This is why dictionaries and sets require you to implement both `__hash__` **and** `__eq__`, so it can double-check in case of collision. Without a [mcve], it's hard to suggest specifically what (if anything) you need to change. – jonrsharpe Oct 24 '17 at 14:16
  • 1
    What if you used the unique identifier as the key in the dictionary, instead of the hash of the unique identifier? Since the identifier is unique, you shouldn't have any collisions. – Kevin Oct 24 '17 at 14:17
  • 1
    your missfortune (or luck) is probably of godly proportions.. But as the other users said, what you should be doing is checking the **hashed value entered** against the **hash value stored** for the **particular user**; not all users. – Ma0 Oct 24 '17 at 14:18
  • I never thought I'd run into this problem haha, and essentially a user doesn't have an account, the idea would be that anybody could come and enter their ID and the system will find who they are and check them into a meeting. The reason I'm not using their ID is because the information is sensitive and I felt hashing could be better than keeping their plain ID in a file. – JamesRichardson Oct 24 '17 at 14:23

1 Answers1

1

I have never seen hash() used for this. hash() should be used for data structures as a shorthand for the entire object, such as keys in the internal implementation of dictionaries.

I would suggest using a UUID (universally unique identifier) for your users instead.

import uuid
uuid.uuid4()
# UUID('d36b850c-2433-42c6-9252-6371ea3d33c2')

You'll be very hard pressed to get a collision out of UUIDs.

Adam Barnes
  • 2,922
  • 21
  • 27