I would like to generate an integer-based unique ID for the users in my DataFrame (df).
Let's say I have:
index  first  last   dob
0      peter  jones  20000101
1      john   doe    19870105
2      adam   smith  19441212
3      john   doe    19870105
4      jenny  fast   19640822
I would like to generate an ID column like so:
index  first  last   dob       id
0      peter  jones  20000101  1244821450
1      john   doe    19870105  1742118427
2      adam   smith  19441212  1841181386
3      john   doe    19870105  1742118427
4      jenny  fast   19640822  1687411973
The ID should be a 10-digit integer derived from the values of the fields, so rows with identical values (like the two john doe rows) get the same ID.
I've looked into hashing, encryption, and UUIDs, but can't find much related to this specific non-security use case. It's just about generating an internal identifier.
- I can't use groupby or category-code style methods, because the IDs would change if the order of the rows changes.
- The dataset won't grow beyond 50k rows.
- It's safe to assume that no two different users share the same first, last, and dob.
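To make the requirement concrete, here is a rough sketch of the hash-based direction I've been considering. It is only an illustration, not something I know to be the right approach: `make_id` is a name I made up, the `|` separator assumes that character never appears in the fields, and SHA-256 is an arbitrary choice of stable hash. The IDs it produces will not match the example values above. Squeezing a hash into 10 digits also means two different users could collide; by a rough birthday-bound estimate that is around a 13% chance of at least one collision somewhere in 50k rows, so the sketch includes a check.

```python
import hashlib

def make_id(first, last, dob):
    """Derive a deterministic 10-digit integer from the three fields."""
    # Join with a separator assumed not to occur in the data,
    # so ("ab", "c") and ("a", "bc") cannot produce the same key.
    key = f"{first}|{last}|{dob}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Map into [1_000_000_000, 9_999_999_999] so the ID always has 10 digits.
    return int(digest, 16) % 9_000_000_000 + 1_000_000_000

# Same rows as in the example above.
rows = [
    ("peter", "jones", 20000101),
    ("john", "doe", 19870105),
    ("adam", "smith", 19441212),
    ("john", "doe", 19870105),
    ("jenny", "fast", 19640822),
]
ids = [make_id(*row) for row in rows]

# Distinct users must map to distinct IDs; fewer IDs than users means a collision.
has_collision = len(set(ids)) != len(set(rows))
```

On the actual DataFrame this would presumably be something like `df["id"] = [make_id(f, l, d) for f, l, d in zip(df["first"], df["last"], df["dob"])]`. Since the ID depends only on the field values, row order doesn't matter, which is the property I'm after.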
I feel like I may be tackling this the wrong way, as I can't find much written about it.
Thanks