I am working on a dataset that looks very similar to the one below:
transaction_id customer_id phone email
1 19 12345 123@email.com
2 19 00001 245@gmail.com
3 Guest 00001 123@email.com
4 22 12345 123@email.com
5 23 78900 678@gmail.com
Customers 19, Guest and 22 are actually the same person, judging by the overlapping values in the phone and email columns.
Since the existing customer IDs are not unique per customer, my goal is to find the related rows and assign each real customer a new unique ID (in a new unique_id column), like this:
transaction_id customer_id phone email unique_id
1 19 12345 123@email.com 1
2 19 00001 245@gmail.com 1
3 Guest 00001 123@email.com 1
4 22 12345 123@email.com 1
5 23 78900 678@gmail.com 2
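To make the rule behind this expected output concrete: two rows should get the same unique_id whenever they are connected through a shared phone or a shared email, either directly or through a chain of other rows. Below is a minimal plain-Python sketch of that rule using a union-find structure. It is only an illustration, not the pandas solution I am looking for; the names `rows`, `find`, `union` and `first_seen` are made up for this example, and it assumes that only phone and email (not customer_id) are used for linking.

```python
# Sample rows from the table above: (transaction_id, customer_id, phone, email).
rows = [
    (1, "19", "12345", "123@email.com"),
    (2, "19", "00001", "245@gmail.com"),
    (3, "Guest", "00001", "123@email.com"),
    (4, "22", "12345", "123@email.com"),
    (5, "23", "78900", "678@gmail.com"),
]

# Each row starts in its own group; parent[i] points towards the group's root.
parent = list(range(len(rows)))


def find(i):
    """Return the root row index of the group containing row i."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # shorten the path as we walk up
        i = parent[i]
    return i


def union(i, j):
    """Merge the groups containing rows i and j."""
    ri, rj = find(i), find(j)
    if ri != rj:
        # Keep the earlier row as the root of the merged group.
        parent[max(ri, rj)] = min(ri, rj)


# Link every row to the first row seen with the same phone or the same email.
first_seen = {}
for idx, (_, _, phone, email) in enumerate(rows):
    for key in (("phone", phone), ("email", email)):
        if key in first_seen:
            union(idx, first_seen[key])
        else:
            first_seen[key] = idx

# Number the groups 1, 2, ... in order of first appearance.
group_number = {}
unique_ids = []
for idx in range(len(rows)):
    root = find(idx)
    if root not in group_number:
        group_number[root] = len(group_number) + 1
    unique_ids.append(group_number[root])

print(unique_ids)  # [1, 1, 1, 1, 2]
```

Transaction 2 ends up in group 1 here only because transaction 3 shares its phone and also shares an email with transactions 1 and 4; that chained link is exactly what a single groupby on one column cannot express.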
The tricky part is that I can group by email, or by email and phone, but neither approach captures all the related rows. For example, transaction 2 is always assigned a different unique customer ID. I tried this code:
df['unique_id'] = df.groupby('phone').grouper.group_info[0]
I greatly appreciate your time and help.