I have a medical dataset of approximately 10,000 patients. I want to replace their Social Security Numbers (Patient_SSN) with a unique ID for each patient. Note that some rows share the same patient SSN because the data is stored at the visit level. In other words, each visit is stored in a new row (i.e. with a different date), as in the rows for 'Mary' and 'John' in the example data.
Patient_Name = c("Alex", "Mary", "Sarah", "John", "Susan", "Jessica", "Sarah", "Karen", "Mary", "John")
Patient_SSN = c(1234, 43251, 9320, 2901, 3229, 4291, 9320, 9218988, 43251, 2901)
Visit_Date = c('10_21', '10_21', '10_25', '10_25', '10_26', '10_27', '10_28', '10_28', '10_28', '10_29')
BMI = runif(10, min = 12, max = 25)
data_hospital = data.frame(Patient_Name, Patient_SSN, BMI, Visit_Date)
My question is: how can I replace each SSN with a new ID to protect patient privacy, given that rows sharing the same SSN must receive the same new ID? Each new ID should have the same number of characters as the original Patient_SSN it replaces. Thank you in advance for your assistance.
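To make the requirement concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the SSNs are positive whole numbers stored as numerics (so there are no leading zeros to preserve), and it introduces a few names that are not part of my data: the `Patient_ID` column, the `unique_ssn` and `new_id` vectors, and a fixed seed used only to make the example reproducible.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

Patient_Name <- c("Alex", "Mary", "Sarah", "John", "Susan", "Jessica",
                  "Sarah", "Karen", "Mary", "John")
Patient_SSN  <- c(1234, 43251, 9320, 2901, 3229, 4291, 9320, 9218988, 43251, 2901)
Visit_Date   <- c("10_21", "10_21", "10_25", "10_25", "10_26", "10_27",
                  "10_28", "10_28", "10_28", "10_29")
BMI          <- runif(10, min = 12, max = 25)
data_hospital <- data.frame(Patient_Name, Patient_SSN, BMI, Visit_Date)

# One entry per distinct patient, so repeat visits map to the same new ID
unique_ssn <- unique(data_hospital$Patient_SSN)

# Number of digits in each SSN (sprintf avoids scientific notation)
n_digits <- nchar(sprintf("%.0f", unique_ssn))

# Draw a random ID with the same number of digits for each distinct SSN.
# A candidate is rejected if it collides with an earlier new ID or with
# any real SSN in the data.
new_id <- numeric(length(unique_ssn))
for (i in seq_along(unique_ssn)) {
  lo <- 10^(n_digits[i] - 1)   # smallest number with that many digits
  hi <- 10^n_digits[i] - 1     # largest number with that many digits
  repeat {
    candidate <- lo + sample.int(hi - lo + 1, 1) - 1
    if (!(candidate %in% c(new_id[seq_len(i - 1)], unique_ssn))) break
  }
  new_id[i] <- candidate
}

# Look up the new ID for every row, then drop the real SSN column
data_hospital$Patient_ID  <- new_id[match(data_hospital$Patient_SSN, unique_ssn)]
data_hospital$Patient_SSN <- NULL

print(data_hospital)
```

In this sketch the pair `unique_ssn` / `new_id` acts as the lookup table between real and replacement IDs. It would need to be stored securely if re-identification is ever required, or discarded otherwise.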