I am interested in de-identifying a sensitive data set with both time-fixed and time-variant values. I want to (a) group all cases by social security number, (b) assign those cases a unique ID and then (c) remove the social security number.
Here's an example data set:
personal_id gender temperature
111-11-1111 M 99.6
999-999-999 F 98.2
111-11-1111 M 97.8
999-999-999 F 98.3
888-88-8888 F 99.0
111-11-1111 M 98.9
Any solutions would be very much appreciated.