Suppose I have a dataset that looks like the following:
obs id1 id2
1 a 1
2 b 2
3 c 2
4 d 3
5 e 4
6 b 5
7 f 6
I want to create a unique transitive id variable for this dataset. Both id1 and id2 are used to identify individuals. So if individual X has the same id1 as individual Y, or the same id2 as individual Y, then X = Y.
So, in this example, the intended output would look like this:
obs id1 id2 uniqid
1 a 1 1
2 b 2 2
3 c 2 2
4 d 3 3
5 e 4 4
6 b 5 2
7 f 6 5
Here, observation 6 has id1 "b", which was already assigned uniqid 2 (by observation 2), so observation 6 identifies the same individual as observation 2.
Now, comparing observations 3 and 6, we see that they share neither id1 nor id2, but they still identify the same individual, since each identifies the same individual as observation 2.
I am currently working in Stata and was wondering what the best way to do this is. I would prefer a Stata-based solution, but I would also be interested in seeing R or Python solutions.
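
For reference, here is a minimal Python sketch of the behaviour I am after, not a definitive implementation. It assumes the problem can be treated as finding connected components: every id1 value and every id2 value is a node, each observation links its id1 to its id2, and a union-find (disjoint-set) structure merges linked nodes. The names transitive_ids and rows are hypothetical, chosen only for this illustration, and groups are numbered in order of first appearance so the result matches the table above.

```python
def transitive_ids(rows):
    """Assign one group number per observation, where observations that
    share an id1 or an id2 (directly or through a chain) get the same number.

    rows is a list of (id1, id2) pairs; this layout is an assumption
    made for the sketch.
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root,
        # shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Merge the two groups if they are not already the same group.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag the values so an id1 of "1" can never collide with an id2 of 1.
    for id1, id2 in rows:
        union(("id1", id1), ("id2", id2))

    # Number the groups in order of first appearance.
    labels = {}
    result = []
    for id1, _ in rows:
        root = find(("id1", id1))
        if root not in labels:
            labels[root] = len(labels) + 1
        result.append(labels[root])
    return result


rows = [("a", 1), ("b", 2), ("c", 2), ("d", 3), ("e", 4), ("b", 5), ("f", 6)]
uniqid = transitive_ids(rows)
print(uniqid)  # [1, 2, 2, 3, 4, 2, 5]
```

Observations 3 and 6 end up in the same group even though they share neither id, because both are linked to observation 2.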