I have a dataset that contains some twins and triplets. For each set of twins or triplets, I need to randomly select one member to remain in the dataset. The relevant information is coded in two columns, FamilyID and FamilyOrder. Twins and triplets share both a FamilyID and a FamilyOrder value. Non-twin siblings share a FamilyID but have different FamilyOrder values.
FamilyID FamilyOrder y
1 1 45
1 2 33
2 1 12
3 1 76
3 2 15
3 2 59
3 2 22
4 1 56
4 1 21
So, in this example data, FamilyID 3 contains one non-twin (FamilyOrder 1) and a set of triplets (FamilyOrder 2), and FamilyID 4 contains a pair of twins (both FamilyOrder 1).
I would like the output to be something like:
FamilyID FamilyOrder y
1 1 45
1 2 33
2 1 12
3 1 76
3 2 22
4 1 56
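This keeps regular siblings but removes all but one member of each set of twins or triplets. Which member survives should be random, so the rows kept for FamilyID 3 (FamilyOrder 2) and FamilyID 4 above are just one possible draw.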
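To make the intended operation concrete, here is a minimal sketch in plain Python (standard library only). It assumes the data is held as a list of (FamilyID, FamilyOrder, y) tuples. The function name keep_one_per_multiple and the seed parameter are hypothetical names introduced only for this illustration. The idea is to group rows by the (FamilyID, FamilyOrder) pair and pick one row at random from each group; a group of size one is a regular sibling and is always kept.

```python
import random
from collections import defaultdict

# Example data from the question: (FamilyID, FamilyOrder, y)
rows = [
    (1, 1, 45),
    (1, 2, 33),
    (2, 1, 12),
    (3, 1, 76),
    (3, 2, 15),
    (3, 2, 59),
    (3, 2, 22),
    (4, 1, 56),
    (4, 1, 21),
]


def keep_one_per_multiple(rows, seed=None):
    """Keep one randomly chosen row per (FamilyID, FamilyOrder) pair.

    Hypothetical helper for illustration. Rows that share both values
    are treated as twins or triplets; original row order is preserved.
    """
    rng = random.Random(seed)  # seed is optional, for reproducibility

    # Collect the row positions belonging to each (FamilyID, FamilyOrder) pair.
    groups = defaultdict(list)
    for index, (family_id, family_order, _y) in enumerate(rows):
        groups[(family_id, family_order)].append(index)

    # Pick exactly one position per group, uniformly at random.
    kept = {rng.choice(indices) for indices in groups.values()}

    return [row for index, row in enumerate(rows) if index in kept]


result = keep_one_per_multiple(rows, seed=0)
for row in result:
    print(*row)
```

If the data lives in a pandas DataFrame named df (an assumption, since the question does not say which tool is in use), the same idea can likely be expressed as df.sample(frac=1).drop_duplicates(subset=["FamilyID", "FamilyOrder"]).sort_index(), which shuffles the rows, keeps the first row seen for each pair, and then restores the original order.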