I'm trying to identify 'unique' and 'near unique' cases or records from a dataset for a disclosure control project. Particularly combinations on certain variables that only appear once, twice etc.
The records appear in:
table(Age,Sex,Ethnicity)
I am most interested in the elements (which are TRUE) of:
table(Age,Sex,Ethnicity)==1
table(Age,Sex,Ethnicity)==2
I know there are roughly 150 cases I am looking for from:
sum(table(Age,Sex,Ethnicity)==1)
there is also an identifier in the dataset which be a nice output or a number from 1:length(Age)*length(Sex)*length(Ethnicity) would be just as good. I was hoping it return a list something like:
[1] 103 207 218....
[41] * * *
[81] * * *
where 'identifier' = 103, 207 and 218 for the first 3 of 150 cases where:
table(Age,Sex,Ethnicity)==1
I was naively hoping that something like:
data$identifier[table(age,sex,mar,emp,edu) == 1]
names(table(Age,Sex,Ethnicity)
would work but no such luck. I've also looked into unique() but that returns every combination (that occurs once or more). Any help or input would be much appreciated.
Added reproducible example (hopefully) Example
set.seed(1234)
a <- 1+rpois(100,1)
b <- 1+rpois(100,1)
c <- 1+rpois(100,1)
a[a >= 5] <- 4
b[b >= 5] <- 4
c[c >= 5] <- 4
eg <- cbind(1:100,a,b,c)
(sum(table(a,b,c)==1))
should have 12 'unique' combinations, which I would like to identify using the first column of eg (or the identifier from the dataset)