I'd like to do what I think is a very simple operation: adding a column with an ID number for each person to a dataset containing a list of (potentially) duplicated names. I think I am close. The code below looks at a dataset of names, does pairwise comparisons, and appends a column indicating whether each pair is a likely match. Now I just want to go one step further: instead of dropping duplicates, I want to assign a unique identifier to each person.
Example:
Peter
Peter
Peter
Connor
Matt
would become:
Peter -- 1
Peter -- 1
Peter -- 1
Connor -- 2
Matt -- 3
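library(RecordLinkage)
data(RLdata10000)
# Compare only record pairs that agree on field 5 (birth year in RLdata10000)
rpairs <- compare.dedup(RLdata10000, blockfld = 5)
# Compute an EpiLink weight for each compared pair
p <- epiWeights(rpairs)
# Classify pairs with a weight of at least 0.7 as links
classify <- epiClassify(p, 0.7)
summary(classify)
# Append the predicted match status to each compared pair
match <- classify$prediction
results <- cbind(classify$pairs, match)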
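
One possible way to get from pairwise matches to a per-person ID is to treat each matched pair as an edge and give every connected group of records the same number. The sketch below does this in base R with a small union-find. The function name assign_person_ids is hypothetical, and the commented usage at the end assumes that classify$pairs has id1 and id2 columns and that links are coded "L" in classify$prediction, which is worth checking against your installed version of RecordLinkage.

```r
# Hypothetical helper: give every record a group ID so that records
# connected through matched pairs share the same number.
# n        : number of records in the original data set
# id1, id2 : record indices of each pair classified as a match
assign_person_ids <- function(n, id1, id2) {
  id1 <- as.integer(id1)
  id2 <- as.integer(id2)
  parent <- seq_len(n)  # each record starts as its own group

  # Follow parent pointers until reaching the representative of a group
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  # Merge the groups of every matched pair; the smaller index becomes the root
  for (k in seq_along(id1)) {
    a <- find_root(id1[k])
    b <- find_root(id2[k])
    if (a != b) parent[max(a, b)] <- min(a, b)
  }

  roots <- vapply(seq_len(n), find_root, integer(1))
  # Renumber the roots as 1, 2, 3, ... in order of first appearance
  match(roots, unique(roots))
}

# Toy example from the question: records 1-3 are "Peter", 4 is "Connor", 5 is "Matt",
# and the pairs (1, 2) and (2, 3) were classified as matches.
toy_ids <- assign_person_ids(5, id1 = c(1, 2), id2 = c(2, 3))
print(toy_ids)  # 1 1 1 2 3

# Assumed usage with the objects from the question (not run here):
# links <- classify$pairs[classify$prediction == "L", ]
# RLdata10000$person_id <- assign_person_ids(nrow(RLdata10000), links$id1, links$id2)
```

Note that grouping is transitive: if A matches B and B matches C, all three receive the same ID even if A and C were never classified as a match directly, so a loose threshold can chain unrelated records together.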