I would like to use MetaPhone, Double Metaphone, Caverphone, MetaPhone3, SoundEx and, if anyone has implemented it yet, NameX functions within R, so that I can categorize and summarize like values and minimize data cleansing operations prior to analysis.
I am fully aware that each algorithm has its own strengths and weaknesses. I would strongly prefer not to use SoundEx, although it might still work if I cannot find alternatives. As mentioned in this post, Harper would match any of a list of unrelated names under SoundEx, but should not under Metaphone, which gives better matching results.
I am not sure which algorithm would serve my purposes best while still preserving some flexibility, which is why I want to try several of them and, before looking at the values, generate a table like the following.
Surnames are not the subject of my initial analysis, but I think they are a good example: what I am really trying to do is treat all like-'sounding' words as the same value, with a simple function call as the values are evaluated.
Some things I have already looked at:
- I know that a C package could be written and called with Rcpp, and there are even C solutions for SoundEx on SE, but I have not written an R package before and would like to avoid re-inventing the wheel if there is a simpler way to do this directly in R, or if a package already provides the function.
- I am aware that the RecordLinkage and, now, stringdist packages have a SoundEx function, but no form of MetaPhone function.
So I am specifically looking for an answer on how to run a MetaPhone / Caverphone function in R and obtain the resulting "Value", so that I can group data values by it.
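To make the goal concrete, here is a minimal base-R sketch of the workflow I have in mind. It uses a simplified American Soundex purely as a stand-in, since that is the one algorithm I can write out myself; it is not a MetaPhone or Caverphone implementation. The function name `soundex_sketch` and the sample names are hypothetical, and the rules are my own reading of Soundex rather than a vetted implementation.

```r
# Simplified American Soundex in base R (illustrative sketch only).
soundex_sketch <- function(x) {
  # Consonant-to-digit table; vowels, y, h and w have no entry.
  codes <- c(b = "1", f = "1", p = "1", v = "1",
             c = "2", g = "2", j = "2", k = "2",
             q = "2", s = "2", x = "2", z = "2",
             d = "3", t = "3", l = "4",
             m = "5", n = "5", r = "6")

  encode_one <- function(word) {
    if (is.na(word)) return(NA_character_)
    cleaned <- gsub("[^a-z]", "", tolower(word))
    if (!nzchar(cleaned)) return(NA_character_)
    ch <- strsplit(cleaned, "")[[1]]

    # h and w (except as the first letter) do not separate equal codes.
    ch <- ch[!(ch %in% c("h", "w")) | seq_along(ch) == 1]

    # Look up digits; letters without a code become the placeholder "0".
    digit <- unname(codes[ch])
    digit[is.na(digit)] <- "0"

    # Collapse runs of the same code, then drop the first letter's own code
    # and the "0" placeholders.
    digit <- digit[c(TRUE, digit[-1] != digit[-length(digit)])]
    rest <- digit[-1]
    rest <- rest[rest != "0"]

    # First letter plus three digits, padded with zeros.
    paste0(toupper(ch[1]), paste(c(rest, "0", "0", "0")[1:3], collapse = ""))
  }

  vapply(as.character(x), encode_one, character(1), USE.NAMES = FALSE)
}

# Hypothetical sample values, grouped by their phonetic key.
surnames <- c("Robert", "Rupert", "Harper", "Ashcraft")
keys <- soundex_sketch(surnames)
split(surnames, keys)  # values that share a key end up in the same group
table(keys)            # counts per key
```

What I am after is the same pattern (one vectorized call that returns a key per value, followed by `split()` or `table()`), but with a MetaPhone or Caverphone key in place of the Soundex one.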
An additional caveat is that I still consider myself pretty new to R, as I am not a daily user of it.