My data includes a Name column. Some names are written in up to eight different ways. I tried grouping them with the following code:
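# x is assumed to be the character vector of names (the Name column)
groups <- list()
i <- 1
while (length(x) > 0) {
  # indices of all remaining names that approximately match the first one
  id <- agrep(x[1], x, ignore.case = TRUE, max.distance = 0.1)
  groups[[i]] <- x[id]
  # drop the names that were just grouped
  x <- x[-id]
  i <- i + 1
}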
head(groups)
groups
Next, I want to add a new column that holds a single canonical spelling for each row, for example the most commonly used variant of that name. The result should look like this:
    A             B
1.  John Snow     John Snow
2.  Peter Wright  Peter Wright
3.  john snow     John Snow
4.  John snow     John Snow
5.  Peter wright  Peter Wright
6.  J. Snow       John Snow
7.  John Snow     John Snow
etc.
How can I get there?
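One possible direction, as a minimal sketch rather than a verified solution: instead of collecting the matched names in a list, keep track of the row indices in each group and write the group's most frequent spelling back into the new column. The data frame `df`, the column names `A` and `B`, and the helper variable `remaining` are assumptions made for illustration; only `agrep` and its arguments come from the code above.

```r
# Hypothetical example data; column name `A` follows the table above.
df <- data.frame(
  A = c("John Snow", "Peter Wright", "john snow", "John snow",
        "Peter wright", "J. Snow", "John Snow"),
  stringsAsFactors = FALSE
)

x <- df$A
df$B <- NA_character_

# Row indices that have not been assigned to a group yet.
remaining <- seq_along(x)

while (length(remaining) > 0) {
  # Approximate matches of the first unassigned name among the unassigned names.
  hit <- agrep(x[remaining[1]], x[remaining],
               ignore.case = TRUE, max.distance = 0.1)
  # Always include the seed row itself so the loop is guaranteed to terminate.
  idx <- remaining[union(1L, hit)]
  # Count exact spellings in the group; levels are ordered by first appearance,
  # so which.max() breaks ties in favour of the spelling seen first.
  counts <- table(factor(x[idx], levels = unique(x[idx])))
  df$B[idx] <- names(counts)[which.max(counts)]
  # Remove the rows that were just labelled.
  remaining <- setdiff(remaining, idx)
}

df
```

Note that with `max.distance = 0.1` an abbreviation such as "J. Snow" is probably too far from "John Snow" to land in the same group, so it may keep its own spelling in `B`. Catching abbreviations would likely need a larger `max.distance` or a separate rule, at the risk of merging names that are genuinely different.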