Hi I have been trying for a while to match two large columns of names, several have different spellings etc... so far I have written some code to practice on a smaller dataset
examples%>% mutate(new_ID = case_when(mapply (adist, example_1 , example_2) <= 3 ~ example_1, TRUE ~ example_2))
This manages to create a new column with names the name from example 1 if it is less than an edit distance of 3 away. However, it does not give the name from example 2 if it does not meet this criteria which I need it to do.
This code also only works on the adjacent row of each column, whereas, I need it to work on a dataset which has two columns (one is larger- so cant be put in the same order).
Also needs to not try to match the NAs from the smaller column of names (there to fill it out to equal length to the other one).
Anyone know how to do something like this?
dput(head(examples))
structure(list(. = structure(c(4L, 3L, 2L, 1L, 5L), .Label = c("grarryfieldsred","harroldfrankknight", "sandramaymeres", "sheilaovensnew", "terrifrank"), class = "factor"), example_2 = structure(c(4L, 2L, 3L, 1L,
5L), .Label = c(" grarryfieldsred", "candramymars", "haroldfranrinight",
"sheilowansknew", "terryfrenk"), class = "factor")), row.names = c(NA,
5L), class = "data.frame")