I have two columns with ~20k rows of names (not all unique) that I want to compare row by row, i.e. row 1 of column A against row 1 of column B, and so on. I would also like to compare the string lengths and express the Levenshtein (LV) distance as a percentage relative to length, so I can start grouping names based on how closely each row matches.
Example of subset data:
df <- data.frame(
  R_Number = 1:10,
  A = c('Microsoft', 'Microsoft Corporation', 'Microsoft Corp', 'Microsoft inc', 'Microsoft', 'Microsoft INC', 'Microsoft CORP', 'MSFt', 'Microsoft inc', 'Microsoft'),
  B = c('Microsoft', 'MSFT', 'MSFT Corp', 'Apple inc', 'Microsoft', 'Microsoft INC', 'Microsoft corp', 'Microsoft', 'AMZN', 'Amazon')
)
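Example of using the stringdist function to calculate the distance between the two columns, row by row:
library(stringdist)
# dist.methods: character vector of method names (defined elsewhere)
test_2 <- sapply(dist.methods, function(m) stringdist(df$A, df$B, method = m))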
I get an output table, but I am having trouble visualizing it and getting a new field/table that shows the LV distance per row alongside its corresponding names.
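One way to make the sapply output easier to read might be to bind it back to the name columns, so each row shows its names next to its distances. This is only a minimal sketch: it assumes the stringdist package is installed, uses a three-row subset of the example data, and `methods_to_try` is a hypothetical stand-in for whatever methods are actually being compared.

```r
library(stringdist)

# Three rows taken from the example data above
df <- data.frame(
  A = c("Microsoft", "Microsoft CORP", "MSFt"),
  B = c("Microsoft", "Microsoft corp", "Microsoft"),
  stringsAsFactors = FALSE
)

# Hypothetical choice of methods; any method name accepted by stringdist() works
methods_to_try <- c("lv", "osa", "jw")

# One column of row-wise distances per method (columns are named after the methods)
dist_mat <- sapply(methods_to_try, function(m) stringdist(df$A, df$B, method = m))

# Attach the names so every row of distances is labelled with A and B
labelled <- cbind(df, dist_mat)
print(labelled)
```

Because `stringdist()` compares element i of the first vector with element i of the second, each row of `labelled` holds the distances for that row's pair of names.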
Desired output:
A | B | LV_DIST
MSFT | Microsoft | 8
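A minimal sketch of how a table like this could be built, again assuming the stringdist package and a three-row subset of the example data. The column names `LEN_A`, `LEN_B`, `LEN_DIFF` and `LV_PCT` are made up for illustration, and dividing by the longer of the two strings is just one possible way to turn the distance into a percentage.

```r
library(stringdist)

# Three rows taken from the example data above
df <- data.frame(
  A = c("Microsoft", "Microsoft CORP", "MSFt"),
  B = c("Microsoft", "Microsoft corp", "Microsoft"),
  stringsAsFactors = FALSE
)

# Row-wise Levenshtein distance: element i of A is compared with element i of B
df$LV_DIST <- stringdist(df$A, df$B, method = "lv")

# Length of each name and the absolute difference in length
df$LEN_A <- nchar(df$A)
df$LEN_B <- nchar(df$B)
df$LEN_DIFF <- abs(df$LEN_A - df$LEN_B)

# Assumption: express the distance as a percentage of the longer string
# (this gives NaN if both strings are empty)
df$LV_PCT <- round(100 * df$LV_DIST / pmax(df$LEN_A, df$LEN_B), 1)

print(df[, c("A", "B", "LV_DIST", "LEN_DIFF", "LV_PCT")])
```

The comparison is case-sensitive, so 'Microsoft CORP' vs 'Microsoft corp' gets a distance of 4; wrapping both columns in `tolower()` first would treat them as identical, if that is what the grouping needs.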