I am currently using the 'agrep' function with 'lapply' in a data.table code to link entries from a user-provided VIN# list to a DMV VIN# database. Please see the following two links for all data/code so far:
Accelerate performance and speed of string match in R
Imperfect string match using data.table in R
Is there a way to extract the "best" match from my list that is being generated by:
dt <- dt[lapply(car.vins, function(x) agrep(x, vin.vins, max.distance = c(cost = 2, all = 2), value = TRUE)), list(NumTimesFound = .N), vin.names]
because, as of now, the 'agrep' function gives me multiple matches, even after a lot of tuning of the cost, all, substitution, etc. arguments.
I have also tried using the 'adist' function instead of 'agrep', but because 'adist' does not have an option for value=TRUE like 'agrep', it throws out the same
Error in `[.data.table`(dt, lapply(vin.vins, function(x) agrep(x,car.vins, :
x.'vin.vins' is a character column being joined to i.'V1' which is type 'integer'.
Character columns must join to factor or character columns.
that I was receiving with the 'agrep' before.
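To make concrete what I mean by the "best" match, here is a minimal sketch outside of data.table. It assumes that "best" means the smallest edit distance as computed by 'adist', with ties going to the first candidate. The sample VINs and the names d, best.idx and best are made up for illustration and are not my real data:

```r
# Hypothetical sample data standing in for the real VIN lists
car.vins <- c("1HGCM82633A004352", "JH4KA8260MC000000")
vin.vins <- c("1HGCM82633A004352", "1HGCM82633A004353",
              "JH4KA8260MC000001", "WBA3A5C50CF256651")

# Edit-distance matrix: one row per user VIN, one column per DMV VIN
d <- adist(car.vins, vin.vins)

# For each user VIN, the column index of the closest DMV VIN
# (which.min returns the first index when there is a tie)
best.idx <- apply(d, 1, which.min)

# One row per user VIN with its single closest candidate and the distance
best <- data.frame(
  car.vin    = car.vins,
  best.match = vin.vins[best.idx],
  distance   = d[cbind(seq_along(car.vins), best.idx)],
  stringsAsFactors = FALSE
)
print(best)
```

Something along these lines gives exactly one candidate per input VIN, but it builds the full distance matrix, so I am not sure it would scale to the whole DMV database.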
Is there perhaps some other package I could use?
Thanks!