I have a list of domains in the table V_tablas.arreglo (column: domainsBad):
@hotmai.es
@ghotmail.es
@hotmaol.com
@hotmai.com
@otmail.com
... and so on (more than 10k of them)
I need to correct these domains to "@hotmail.com".
My question is about Oracle's EDIT_DISTANCE_SIMILARITY (fuzzy matching), which "returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match". Is something like this possible?
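The function quoted above is part of Oracle's UTL_MATCH package. The idea behind it can be sketched in Python: compute the Levenshtein distance, then scale it to the 0 to 100 range by the length of the longer string. That scaling and the rounding are assumptions for illustration; Oracle's exact formula is not confirmed here, so its scores may differ by a point.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # prev[j] = distance between the prefix of `a` processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


def edit_distance_similarity(a: str, b: str) -> int:
    """0 = nothing in common, 100 = identical.

    ASSUMPTION: scaled by the longer string's length and rounded to the
    nearest integer; Oracle's own formula and rounding may differ.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings match perfectly
    return round(100 * (1 - levenshtein(a, b) / longest))


if __name__ == "__main__":
    for bad in ["@hotmai.es", "@ghotmail.es", "@hotmaol.com", "@hotmai.com", "@otmail.com"]:
        print(bad, edit_distance_similarity(bad, "@hotmail.com"))
```

With this scaling the one-typo ".com" variants score 92 against "@hotmail.com" and the ".es" variants score 67, so a cutoff (90, for example) decides which rows get rewritten.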
SAS has at least a couple of functions for calculating the edit distance between two strings:
COMPGED, for generalized edit distance: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm
COMPLEV, for Levenshtein edit distance: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206137.htm

Jeff
Don't forget `SPEDIS()` as well. – Robert Penridge Nov 18 '14 at 19:11
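To illustrate what a generalized edit distance such as COMPGED computes, here is a minimal Python sketch of a weighted edit distance that charges a separate cost for each kind of operation. The parameter names and default costs are hypothetical and do not reproduce COMPGED's actual cost table; with all costs set to 1 the sketch reduces to the Levenshtein distance that COMPLEV reports.

```python
def weighted_edit_distance(a: str, b: str,
                           ins_cost: float = 1.0,
                           del_cost: float = 1.0,
                           sub_cost: float = 1.0) -> float:
    """Cheapest sequence of insertions, deletions and substitutions turning a into b.

    The cost parameters are illustrative assumptions, not COMPGED's defaults.
    """
    # dp[i][j] = cheapest way to turn a[:i] into b[:j]
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = i * del_cost  # delete every character of a[:i]
    for j in range(1, len(b) + 1):
        dp[0][j] = j * ins_cost  # insert every character of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            match = a[i - 1] == b[j - 1]
            dp[i][j] = min(dp[i - 1][j] + del_cost,                        # delete a[i-1]
                           dp[i][j - 1] + ins_cost,                        # insert b[j-1]
                           dp[i - 1][j - 1] + (0.0 if match else sub_cost))  # substitute or match
    return dp[-1][-1]


if __name__ == "__main__":
    print(weighted_edit_distance("@otmail.com", "@hotmail.com"))                # 1.0
    print(weighted_edit_distance("@otmail.com", "@hotmail.com", ins_cost=0.5))  # 0.5
```

Making insertions cheaper, as in the second call, ranks dropped-letter typos such as "@otmail.com" closer to the target than typos that need a substitution.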
You could use a Levenshtein distance algorithm (http://en.wikipedia.org/wiki/Levenshtein_distance) to work out the number of single-character edits needed to convert the source string into the destination string.
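Applied to the question, the distance can pick the nearest known-good domain for each row and leave a row alone when even the nearest one needs too many edits. A minimal Python sketch follows; the CANONICAL list and the max_edits threshold of 2 are hypothetical choices, not taken from the question.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Number of single-character edits (insert, delete, substitute) from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            # delete ca, insert cb, or substitute (cost 0 when the characters match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical list of known-good domains.
CANONICAL = ["@hotmail.com", "@gmail.com"]


def correct_domain(domain: str, canonical: Sequence[str] = CANONICAL,
                   max_edits: int = 2) -> str:
    """Return the nearest canonical domain if it is at most max_edits edits away,
    otherwise return the input unchanged (ties go to the alphabetically first domain)."""
    distance, best = min((levenshtein(domain, c), c) for c in canonical)
    return best if distance <= max_edits else domain


if __name__ == "__main__":
    for bad in ["@hotmai.es", "@ghotmail.es", "@hotmaol.com", "@hotmai.com", "@otmail.com"]:
        print(bad, "->", correct_domain(bad))
```

With max_edits=2 the three ".com" typos become "@hotmail.com", while "@hotmai.es" and "@ghotmail.es" (4 edits away) are left unchanged. Raising the threshold catches them too, but also raises the risk of rewriting a domain that was correct.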
An implementation in SQL is described in this answer: