I have a list of misspelled domains in the table V_tablas.arreglo (column: domainsBad): @hotmai.es, @ghotmail.es, @hotmaol.com, @hotmai.com, @otmail.com, etc. (more than 10k rows). I need to correct these domains to "@hotmail.com". My question is about Oracle's EDIT_DISTANCE_SIMILARITY (fuzzy matching), which "Returns an integer between 0 and 100, where 0 indicates no similarity at all and 100 indicates a perfect match". Is it possible to get a score like this?

2 Answers

SAS has at least a couple of functions for calculating the edit distance between two strings:

COMPGED, for generalized edit distance: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm

COMPLEV, for Levenshtein distance: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206137.htm

Jeff

You could use a Levenshtein distance algorithm (http://en.wikipedia.org/wiki/Levenshtein_distance) to work out the number of single-character edits needed to convert the source string into the destination string.

An implementation in SQL is described in the answer to the question "Levenshtein distance in T-SQL".
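To make the idea concrete, here is a minimal sketch in Python (not SQL) of how a Levenshtein distance can be turned into a 0 to 100 score like the one described in the question. The normalization by the longer string's length is an assumption on my part and may not match Oracle's exact formula. The names `similarity` and `correct_domain`, and the `threshold` parameter, are hypothetical and only for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn source into target."""
    # Keep the shorter string as the inner loop to use less memory.
    if len(source) < len(target):
        source, target = target, source
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(source: str, target: str) -> int:
    """Score from 0 (nothing in common) to 100 (identical).
    Assumption: the distance is normalized by the longer length."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (1 - levenshtein(source, target) / longest))


def correct_domain(domain: str,
                   canonical: str = "@hotmail.com",
                   threshold: int = 75) -> str:
    """Replace domain with canonical when it is similar enough.
    The threshold is a hypothetical cut-off that needs tuning."""
    if similarity(domain, canonical) >= threshold:
        return canonical
    return domain
```

For example, `similarity("@hotmai.com", "@hotmail.com")` gives 92 with this normalization, because only one character is missing out of twelve. Variants that also change the top-level domain (such as the `.es` ones) score lower, so the cut-off has to be chosen by looking at real data, and it should not be so low that legitimate domains get rewritten.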

DaveRead