0

I have a requirement where I am searching for a phone number in a column of a table. The length of the phone number happens to be longer than the ones present in the table.

In such a case how do I do a best match to get the column value which best coincides with my phone number?

The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • Why is the type of your column CHAR or VARCHAR? – Stefan Collier Aug 02 '18 at 12:19
  • 2
    Edit the question add some sample data & desired result would helpful. – Yogesh Sharma Aug 02 '18 at 12:20
  • Which portion of the phone number is expected to be stored in the table? The begin, the end, something in between? – sticky bit Aug 02 '18 at 12:23
  • Example! I don't really get the problem. – The Impaler Aug 02 '18 at 15:14
  • Define `bestmatch`. Are you looking for fuzzy/partial matching? There's no such functionality in SQL Server. You can calculate the Edit distance between the stored numbers and the target, ie the number of digits that have to be added/removed/moved to get a match and find the one with the lowest distance. The [Levenshtein algorithm](https://www.red-gate.com/simple-talk/blogs/string-comparisons-in-sql-edit-distance-and-the-levenshtein-algorithm/) is a common way to do that, but it will be slow – Panagiotis Kanavos Aug 02 '18 at 15:26
  • One way to speed things up would be to calculate various permutations of the phone number on the *client* and search the database for them. For a 10 digit phone number, you'd generate 10 possible 9-digit phones and 110 11-digit phones. – Panagiotis Kanavos Aug 02 '18 at 15:29

3 Answers3

1

You could use a charindex to search if the value of column phone is present in your searchvalue

select * 
from   MyTable t
where  charindex(t.phone, '0123456789') > 0

this will find any rows where the contents of the column phone is inside '012345789'

another option is to use the containsfunction from full text search.
See here: https://learn.microsoft.com/en-us/sql/t-sql/queries/contains-transact-sql?view=sql-server-2017

GuidoG
  • 11,359
  • 6
  • 44
  • 79
1

You can implement a function like Levenshtein distance:

Distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other

There are a lot of implementation which differs in term of performance, see other answers and article about this method

For example :

SELECT LEVENSHTEIN('0134227897', num) as distance , num
FROM   (VALUES ('0134752267'),
               ('+336709057'),
               ('+3347578974') ) AS t(num) 
order by distance

Result will be :

Distance   num  
5           +3347578974   
5           0134752267   
8           +336709057 

So the best matchs are the two nums with only 5 of distance

Kobi
  • 2,494
  • 15
  • 30
  • This should be a commen. Yes, that's one option but that can't be used even to *start* implementing an answer. [This one](https://stackoverflow.com/questions/560709/levenshtein-distance-in-t-sql) would be more appropriate as a *full* answer, or a SQLCLR function etc. – Panagiotis Kanavos Aug 02 '18 at 15:23
  • Or the full article [String Comparisons in SQL: Edit Distance and the Levenshtein algorithm](https://www.red-gate.com/simple-talk/blogs/string-comparisons-in-sql-edit-distance-and-the-levenshtein-algorithm/) – Panagiotis Kanavos Aug 02 '18 at 15:25
0

If you wanted to find phone numbers that are longer than the others. You could find phone numbers that are longer than the average phone number like so:

SELECT phone_number
FROM my_tbl
WHERE LEN(phone_number) > (SELECT AVG(LEN(phone_number)) FROM my_tbl)

LEN finds the length of a VARCHAR field

AVG Is an aggregate function (a function that reduces a list down to a single value) that takes calculates the average of a list/column


P.s. I'm not certain what you mean by a 'best match' to get the value.

Stefan Collier
  • 4,314
  • 2
  • 23
  • 33