1

I have a table that has some columns containing strings ,Let's say nvarchar. Now, the user passes a string to a function that searches for this string in its assigned column. I want to check if that string is present in the database but the problem is it does not necessarily have to be a 100% match. Let's say for example: The user passed the string Johnathon and string John is present in this database.

So, basically I want to get the number of characters that matched.In this particular case of John and Johnathon. it should be 4 matched and 5 unmatched.

Can I please get some directions to approach this problem?

Edit: What I am guessing is I can do the percentage match thing once I have retrieved the best matching string from the column. So, likewise, if we ignore the number of matched and unmatched characters and focus on retrieving the matched string from database, that should work.

Forexample, as Johnathon was passed by the user, and John is present in the database, I definitely can not use Like operator here but a piece of code that searches for the most matched string in the column and returns it.

Simran
  • 539
  • 2
  • 8
  • 28
  • 1
    Look into SOUNDEX: https://msdn.microsoft.com/en-us/library/ms187384.aspx – juergen d Feb 04 '15 at 09:17
  • @juergend: Thank you for your response but that does not really solve my problem. The string matching results that I shall be getting have to match until some consecutive characters in string and should not be randomly picked up from strings. – Simran Feb 04 '15 at 09:21
  • SQL is not good at string manipulation. But you can write C# code that you can call from SQL. This is done with a [CLR function](https://msdn.microsoft.com/en-us/library/ms189876.aspx) – Andomar Feb 04 '15 at 09:24
  • What you want is called levenshtein edit distance. Take a look at this: http://stackoverflow.com/questions/560709/levenshtein-distance-in-t-sql – DeadlyJesus Feb 04 '15 at 09:29
  • What result do you expect if there is a field, for example, containing 'JohXXX' - Is that 0 match or 3 match? – Hugh Jones Feb 04 '15 at 09:40
  • @HughJones: That's a 3 match – Simran Feb 04 '15 at 09:42
  • DeadlyJesus May have it then, String Length - Levenshtein distance could be the answer. Another example - what value would you expect for a field containing 'JxOxHxN – Hugh Jones Feb 04 '15 at 10:01
  • @HughJones: That would be just 1. Once there was a non matching character, we stop and do not need to look further into the string. – Simran Feb 04 '15 at 10:06
  • Sorry to Nag - what result would you expect from the string 'JxJOHN' ? – Hugh Jones Feb 04 '15 at 10:24
  • @HughJones: That's okay :) . In this particular case, it would be just 1 .We stop at first J. – Simran Feb 04 '15 at 10:45

3 Answers3

1

The Levenshtein distance mentioned by @DeadlyJesus might suit you, but an alternative would be just to count matching characaters from the start of the 2 strings. A simple user defined function could do this.

create function dbo.MatchStart(@input1 nvarchar(100), @input2 nvarchar(100)) returns int as
begin

    declare @i int
    set @i = 1

    if (@input1 is not null and @input2 is not null)
    begin
        while (1 = 1)
        begin
            if (@i > len(@input1) or @i > len(@input2)) 
                break

            if (substring(@input1, @i, 1) <> substring(@input2, @i, 1))
                break;

            set @i = @i + 1
        end
    end

    return @i - 1

end
go

declare @testTable table (text1 nvarchar(100))
declare @userInput nvarchar(100)

insert @testTable values 
    (null),
    (''),
    ('John'),
    ('Johnathan'),
    ('JohXXX'),
    ('Fred'),
    ('JxOxHxN')

set @userInput = 'Johnathan'

select text1, dbo.MatchStart(text1, @userInput) as result from @testTable
Rhys Jones
  • 5,348
  • 1
  • 23
  • 44
  • @HughJones - sorry, edit on it's way, got side tracked by my dad posting on SO (unless you just happen to be someone with the same name as my dad :). Yes, fingers crossed eh! – Rhys Jones Feb 04 '15 at 10:08
  • @Rhys Jones: I don't think this can work in my case, as the second parameter has to be looked into the table column and not to be passed by the user. – Simran Feb 04 '15 at 10:09
1

You can do it this way:

SELECT Name, LEN(Name) AS Equals, (LEN('Johnathon') - LEN(Name)) AS NotEquals
FROM TableName
WHERE 'Johnathon' LIKE '%' +Name +'%'

Or if you want to compare both ways then:

DECLARE @parameter NVARCHAR(MAX) = N'Johnathon'

SELECT Name, 
CASE WHEN LEN(Name) > LEN(@parameter) THEN LEN(@parameter) ELSE LEN(Name) END  AS Equals, 
CASE WHEN LEN(Name) > LEN(@parameter) THEN LEN(Name) - LEN(@parameter) ELSE LEN(@parameter) - LEN(Name) END AS NotEquals
FROM TableName
WHERE Name LIKE '%' + @parameter + '%' OR @parameter LIKE '%' +Name +'%'
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
0

You can try this approach:-

IF EXISTS(SELECT * FROM TAB_NAME WHERE COL LIKE '%JOHN%')
SELECT LEN('JOHN') AS MATCHED, (LEN(COL) - LEN('JOHN')) AS UNMATCHED
FROM TAB_NAME;

I think this approach can solve your problem.

Ankit Bajpai
  • 13,128
  • 4
  • 25
  • 40
  • Please check the example, I have mentioned Johnathon as the string passed by user and John to be present in the database. If the Like operator had Johnathon in it, It will never match John. – Simran Feb 04 '15 at 09:40