0

I have the following table,

ID  ShopName            LocationName
1   GreatMall           Great,Mall
2   Gingermall          Gingermall
3   MARK.HI             MARK,HI
4   GALLERY INC         GALLERY. INC

If the column 'ShopName' = 'LocationName', then i want to find, by how much percentage the value of the columns match. (or to find the ideal percentage to set the fuzzy matching) For example, in the table where ID = 2, The value in the columns 'ShopName' and 'LocationName' are matching (Gingermall = Gingermall) then the percentage of the data matching is 100%. For ID = 1, the values dont match (GreatMall <> Great,Mall), then I want to know by what percentage the data doesnt match.

Zhorov
  • 28,486
  • 6
  • 27
  • 52
Sam etl
  • 11
  • 2
  • What do you mean by: 'by what percentage the data doesnt match.' Is this based on the number of characters that are different for example. – Peter Smith May 31 '19 at 14:32
  • 1
    I think it's already answered here: [https://stackoverflow.com/questions/8518695/t-sql-get-percentage-of-character-match-of-2-strings](https://stackoverflow.com/questions/8518695/t-sql-get-percentage-of-character-match-of-2-strings) – Eduardo Silva May 31 '19 at 14:35
  • I am looking for percentage of character match of entire 2 columns in a single table – Sam etl May 31 '19 at 15:02
  • Yes, it is based on the number of characters that are different for example – Sam etl May 31 '19 at 15:12
  • What happens if the two strings as `'AAAB'` and `'ABBB'`. Does the ordering matter? So what about `'AAAB'` and `'BBAB'`? Your problem is not well-specified. – Gordon Linoff May 31 '19 at 16:00

1 Answers1

0

I haven't fully vetted this solution but it works for your data sample:

Setup

Create Table #tbl 
(
ID Int,
ShopName VarChar(25),
LocationName VarChar(25)
)
Insert Into #tbl Values
(1,'GreatMall','Great,Mall'), 
(2,'Gingermall','Gingermall'), 
(3,'MARK.HI','MARK,HI'), 
(4,'GALLERY INC','GALLERY. INC')

Query

Declare @results As Table (id Int, ShopName VarChar(25), LocationName VarChar(25), mlength Int, charmatches Int)
Declare @sn VarChar(25)
Declare @ln VarChar(25)
Declare @id Int
Declare @cnts Int 
Declare @cnt Int 
Declare @samechars Int 

Declare vCursor Cursor
    For Select id, ShopName, LocationName From #tbl
Open vCursor
Fetch Next From vCursor Into @id, @sn, @ln

While @@FETCH_STATUS = 0
Begin
   Set @cnts = 1
   Set @cnt = 1
   Set @samechars = 0
   While @cnt <= Case When Len(@sn) > Len(@ln) Then Len(@sn) Else Len(@ln) End
   Begin
      If Substring(@sn,@cnts,1) = Substring(@ln,@cnt,1) 
      Begin
        Set @cnts = @cnts + 1   --Only move this forward if there is a match
        Set @samechars = @samechars + 1
      End
      Else If Substring(@sn,@cnt,1) = Substring(@ln,@cnt,1) Set @samechars = @samechars + 1
      Set @cnt = @cnt + 1
   End

   Insert Into @results Select @id, @sn As ShopName, @ln As LocationName, Case When Len(@sn) > Len(@ln) Then Len(@sn) Else Len(@ln) End As mlength, @samechars As CharMatches 
   Fetch Next From vCursor Into @id, @sn, @ln
End

Select *, (Cast(charmatches As Float)/Cast(mlength As Float)) * 100.0 As percentmatch   From @results

Close vCursor
DeAllocate vCursor
Drop Table #tbl

Results:

id  ShopName    LocationName    mlength charmatches percentmatch
1   GreatMall   Great,Mall      10      9           90
2   Gingermall  Gingermall      10      10          100
3   MARK.HI     MARK,HI         7       6           85.7142857142857
4   GALLERY INC GALLERY. INC    12      11          91.6666666666667
level3looper
  • 1,015
  • 1
  • 7
  • 10