2

Suppose we create a table as follows:

create table my_table (
id int, 
city nvarchar(256) collate Chinese_Hong_Kong_Stroke_90_CI_AS)

INSERT INTO my_table (id, city)
VALUES (1, 'Shanghai');

INSERT INTO my_table (id, city)
VALUES (2, 'Shandong');

INSERT INTO my_table (id, city)
VALUES (3, 'Shanxi');

INSERT INTO my_table (id, city)
VALUES (4, 'Shaanxi');

There are now four records in my_table:

id  city
1   Shanghai
2   Shandong
3   Shanxi
4   Shaanxi

The following SQL queries return the same number. How to avoid this error?

select top 1 id from my_table order by DIFFERENCE(city, 'Shanghai') desc
select top 1 id from my_table order by DIFFERENCE(city, 'Shandong') desc

Another problem:

select top 1 id from my_table order by DIFFERENCE(city, 'Shannxi') desc 

Returns 3 when the correct answer should be 4.

Dale K
  • 25,246
  • 15
  • 42
  • 71
Chan
  • 3,605
  • 9
  • 29
  • 60

1 Answers1

1

The issue is caused by the collation of your column. As per the docs SOUNDEX & DIFFERENCE are collation dependent.

A possible solution is:

select top 1 id
from my_table
order by DIFFERENCE(city collate SQL_Latin1_General_CP1_CI_AS, 'Shanghai') desc

select top 1 id
from my_table
order by DIFFERENCE(city collate SQL_Latin1_General_CP1_CI_AS, 'Shandong') desc

I think I would add another column with a SQL_Latin1_General_CP1_CI_AS collation which stores exactly the same value as city.

Although it would be of interest to know exactly what you are trying to accomplish? Because in your current collation those 2 words apparently sound exactly the same.

Its also worth reading Beyond SOUNDEX & DIFFERENCE

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • Thank you, Dale. I need to get the town names of a city name, which may contains error. Your solution is good. However, it cannot solve my new problem as shown above. – Chan Jan 16 '19 at 08:21
  • Shame you didn't include all the data in your original question. I think you're going to have to investigate using the `Damerau–Levenshtein distance` to solve your problem. – Dale K Jan 16 '19 at 08:33
  • 1
    Sorry for not posting all the questions in the beginning. How to start with `Damerau–Levenshtein distance`? – Chan Jan 16 '19 at 08:34