4

Let's consider User.Note = 'Version:3.7.21.1'

SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7.2*"')

=> returns something

SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')

=> returns nothing

If User.Note = 'Version:3.7.21'

SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')

=> returns something

If User.Note = 'Version:3.72.21'

SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')

=> returns nothing

I can't figure out how it works. It should always returns something when I search for "3.7*".

Do you know what's the logic behind this ?

PS: if I replace the numbers by letters, there's no problem.

CelineR
  • 41
  • 1
  • 4
  • You can use like ‘%3.7.2%´ instead of contains – Sanpas Jan 15 '19 at 19:22
  • 1
    Thank you, but I want to use "CONTAINS" – CelineR Jan 15 '19 at 19:27
  • Are there actual double quotes in your data? – critical_error Jan 15 '19 at 19:30
  • 1
    No but I have to put the double quotes with the * https://learn.microsoft.com/en-us/sql/t-sql/queries/contains-transact-sql?view=sql-server-2017 – CelineR Jan 15 '19 at 19:33
  • If you expect to be handling version numbers frequently then you may want to add either a column with a "normalized" version number that will sort correctly ("0001.0005.0100.0095" so that you don't find that "10" < "2") or separate columns for major, minor, build, edit, ... versions. [This](https://stackoverflow.com/questions/7870945/version-number-sorting-in-sql-server) question is applicable. – HABO Jan 15 '19 at 21:12

2 Answers2

5

I think your problem is being caused by the unpredictability of the word breaker interacting with the punctuation marks within the data. Full text search is based on the concept of strings of characters, not including spaces and punctuation. When the engine is building the index it sees the periods and breaks the word in weird ways.

As an example, I made a small table with the three values you provided...

VALUES (1,'3.7.21.1'),(2,'3.7.21'),(3,'3.72.21')

Now when I do your selects, I get results on all four... not the results I expect, though.

For me, this returns all three values

SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7.2*"')

and this returns only 3.7.21

SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7*"')

So let's run this and take a look at the contents of the full text index

SELECT * FROM sys.dm_fts_index_keywords(db_id('{databasename}'), object_id('{tablename}'))

For my results (yours are quite probably different) I've got the following display_term values

display_term     document_count
    21              3
    3               3
    3.7.21          1
    7               2
    72              1

So let's look at the first search criterion '"3.7.2*"' If I shove that into sys.dm_fts_parser...

select * from sys.dm_fts_parser('"3.7.2*"', 1033, NULL, 0)

...it's showing me that it's breaking with matches on

3
7
2

But if I do...

select * from sys.dm_fts_parser('"3.7*"', 1033, NULL, 0)

I'm getting a single exact match on the term 3.7 and sys.dm_fts_index_keywords told me earlier that I only have one document/row that contains 3.7

You might also experience additional weirdness because numbers 0-9 are usually in the system stopwords and can be left out of an index because they're considered to be useless. This might be why it works when you change to letters.

Also, I know you've decided to replace LIKE, but Microsoft has suggested that you only use alphanumeric characters in your full text indexes and, if you need to use non-alphanumeric characters in search criteria, you should use LIKE. Perhaps changing the periods to some alphanumeric replacement that won't be used in normal values?

GreyOrGray
  • 1,575
  • 8
  • 14
  • 1
    Thank you a lot for your explanation and suggestions ! The search value "3.7" is an example but we can have multiple search values like `SELECT * FROM [USER] WHERE CONTAINS(Note, '"3.7* AND "vers*"" AND "test*"')` I think changing the periods to some alphanumeric characters can be a workaround solution. I just tried replacing "." by "%2e%" in my data so my query is `SELECT * FROM [USER] WHERE CONTAINS(Note, '"3%2e%7*"')`, the result is returned. – CelineR Jan 16 '19 at 15:46
-1

Contains will only work if the column is in a full text index. If it it is not indexed you will need to use like:

SELECT * FROM [USER] WHERE NOTE like '3.7%' --or '%3.7%

Are you wanting to use CONTAINS because you think it will be faster?(It generally is)

The Microsoft document lists all the ways you can format and use CONTAINS(11 examples)

Here is the Microsoft doc on CONTAINS

Nick A
  • 126
  • 7
  • Yes, we've decided to replace LIKE by CONTAINS for performance improvements. The column is in a full text index. The operator works well for other search values but not for this specific case... The question is : why "CONTAINS" does not return the same result according to the data (cf. my examples)? – CelineR Jan 15 '19 at 20:48
  • @CelineR Next time, maybe include those things in your original post -> "The column is in a full text index. The operator works well for other search values but not for this specific case" – Nick A Jan 15 '19 at 22:06
  • 1
    The initial post does say that some of the searches return results while others don't. That should be enough to intuit that the full text index is there. – GreyOrGray Jan 15 '19 at 22:11