0

I'm using SQL Server 2008 R2, and using CONTAINSTABLE ISABOUT() to identify key phrases in document titles. However, one specific key phrase is yielding false positives. They keyphrase is: "p-h-c industry" (without the quotes). Here's some sample code that illustrates the problem:-

CREATE TABLE tmp_test (myidentifier nvarchar(30) NOT NULL, mytext varchar(max));  
CREATE UNIQUE CLUSTERED INDEX test_key ON tmp_test(myidentifier);  
CREATE FULLTEXT CATALOG tmp_test_cat;  
CREATE FULLTEXT INDEX ON tmp_test(mytext) KEY INDEX test_key ON tmp_test_cat;  
GO   

INSERT tmp_test VALUES ('good case','Western p-h-c industry');
INSERT tmp_test VALUES ('bad case','competitive advertising industry');
GO  

SELECT * FROM CONTAINSTABLE (tmp_test, mytext, 'ISABOUT ("p-h-c industry" WEIGHT (.8))') ORDER BY RANK DESC;  
go

KEY                            RANK
------------------------------ -----------
Informational: The full-text search condition contained noise word(s).
good case                      15
bad case                       15

(2 row(s) affected)

I would expect the SELECT * FROM CONTAINSTABLE() to only return a row for the 'good case' since there is in an exact match. But it's also returning a row for the 'bad case' even though the keyphase "p-h-c industry" does not appear in the bad case text.

Can anyone suggest a fix or workaround? I have to do it using CONTAINSTABLE() since the code architecture is built around that. Thanks

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
andyabel
  • 335
  • 4
  • 15
  • 1
    Full-text considers single characters as stopwords. Stopwords are automatically removed from the index when that index is created. e.g. the phrase “an apple and an orange,” “an” and “and” are considered stopwords and are not be included in the index. Only “apple” and “orange” are tokenized and added to the index. See [this](https://learn.microsoft.com/en-us/sql/relational-databases/search/configure-and-manage-stopwords-and-stoplists-for-full-text-search) and [this](https://www.red-gate.com/simple-talk/sql/learn-sql-server/understanding-full-text-indexing-in-sql-server/) – user2321864 Aug 06 '17 at 13:42
  • Thanks. So you're suggesting that the hyphen is a word breaker? I think you're right. And it looks like the process for disabling a word breaker is a little tricky - not something I'd like to try on a production system! Thanks for your help! – andyabel Aug 07 '17 at 15:34
  • Looks like hyphen is a word breaker, but I can't find any documentation to suggest that a single character is considered a stopword. And in fact, if I test the phrase using sys.dm_fts_parser I get:- select display_term from sys.dm_fts_parser('"p-h-c industry"', 1033, 0, 0); display_term ---------------- p h c phc industry (5 row(s) affected) – andyabel Aug 07 '17 at 16:26

0 Answers0