1

I need your help.

I try to match a manually created lookup of specific keywords with a fact comment table. Purpose: an attempt to categorize these comments.

Example

  • comment: A lot more power than the equivalent from Audi.
  • keyword from keyword-list: Audi
  • category from keyword-list: competitor

I tried something like

SELECT 
    FC.comment_id, KWM.keyword, KWM.category
FROM 
    dbo.factcomments FC
INNER JOIN 
    (SELECT  
         keywordmatcher = '%[,. ]' + keyword + '[ .,]%',
         keyword,
         category
     FROM 
         dbo.keywordlist) KWM ON FC.comment LIKE KWM.keywordmatcher

Maybe a bad example, but I only want specific matches --> no matches if the keyword is part of another word in the fact comments (e.g. 'part' but not 'apart').

Because my first try didn't match keywords at the beginning/end of strings I did something really nasty:

SELECT 
    FC.comment_id, KWM.keyword, KWM.category
FROM 
    dbo.factcomments FC
INNER JOIN 
    (SELECT  
         keyword,
         category
     FROM 
         dbo.keywordlist) KWM ON FC.comment LIKE '%[,. ]' + KWM.keyword + '[ .,]%' 
                              OR FC.comment LIKE KWM.keyword + '[ .,]%' 
                              OR FC.comment LIKE '%[,. ]' + KWM.keyword

I know...

Besides the fact that I also want to detect those comments where there are '!', '?', ''', '-' or '_' before or after these keywords - is there any clever way to do so?

In fact I want any comments where there are no word characters before or after the keyword, any other character is OK.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
simongr
  • 11
  • 2

2 Answers2

0

In the JOIN condition, REPLACE() all non-alphanumeric characters in FC.Comment with a space character, and surround it with spaces. Something like this:

' '+REPLACE(FC.Comment, ...)+' '

Then do your LIKE Comparison like this:

LIKE '% '+KWM.Keyword+' %'
Tab Alleman
  • 31,483
  • 7
  • 36
  • 52
0

A different approach may be.

declare @comment varchar(255)=concat(' ','A lot more power than the equivalent from Audi.',' ')
declare @keyword varchar(50)='Audi'
DECLARE @allowedStrings VARCHAR(100)
DECLARE @teststring VARCHAR(100)

SET @allowedStrings = '><()!?@_-.\/?!*&^%$#()~'

;WITH CTE AS
(
  SELECT SUBSTRING(@allowedStrings, 1, 1) AS [String], 1 AS [Start], 1 AS [Counter]
  UNION ALL
  SELECT SUBSTRING(@allowedStrings, [Start] + 1, 1) AS [String], [Start] + 1, [Counter] + 1 
  FROM CTE 
  WHERE [Counter] < LEN(@allowedStrings)
)

SELECT @comment = REPLACE(@comment, CTE.[String], '') FROM CTE

Change the @comment variable however you like and check the result

SELECT 
@comment as Comment , @keyword as KeyWord,
iif(substring(@comment,PATINDEX(concat('%',@keyword,'%'),@comment)-1,len(@keyword)+2)=' Audi ',1,0) as isMatch

This is a borrowed idea from https://stackoverflow.com/a/29162400/10735793

MD AZAD HUSSAIN
  • 202
  • 1
  • 6