0

I want to search a table's varchar column for content in another table's varchar column.

Certain words are banned and I want to identify the rows that have the banned words. I want an EXACT match on the banned word.

I'm using MS SQL Server 2016.

Table 1:

CREATE TABLE [dbo].[BlogComment](
    [BlogCommentId] [int] IDENTITY(1,1) NOT NULL,
    [BlogCommentContent] [varchar](max) NOT NULL,
CONSTRAINT [PK_BlogComment] PRIMARY KEY CLUSTERED 
(
    [BlogCommentId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = 
ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

3 rows - and the data in BlogCommentContent:

There are many of us.

This is the man.

I hear you.

Table 2:

CREATE TABLE [dbo].[BannedWords](
    [BannedWordsId] [int] IDENTITY(1,1) NOT NULL,
    [Description] [varchar](250) NOT NULL
CONSTRAINT [PK_BannedWords] PRIMARY KEY CLUSTERED 
(
    [BannedWordsId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = 
ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

3 rows - and the data in Description:

though

man

hear


My Sql:

SELECT BlogCommentContent
FROM dbo.BlogComment,
     dbo.BannedWords
WHERE ( CHARINDEX( [Description], BlogCommentContent, 1 ) ) > 1

It's finding 'man', 'hear' and 'man' in the word 'many'. So it returns 3 rows.

I only WANT EXACT matches. So only return 2 rows.

How do I accomplish this?

user3020047
  • 868
  • 1
  • 15
  • 45
  • What is your SQL Server version (`SELECT @@VERSION;`)? – Yitzhak Khabinsky May 02 '22 at 17:40
  • Microsoft SQL Server 2016 (SP1-GDR) (KB4019089) - 13.0.4206.0 (X64) Jul 6 2017 07:55:03 Copyright (c) Microsoft Corporation Express Edition (64-bit) on Windows 10 Pro 6.3 (Build 19043: ) – user3020047 May 03 '22 at 14:18
  • From the About menu in Sql server Mgmt studio: Microsoft SQL Server Management Studio 12.0.4100.1 Microsoft Analysis Services Client Tools 12.0.4100.1 Microsoft Data Access Components (MDAC) 10.0.19041.1 Microsoft MSXML 3.0 6.0 Microsoft Internet Explorer 9.11.19041.0 Microsoft .NET Framework 4.0.30319.42000 Operating System 6.3.19043 – user3020047 May 03 '22 at 14:20
  • Did you try the proposed solution? One is for 2017, another will work on 2016 – Yitzhak Khabinsky May 03 '22 at 14:22
  • Yes, I did. It produced the correct results. Thanks! – user3020047 May 03 '22 at 14:42
  • @Yitzhak One more thing. In my expanded version, I return say 3 columns: BannedWord, UserId and id. BannedWord, UserId id man 3 42 hear 4 12 about 4 12 How do I remove a duplicate with regard to 'id'? I only want the 1st 2 rows returned (man and hear). I can't use DISTINCT or GROUP BY. I tried to format this nicely but no good. – user3020047 May 06 '22 at 02:38
  • Please ask a separate question, and provide a minimal reproducible example. (1) DDL and sample data population, i.e. CREATE table(s) plus INSERT, T-SQL statements. (2) What you need to do, i.e. logic, and your attempt implementation of it in T-SQL. (3) Desired output based on the sample data in the #1 above. (4) Your SQL Server version (SELECT @@version;) – Yitzhak Khabinsky May 06 '22 at 03:56
  • @Yitzhak Actually, using your working solution below that solved my initial problem of doing exact matching on banned words. I just want to take it 1 step further to generate only 1 entry when the source entry has more than 1 distinct banned word in it ie the BlogCommentContent. So, OK, I will create a separate question using the working solution you provided. I just thought I was keeping it simple. – user3020047 May 06 '22 at 04:13
  • @Yitzhaf I asked a separate using the solution you provided. https://stackoverflow.com/questions/72136456/removing-duplicates-returned-based-on-the-column-value – user3020047 May 06 '22 at 18:05

2 Answers2

0

If you want exact matches what you mean is that there must not be another work touching so man can match "man", "man and women", "mice and men", "a man or two". You need to check

BlogCommentContent = 'man'
left(BlogCommentContent,3) = 'man'
right(BlogCommentContent,3) = 'man'
BlogCommentContent like ' man ' 

the length of man can be found with len('man') to be used in right() and left().
The last value, for like, can be constructed with concat(' ','man',' ')

  • I tried this but got no results back. SELECT BlogCommentContent FROM dbo.BlogComment, dbo.BannedWords WHERE ( CHARINDEX( [Description], BlogCommentContent, 1 ) > 1 ) AND LEFT(BlogCommentContent,LEN([Description])) = [Description] AND RIGHT(BlogCommentContent,LEN([Description])) = [Description] AND BlogCommentContent LIKE CONCAT(' ',[Description],' ') – user3020047 May 02 '22 at 19:55
  • You need `OR` instead of `AND` –  May 03 '22 at 04:05
  • Ok, changed it. It brings back the 3 rows. – user3020047 May 03 '22 at 14:32
0

Please try the following solution.

It will work starting from SQL Server 2017 onwards.

SQL

-- DDL and sample data population, start
DECLARE @BlogComment TABLE (
    BlogCommentId INT IDENTITY PRIMARY KEY,
    BlogCommentContent VARCHAR(MAX));
INSERT INTO @BlogComment (BlogCommentContent) VALUES
('There are many of us.'),
('This is the man.'),
('I hear you.');

DECLARE @BannedWords TABLE (
    BannedWordsId INT IDENTITY PRIMARY KEY,
    Word varchar(250))
INSERT INTO @BannedWords (Word) VALUES
('though'),
('man'),
('hear');
-- DDL and sample data population, end

;WITH rs AS
(
    SELECT word = TRIM('.,' FROM value ) 
    FROM @BlogComment 
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word 
FROM rs
    INNER JOIN @BannedWords bw ON rs.word = bw.Word;  

SQL Server 2016

;WITH rs AS
(
    --SELECT word = TRIM('.,' FROM [value]) 
    SELECT word = REPLACE(REPLACE([value],'.',''),',','')
    FROM @BlogComment 
    CROSS APPLY STRING_SPLIT(BlogCommentContent, SPACE(1))
)
SELECT DISTINCT bw.Word 
FROM rs
    INNER JOIN @BannedWords bw ON rs.word = bw.Word; 

Output

+------+
| Word |
+------+
| man  |
| hear |
+------+
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • I get a red squiqgly line on TRIM and it states: 'TRIM' is not a recognized built-in function name. But of course it is. On the following FROM following '.,' it states: Incorrect syntax near 'FROM'. – user3020047 May 02 '22 at 19:47
  • The DDL and inserts work fine. – user3020047 May 02 '22 at 19:48
  • Good idea `string_split`. You get my vote. –  May 02 '22 at 20:00
  • I have MS SQL Server 2014. Guess that could be the problem. So TRIM not a built-in function is correct. Is there a 2014 solution to this task of mine? – user3020047 May 02 '22 at 20:05