Search for a word in the column string and list those words

Question

Have two tables, table 1 with columns W_ID and word. Table 2 with column N_ID and note. Have to list all the NID where words found in table 1 word column contains in Note column (easy part) and also list those words in another column without duplicating the N_ID. Which means using STUFF to concatenate all the words found in Note column for that particular N_ID. I tried using

FULL TEXT INDEX using CONTAIN

But it only allows to search for one word at a time. Any suggestions how I can use a while loop to achieve this.

You don't want to loop here. You need to do a join and then aggregate those results. Of course you are stuck using a wildcard both leading and trailing but hopefully this is just for learning as this type of thing is pretty useless in the real world most of the time. — Sean Lange, Jan 09 '18 at 22:35
Is there a maximum of words from Table 1 you would want to display in the Final Outcome? — Andrew O'Brien, Jan 09 '18 at 23:15
@SeanLange - This is not for learning purpose, it is a real time project for a company I am working for where we should be able to detect the flagged words used in the notes. and that's why we created a separate table with the list of all flagged words. — Excited_to_learn, Jan 10 '18 at 15:15
Ahh gotcha. I would probably use a string splitter and then join to your bad words table. Then reassemble using STUFF. — Sean Lange, Jan 10 '18 at 15:18
The biggest challenge here is deciding what it a word. If you split by space you would eliminate things like "Nah!!". If you simply look for pattern matching you get false positives. — Sean Lange, Jan 10 '18 at 15:54

Andrew O'Brien · Answer 1 · 2018-01-09T23:38:36.053

1

If there is a maximum number of words you want displayed for N_ID, you can pivot this. You could have them in a single column by concatenating them, but I would recommend against that. Here is a pivot that supports up to 4 words per N_ID. You can adjust it as needed. You can view the SQL Fiddle for this here.

SELECT
  n_id,
  [1] AS word_1,
  [2] AS word_2,
  [3] AS word_3,
  [4] AS word_4
FROM (
  SELECT
    n_id,
    word,
    ROW_NUMBER() OVER (PARTITION BY n_id ORDER BY word) AS rn
  FROM tbl2
  JOIN tbl1 ON
    tbl2.note LIKE '%'+tbl1.word+'[ ,.?!]%'
  ) AS source_table
  PIVOT (
    MAX(word)
    FOR rn IN ([1],[2],[3],[4])
  ) AS pivot_table

*updated the join to prevent look for a space or punctuation to declare the end of a word.

edited Jan 09 '18 at 23:38

answered Jan 09 '18 at 23:31

Andrew O'Brien

1,793
1
12
24

There is no limit on a word used in the notes. there can be many. So I am not sure if this will be a right approach in this case. thank you for the help. Much appreciated – Excited_to_learn Jan 10 '18 at 15:11
If there is no limit to the number of words you want to display, you can use dynamic SQL to find the maximum number of words that match then create this query with the number of pivoted columns required. With that said, you may want to consider having an upper limit on the words you'd like to display. As someone reading the report, I wouldn't want to read past the first dozen or so words. – Andrew O'Brien Jan 11 '18 at 00:53

score 1 · Answer 2 · answered Jan 09 '18 at 23:39

1

You can join your tables together based on a postive result from the charindex function.

In SQL 2017 you can run:

SELECT n_id, string_agg(word)
FROM words
inner join notes on 0 < charindex(words.word, notes.note);

Prior to SQL 2017, there is no string_agg so you'll need to use stuff, which is trickier:

select
stuff((
  SELECT ', ' + word
  FROM words
  where 0 < charindex(words.word, notes.note)
  FOR XML PATH('')
  ), 1, 2, '')

from notes;

I used the following schema:

CREATE table WORDS
(W_ID int identity primary key
,word varchar(100)
);

CREATE table notes
(N_ID int identity primary key
,note varchar(1000)
);

insert into words (word) values
('No'),('Nope'),('Nah');

insert into notes (note) values
('I am not going to do this. Nah!!!')
,('It is OK.');

answered Jan 09 '18 at 23:39

Steven Ensslen

1,164
9
21

We don't have SQL 2017, I tried using the sql code with Stuff but it didn't work. It returned just one column with one word repeated many times.Thank you for your help. Much appreciated – Excited_to_learn Jan 10 '18 at 15:12
@Excited_to_learn are you sure? I tested this and it returned what you stated you are looking for correctly for me. – Sean Lange Jan 10 '18 at 15:23
Of course this does return false positives. Notice that this will return 'No' for a string value that does not contain No as word. If there is 'Nothing' in the comment it will return both 'No' and 'Not'. And it will return Nah if the comment is something like 'Hannah was very helpful today'. – Sean Lange Jan 10 '18 at 15:35
I had to tweak the code a bit and it worked but @SeanLange you are correct, it does return false positives. Is there a way while searching for the exact word, we consider the leading and trailing spaces or/and leading comma, dot and exclamation? – Excited_to_learn Jan 10 '18 at 18:17
If you add a leading space it would be impossible if the found word is at the beginning. And exclamation points work with the limited sample data but are not even a reasonable consideration in real data. – Sean Lange Jan 10 '18 at 20:26

Search for a word in the column string and list those words

2 Answers2