0

Say I have a table called PHRASES containing some text strings

+--+---------------+
|ID|PHRASE         |
+--+---------------+
|0 |"HELLO BYE YES"|
+--+---------------+
|1 |"NO WHY NOT"   |
+--+---------------+
|2 |"NO YES"       |
+--+---------------+

And I want to add the number of times each of the following words occur to the OCCURRENCE column, let's call this table KEYWORDS:

+--------+----------+
|KEYWORD |OCCURRENCE|
+--------+----------+
|"YES"   |NULL      |
+--------+----------+
|"NO"    |NULL      |
+--------+----------+
|"HELLO" |NULL      |
+--------+----------+
|"CHEESE"|NULL     |
+--------+---------+

I now want to write a query that would update KEYWORDS to the following:

+--------+----------+
|KEYWORD |OCCURRENCE|
+--------+----------+
|"YES"   |2         |
+--------+----------+
|"NO"    |2         |
+--------+----------+
|"HELLO" |1         |
+--------+----------+
|"CHEESE"|0         |
+--------+----------+

Note that I have already got a function called dbo.RegExIsMatch that can take care of string matches, such that it returns 1 if parameter 1 matches against the string in parameter 2:

UPDATE KEYWORDS SET OCCURRENCE = 
(
    SELECT SUM
    (
            -- the following returns 1 if the keyword exists in the phrase, or 0 otherwise
        CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1 THEN 1 ELSE 0 END
    )
    FROM PHRASES
    CROSS JOIN KEYWORDS
)

This doesn't work though, it just ends up filling each row with the same number. I'm sure this is a simple problem I'm just struggling to get my head around SQL-think.

ErikE
  • 48,881
  • 23
  • 151
  • 196
quant
  • 21,507
  • 32
  • 115
  • 211

4 Answers4

0

Your query has three different tables, but the question only two. Is this what you mean?

UPDATE Keywords
    SET OCCURRENCE = (SELECT SUM(CASE WHEN dbo.RegExIsMatch('.*' + KEYWORDS.KEYWORD + '.*',PHRASES.PHRASE,1) = 1
                                     THEN 1 ELSE 0
                                END)
                     FROM PHRASES
                    );

Otherwise, if you have three tables, you need to correlate the subquery with the outer table.

ErikE
  • 48,881
  • 23
  • 151
  • 196
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Yeah that was a mistake, I've tried to simplify the original code for this question but missed that one.. – quant Jan 17 '14 at 03:45
  • You could make this a little simpler by moving the condition to the WHERE clause and counting instead of summing it: `SET OCCURRENCE = (SELECT Count(*) FROM Phrases WHERE dbo.RegExIsMatch( ...) = 1)` – ErikE Jan 28 '14 at 03:35
0

Well this seems to work

MERGE INTO KEYWORDS masterList
USING (
    SELECT COUNT(*) AS OCCURRENCE,KEYWORDS.KEYWORD AS KEYWORD FROM
    KEYWORDS AS keywordList
    CROSS JOIN PHRASES AS phraseList 
    WHERE (dbo.RegExIsMatch('.*' + keywordList.KEYWORD + '.*',phraseList.PHRASE,1) = 1)
    GROUP BY KEYWORD
) frequencyList
ON (masterList.KEYWORD = frequencyList.KEYWORD)
WHEN MATCHED THEN 
    UPDATE SET masterList.OCCURRENCE = frequencyList.OCCURRENCE;
ErikE
  • 48,881
  • 23
  • 151
  • 196
quant
  • 21,507
  • 32
  • 115
  • 211
0

As I do not have your function dbo.RegExIsMatch to test against I came up with this slightly different example using only sqlserver-out-of-the-box-stuff.

You probably did get the count of 1 everywhere, because you are using SUM without GROUP BY.

Please note that this is not 100% accurate since I am not using regex but just "simple stupid" string functions, but if you were to modify your regex-function to do a regex-replace you could replace my calls to REPLACE with that and that would give you the correct result.

fiddle demo

The other minor change was to set initial values of 0 instead of NULL for all keywords.

Also note that I am not doing a CROSS JOIN anymore but a join against phrases that contain the word, that way the occurrences do not get overwritten multiple times, that is also what is happening in your case I suppose.

INSERT INTO KEYWORDS (KEYWORD, OCCURRENCE)
  SELECT 'YES', 0
  UNION
  SELECT 'NO', 0
  UNION
  SELECT 'HELLO', 0
  UNION
  SELECT 'CHEESE', 0;

UPDATE KEYWORDS SET KEYWORDS.OCCURRENCE = KEYWORDS.OCCURRENCE +
    (LEN(PHRASES.PHRASE) - LEN(REPLACE(PHRASES.PHRASE, KEYWORDS.KEYWORD, ''))) / LEN(KEYWORDS.KEYWORD)
  FROM KEYWORDS
  INNER JOIN PHRASES ON CHARINDEX(KEYWORDS.KEYWORD, PHRASES.PHRASE) > 0;

PS: for that simple stupid string counting I used slightly modified code from this answer(including the comment)

Community
  • 1
  • 1
DrCopyPaste
  • 4,023
  • 1
  • 22
  • 57
  • I think he doesn't need to count the number of times the word appears within each phrase (that is, "YES YES YES" would count only once, not 3 times) so that a simple `LIKE` expression would do the job. – ErikE Jan 28 '14 at 03:38
-1

try in this method its working from my side

-------------Table Creation

    declare @PHRASE table (ID int,PHRASE varchar(max))
    insert into @PHRASE 
    select 0,'"Hello Bye Yes"'
    union all
    select 1,'"No Why Not"'
    union all
    select 2,'"No Yes"'
    select * from @PHRASE
    declare @Keywords table (KEYWORD varchar(10),OCCURANCE int)
    insert into @Keywords 
    select 'YES',null
    union all
    select 'NO',null
    union all
    select 'HELLO',null
    union all
    select 'CHEESE',null
    select * from @Keywords

----------Script for requirement

create table #table (name varchar(max),)

DECLARE @str VARCHAR(25)

DECLARE curs_Fp CURSOR FOR

SELECT  c.PHRASE FROM @PHRASE c 

OPEN curs_Fp
FETCH NEXT FROM curs_Fp INTO  @str

    WHILE @@FETCH_STATUS = 0 
BEGIN

      while patindex('%["]%',@str) > 0
         SET @str = REPLACE( @str, SUBSTRING( @str, patindex('%["]%',@str), 1 ),'')

                            set @str = @str+' '
                            WHILE CHARINDEX(' ', @str) > 0 
                            BEGIN

                                DECLARE @tmpstr VARCHAR(50)
                                 SET @tmpstr = SUBSTRING(@str, 1, ( CHARINDEX(' ', @str) - 1 ))

                                insert into #table (name) select @tmpstr

                                SET @str = SUBSTRING(@str, CHARINDEX(' ', @str) + 1, LEN(@str))
                            END

FETCH NEXT FROM curs_Fp INTO  @str
END

CLOSE curs_Fp
DEALLOCATE curs_Fp

update y
set y.OCCURANCE = isnull(x.occurance,0)
from
@Keywords y
left join
--#table x on y.keyword = x.name
(select a.name,count(a.name) occurance from #table a group by a.name) x on y.KEYWORD  = x.name
select * from @Keywords
drop table #table
Kishore
  • 846
  • 6
  • 9