0

In my mssql database I have a table containing articles(id, name, content) a table containing keywords(id, name) and a link table between articles and keywords ArticleKeywords(articleId, keywordID, count). Count is the number of occurrences of that keyword in the article.

How can I write a SP that gets a list of comma separated strings and gives me the articles that have this keywords ordered by the number of occurrences of the keywords in the article? If an article contains more keywords I want to sum the occurrences of each keyword.

Thanks, Radu

Radu D
  • 3,505
  • 9
  • 42
  • 69
  • Hi, your question is clear to me except for the phrase "... that gets a list of comma separated strings". Where does it get it from? Hand-typed or from some other database or textfile? – littlegreen Nov 06 '10 at 09:48

3 Answers3

1

Although it isn't completely clear to me what the source of your comma-separated string is, I think what you want is an SP that takes a string as input and produces the desired result:

CREATE PROC KeywordArticleSearch(@KeywordString NVARCHAR(MAX)) AS BEGIN...

The first step is to verticalize the comma-separated string into a table with the values in rows. This is a problem that has been extensively treated in this question and another question, so just look there and choose one of the options. Whichever way you choose, store the results in a table variable or temp table.

DECLARE @KeywordTable TABLE (Keyword NVARCHAR(128))
-- or alternatively...
CREATE TABLE #KeywordTable (Keyword NVARCHAR(128))

For lookup speed, it is even better to store the KeywordID instead so your query only has to find matching ID's:

DECLARE @KeywordIDTable TABLE (KeywordID INT)
INSERT INTO @KeywordTable 
    SELECT K.KeywordID FROM SplitFunctionResult S 
    -- INNER JOIN: keywords that are nonexistent are omitted
    INNER JOIN Keywords K ON S.Keyword = K.Keyword

Next, you can go about writing your query. This would be something like:

SELECT articleId, SUM(count)
FROM ArticleKeywords AK
WHERE K.KeywordID IN (SELECT KeywordID FROM @KeywordIDTable)
GROUP BY articleID

Or instead of the WHERE you could use an INNER JOIN. I don't think the query plan would be much different.

Community
  • 1
  • 1
littlegreen
  • 7,290
  • 9
  • 45
  • 51
  • A quick question about MSSQL: How can I get a index seek when I have an inner join on Name = '%' + name + '%'? – Radu D Nov 07 '10 at 22:29
  • Erm. Unless I'm overlooking something, the only way to do that is to store the new string in a temp table: `INSERT INTO #temp SELECT '%' + name + '%' AS derivedname` and define an index on it: `CREATE INDEX index01 ON #temp (derivedname)`. Then use `INNER JOIN #temp`. But are you sure you didn't mean `LIKE '%' + name + '%'`? Optimizing that would be a different story... – littlegreen Nov 08 '10 at 00:20
  • My join has the condition: Name LIKE '%' + name + '%'. Can I optimize that? The strange is that I get an index seek in the execution plan ... but I think it does a table scan. How is it possible to do an index seek on something like '%' + name + '%' ? – Radu D Nov 08 '10 at 11:55
  • I don't know, sorry. But it would be an awesome new question! It also seems someone has asked it before: http://stackoverflow.com/questions/1569002/how-can-i-optimize-refactor-a-tsql-like-clause – littlegreen Nov 08 '10 at 12:36
1

For the sake or argument lets say you want to look-up all articles containg the keywords Foo, Bar and Shazam.

ALTER PROCEDURE spArticlesFromKeywordList

@KeyWords varchar(1000) = 'Foo,Bar,Shazam'
AS

SET NOCOUNT ON

DECLARE @KeyWordInClause varchar(1000)
SET @KeyWordInClause = REPLACE (@KeyWords ,',',''',''')

EXEC(
'
SELECT 
t1.Name as ArticleName,
t2.Name as KeyWordName,
t3.Count as [COUNT]
FROM ArticleKeywords t3
INNER JOIN Articles t1 on t3.ArticleId = t1.Id
INNER JOIN Keywords t2 on t3.KeywordId = t2.Id
WHERE t2.KeyWord in ( ''' + @KeyWordInClause  + ''') 
ORDER BY 
3 descending, 1
'
)

SET NOCOUNT OFF
  • This is good as well. Seeing he has about a million articles though, it´s better to first convert the keywords into id´s and use that in the IN clause. – littlegreen Nov 08 '10 at 00:13
0

I think I understand what you are after so here goes ,(not sure what lang you are using but) in PHP (from your description) I would query ArticleKeywords using a ORDER BY count DESC statement (i.e. the highest comes first) - Obviously you can "select by keywordID or articleid. In very simple terms (cos that's me - simple & there may be much better people than me) you can return the array but create a string from it a bit like this:

$arraytostring .= $row->keywordID.',';

If you left join the tables you could create something like this:

$arraytostring .= $row->keywordID.'-'.$row->name.' '.$row->content.',';

Or you could catch the array as

$array[] = $row->keywordID;

and create your string outside the loop.

Note: you have 2 fields called "name" one in articles and one in keywords it would be easier to rename one of them to avoid any conflicts (that is assuming they are not the same content) i.e. articles name = title and keywords name= keyword

rmap
  • 67
  • 5
  • I want to do that is a stored procedure ... because there are almost a million articles ... I cannot load them all. I someone can give me the start of the sp how to parse the csv keywords and add their ids to a temp table it will be enough – Radu D Nov 06 '10 at 09:52