0

I have a database that stores search criteria entered by users and want to analyse how often certain words have been used. The "problem" is that many searches have similar meaning but have one or more words that accompany them. Example (in this example "foo" is the interesting word):

bar
foo 2015
show me foo
germany foo

I would like to determine that foo was used three times. I need to do this programmatically that means using SQL commands would be the ideal solution. The words used vary based on user behaviour. Because of this I do not know in advance which words get used, I need the logic to determine this on its own.

Has QUIT--Anony-Mousse
  • 76,138
  • 12
  • 138
  • 194
Marged
  • 10,577
  • 10
  • 57
  • 99

3 Answers3

1

Expanding on this This Answer (Credit to Aaron Bertrand for Function), you can do this by creating a Split Function and using a Cross Apply to it with a Group By:

CREATE FUNCTION dbo.SplitStrings
(
    @List       NVARCHAR(MAX),
    @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
AS
    RETURN (SELECT Number = ROW_NUMBER() OVER (ORDER BY Number),
        Item FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
        CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
    FROM (SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
        FROM sys.all_objects AS s1 CROSS APPLY sys.all_objects) AS n(Number)
    WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, 1) = @Delimiter
    ) AS y);
GO 

Sample Data:

Create Table SplitTest
(
    A   Varchar (100)
)

Insert  SplitTest
Values  ('bar'), 
        ('foo 2015'), 
        ('show me foo'), 
        ('germany foo')

Query:

Select      f.Item, Count(*) Count
From        SplitTest                   As s
Cross Apply dbo.SplitStrings(s.A, ' ')  As F
Group By    F.Item
Order By    Count Desc

Results:

Item    Count
foo     3
germany 1
me      1
show    1
2015    1
bar     1
Community
  • 1
  • 1
Siyual
  • 16,415
  • 8
  • 44
  • 58
1

I think this is an idea problem to use the full text search feature of sql server. Solving this problem is what that full text search does.

https://msdn.microsoft.com/en-us/library/ms142571.aspx

To quote from that page:

Full-Text Search Queries

After columns have been added to a full-text index, users and applications can run full-text queries on the text in the columns. These queries can search for any of the following:

  • One or more specific words or phrases (simple term)

  • A word or a phrase where the words begin with specified text (prefix term)

  • Inflectional forms of a specific word (generation term)

  • A word or phrase close to another word or phrase (proximity term)

  • Synonymous forms of a specific word (thesaurus)

  • Words or phrases using weighted values (weighted term)

Why re-invent when the feature already exists in the product?

Hogan
  • 69,564
  • 10
  • 76
  • 117
0

You should read sth about text mining. First of all you should get the list of stop words http://en.wikipedia.org/wiki/Stop_words and load into your sql server, than you need to split all of your phrases into ngram - in your case it will be 1gram what means single words, than you need to check if thease words are on the stop words list and if not count. I'm pretty sure that you will find some tutorials on the web, but i think it's not a right job for sql server - i would suggest to use custom tools or at least ssis with custom coponent\scripts. It will be easier. As someone suggest you can also give a Chance for FullText Search, but it depend if you can turn it on and use.

sdrzymala
  • 387
  • 1
  • 10