6

I want to return the 10 most common words from a query in SQL server, so running against a set of rows such as this:

quick brown fox
slow yellow fox
slow green fox

would return
fox
slow
quick
brown
yellow
green

Nick Haslam
  • 1,512
  • 1
  • 24
  • 47

3 Answers3

3

I'd try running a split function (separating each word through a space) over each returned row in order to get all separate words into an auxiliar table. Using the following code, you should be able to split a row by its spaces:

CREATE FUNCTION dbo.Split(@String varchar(8000), @Delimiter char(1))     
returns @temptable TABLE (items varchar(8000))     
as     
begin     
    declare @idx int     
    declare @slice varchar(8000)     

    select @idx = 1     
        if len(@String)<1 or @String is null  return     

    while @idx!= 0     
    begin     
        set @idx = charindex(@Delimiter,@String)     
        if @idx!=0     
            set @slice = left(@String,@idx - 1)     
        else     
            set @slice = @String     

        if(len(@slice)>0)
            insert into @temptable(Items) values(@slice)     

        set @String = right(@String,len(@String) - @idx)     
        if len(@String) = 0 break     
    end 
return     
end

You should call this function from within a cursor or something; inside it, simply use something like:

insert into #tmp (word) select * from dbo.split(' ', @row)

Finally, you would only have to use a simple query like:

select top 10 count(*) as number, word from separated_words_table order by number

Source here

João Pereira
  • 3,545
  • 7
  • 44
  • 53
1

To see how to do this declaratively (i.e. without a while loop), see an answer I worked on (for a code golf of all things): Build an ASCII chart of the most commonly used words in a given text

Note that the code in that link is intended to have the fewest characters possible, not to be readable. Please use more descriptive names, at least.

Community
  • 1
  • 1
Gabe
  • 84,912
  • 12
  • 139
  • 238
1

Another way. ("Borrowed" from here)

WITH Sentences AS
(
SELECT 'quick brown fox' AS Sentence UNION ALL
SELECT 'slow yellow fox' UNION ALL
SELECT 'slow green fox'
),
Xmlified AS
(
    SELECT
       CAST('<M>' + REPLACE(Sentence,' ','</M><M>') + '</M>' AS XML)  AS xSentence
FROM Sentences
),
Words AS
(
SELECT
 Split.a.value('.', 'VARCHAR(100)') AS word
FROM Xmlified
CROSS APPLY xSentence.nodes('/M') Split(a)
)
SELECT COUNT(*) AS C, word FROM Words
GROUP BY word
ORDER BY C DESC
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • This is cool but it choked on ampersands for me. I solved it by first replacing them in the sentence like `CAST('' + REPLACE(REPLACE(Sentence,'&','amp;'),' ','') + '' AS XML) AS xSentence` – shanabus May 03 '17 at 19:44
  • 1
    @shanabus if you are on 2016 use this instead https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql – Martin Smith May 03 '17 at 19:50