0

I have a table with 2 columns ,ID and comment. Is it possible to find the total word count of the comment column of each row? and then find the TOP 10 word counts? I having been trying and failing all afternoon. Any help would be hugely appreciated. If you would like any more info please ask.

radar
  • 13,270
  • 2
  • 25
  • 33
M0rty
  • 985
  • 3
  • 15
  • 35
  • Sorry I had a look at this but that is mySQL i'm using T-SQL sorry I should have specified – M0rty Oct 21 '14 at 02:48
  • yeah... you could use regex to get the count of the words. then look into a splitting stored procedure. most of them will be based on commas, but you can change that to spaces or whatever. i would consider cleaning out punctuation first. – gloomy.penguin Oct 21 '14 at 03:03
  • splitting... something like: http://stackoverflow.com/a/5493616/623952 so you turn the comments into one row for each word in each comment. then group by word and top 10 count(`*`) order by count(`*`) desc – gloomy.penguin Oct 21 '14 at 03:05

2 Answers2

2

If you just need to do this quickly, you could try the query below. Note that it just crudely uses a space for word boundaries.

SELECT TOP(10)(LEN(comment) - LEN(REPLACE(comment, ' ', '')) + 1)
FROM tblComments
ORDER BY (LEN(comment) - LEN(REPLACE(comment, ' ', ''))) DESC

However, please note that this query isn't a particularly efficient solution and I would only use it if it was an ad-hoc problem I was trying to solve. If performance is an issue, such as the query needing to be built into a live transactional application of some sort, I suggest that a better approach would be to use some combination of a third column to store the word count and/or doing the word count in code. Doing so will also provide a better separation of logic and data storage, as well as giving you more flexibility in how words are recognized.

TeamTam
  • 1,598
  • 11
  • 15
1

Try using the DATALENGTH() function in SQLSERVER:

DECLARE  @String VARCHAR(100)
        ,@CharToFind VARCHAR(1)
SET @String = 'AAAA BBBCB NNNNN NEEEEE ERERERERERE  '
SET @CharToFind = ' '
select DATALENGTH(@String)
SELECT CountOfWordsInTheString =  DATALENGTH(@String) - (DATALENGTH(REPLACE(@String,@CharToFind,''))+1)

Your query would look like the below:

SELECT TOP 10 id,DATALENGTH(comment) - ( DATALENGTH(Replace(comment, ' ', '')) + 1 )
FROM   tblname
ORDER  BY DATALENGTH(comment) - ( DATALENGTH(Replace(comment, ' ', '')) + 1 ) DESC 
Shaggy
  • 6,696
  • 2
  • 25
  • 45
knkarthick24
  • 3,106
  • 15
  • 21