I have a table with 2 columns ,ID and comment. Is it possible to find the total word count of the comment column of each row? and then find the TOP 10 word counts? I having been trying and failing all afternoon. Any help would be hugely appreciated. If you would like any more info please ask.
-
Sorry I had a look at this but that is mySQL i'm using T-SQL sorry I should have specified – M0rty Oct 21 '14 at 02:48
-
yeah... you could use regex to get the count of the words. then look into a splitting stored procedure. most of them will be based on commas, but you can change that to spaces or whatever. i would consider cleaning out punctuation first. – gloomy.penguin Oct 21 '14 at 03:03
-
splitting... something like: http://stackoverflow.com/a/5493616/623952 so you turn the comments into one row for each word in each comment. then group by word and top 10 count(`*`) order by count(`*`) desc – gloomy.penguin Oct 21 '14 at 03:05
2 Answers
If you just need to do this quickly, you could try the query below. Note that it just crudely uses a space for word boundaries.
SELECT TOP(10)(LEN(comment) - LEN(REPLACE(comment, ' ', '')) + 1)
FROM tblComments
ORDER BY (LEN(comment) - LEN(REPLACE(comment, ' ', ''))) DESC
However, please note that this query isn't a particularly efficient solution and I would only use it if it was an ad-hoc problem I was trying to solve. If performance is an issue, such as the query needing to be built into a live transactional application of some sort, I suggest that a better approach would be to use some combination of a third column to store the word count and/or doing the word count in code. Doing so will also provide a better separation of logic and data storage, as well as giving you more flexibility in how words are recognized.

- 1,598
- 11
- 15
Try using the DATALENGTH()
function in SQLSERVER:
DECLARE @String VARCHAR(100)
,@CharToFind VARCHAR(1)
SET @String = 'AAAA BBBCB NNNNN NEEEEE ERERERERERE '
SET @CharToFind = ' '
select DATALENGTH(@String)
SELECT CountOfWordsInTheString = DATALENGTH(@String) - (DATALENGTH(REPLACE(@String,@CharToFind,''))+1)
Your query would look like the below:
SELECT TOP 10 id,DATALENGTH(comment) - ( DATALENGTH(Replace(comment, ' ', '')) + 1 )
FROM tblname
ORDER BY DATALENGTH(comment) - ( DATALENGTH(Replace(comment, ' ', '')) + 1 ) DESC

- 6,696
- 2
- 25
- 45

- 3,106
- 15
- 21