6

I have 2 columns in a table and I would like to roughly report on the total number of words. Is it possible to run a MySQL query and find out the total number of words down a column.

It would basically be any text separated by a space or multiple space. Doesn't need to be 100% accurate as its just a general guide.

Is this possible?

Lorenz Meyer
  • 19,166
  • 22
  • 75
  • 121
Adam
  • 19,932
  • 36
  • 124
  • 207

4 Answers4

17

Try something like this:

SELECT COUNT(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1)
FROM table

This will count the number of caracters in your column, and substracts the number of caracters in your column removing all the spaces. Hereby you know how many spaces you have in your row and hereby know how many words there are (roughly because you can also type in a double space, this wil count as two words but you say you want it roughly so this should suffice).

Lorenz Meyer
  • 19,166
  • 22
  • 75
  • 121
Rick Hoving
  • 3,585
  • 3
  • 29
  • 49
  • 2
    It will also count empty strings as 1 word. – ypercubeᵀᴹ Mar 29 '12 at 06:50
  • That's also true, luckily there was a demand for a rough count, else the query would be quite a bit more complex. – Rick Hoving Mar 29 '12 at 06:51
  • This is a good answer for the question asked. But @RickHoving LENGTH(keyword) - LENGTH(REPLACE(keyword, ' ', ''))) +1 does the job as well. I would like to know why you did you use count() on top of it.Thanks! – geekidharsh Nov 16 '17 at 15:38
  • @geekidharsh Without the count it gives you the number of words for each row separately. If you add the count it will give you the number of words for all the words in the entire column. – Rick Hoving Nov 20 '17 at 13:11
16

Count simply gives you the number of found rows. You need to use SUM instead.

SELECT SUM(LENGTH(column) - LENGTH(REPLACE(column, ' ', '')) + 1) FROM table

fikre
  • 161
  • 1
  • 2
1

A less rough count:

SELECT LENGTH(column) - LENGTH(REPLACE(column, SPACE(1), '')) 
FROM
  ( SELECT CONCAT(TRIM(column), SPACE(1)) AS column
    FROM
      ( SELECT REPLACE(column, SPACE(2), SPACE(1)) AS column
        FROM 
          ( SELECT REPLACE(column, SPACE(3), SPACE(1)) AS column
            FROM 
              ( SELECT REPLACE(column, SPACE(5), SPACE(1)) AS column
                FROM 
                  ( SELECT REPLACE(column, SPACE(9), SPACE(1)) AS column
                    FROM 
                      ( SELECT REPLACE(column, SPACE(17), SPACE(1)) AS column
                        FROM 
                          ( SELECT REPLACE(column, SPACE(33), SPACE(1)) AS column
                            FROM tableX
                          ) AS x
                      ) AS x
                  ) AS x
              ) AS x
          ) AS x
      ) AS x
  ) AS x 
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • 1
    Can you add a sample value to show this is better than selected answer? – tom10271 Jul 20 '17 at 03:56
  • To explain what this is for: This will reduce duplicate spaces, while the above queries will count extra spaces as extra words. So a string like ' i am here' will only appear as 3 words with this while it will appear as a lot more with the other queries. – MikeBergerUS Aug 07 '18 at 01:31
0

I stumbled upon this post while I was looking for an answer myself and truthfully I've tested all of the answers here and the closest one was @fikre's answer. However, I have concern over data that have leading spaces and/or extra spaces between the words (trailing spaces doesn't seem to have effect to fikre's query during my testing). So, I'm looking for a way to identify any spaces in between words and remove them. While I found a few answers using advanced function (which is beyond my skill set), I did find a very simple way to do it.

tl;dr > @fikre's answer is the only one working for me but I did a minor tweak to ensure that I'll get the most accurate word count.

Query 1 -- This will return 5 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" FROM
(SELECT TRIM(REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ')) AS input
FROM (SELECT ' too   late  to the     party ' AS input) i) r;

Query 2 -- This will return 13 "Word Count"
SELECT SUM(LENGTH(input) - LENGTH(REPLACE(input, ' ', '')) + 1) AS "Word Count" 
FROM (SELECT ' too   late  to the     party ' AS input) i;
-- breakdown ' too   late  to the     party '
   1 leading space= 1 word count
   2 spaces after the first space from the word 'too'= 2 word count
   1 space after the first space from the word 'late'= 1 word count
   4 spaces after the first space from the word 'the'= 4 word count
   trailing space(s) wasn't counted at all.
   Total spaces > 1+2+1+4=8 + 5 word count = 13 

So, basically if the data row contains even a million spaces in between (disclaimer: an assumption. I've only tested 336,896 spaces), Query 1 will still return Word count=5.

Note: The mid part REPLACE(REPLACE(REPLACE(input,' ','<>'),'><',''),'<>',' ') I took from this answer https://stackoverflow.com/a/55476224/10910692

FanoFN
  • 6,815
  • 2
  • 13
  • 33