2

Let's say I have a table shortText:

ID    | SHORT_TEXT
------+---------------------------
001   | The elephants went in two by two
002   | Needles and haystack
003   | Somewhere over the rainbow
...

How to query shortText to Count the occurence of each word in column SHORT_TEXT (without using a stored procedure) to get a result like

WORD  | OCCURENCE
------+------------
the   | 2
and   | 1
over  | 1
...

Edit:

There is no general answer (variable number of words without a given maximum) provided so far in SO.

Benvorth
  • 7,416
  • 8
  • 49
  • 70
  • what have you tried? where's your sql query or stored procedure? – Mike Tung Sep 28 '17 at 18:43
  • Possible duplicate of [Count the number of occurrences of a string in a VARCHAR field?](https://stackoverflow.com/questions/12344795/count-the-number-of-occurrences-of-a-string-in-a-varchar-field) – Ollie in PGH Sep 28 '17 at 18:46
  • check this answer: https://stackoverflow.com/questions/1094888/counting-word-occurrences-in-a-table-column – slon Sep 28 '17 at 18:49
  • Possible duplicate of [SQL split values to multiple rows](https://stackoverflow.com/questions/17942508/sql-split-values-to-multiple-rows) But instead of separated by comma `,` is separated by spaces. – Juan Carlos Oropeza Sep 28 '17 at 18:50
  • Non of these provide an answer without a Limit for the number of words. – Benvorth Sep 29 '17 at 03:40

3 Answers3

2

I think if you build a full text index on the column, you can get the words from a table that is created to support the tokenization of strings.

this explains pretty well.... https://dev.mysql.com/doc/refman/5.6/en/innodb-ft-index-table-table.html

here's a query once you build the index....

SELECT word, doc_count, doc_id, position FROM INNODB_FT_INDEX_TABLE

OR

SELECT word,count(*) FROM INNODB_FT_INDEX_TABLE group by word;

etc.....

Note, i haven't tested this, but i've done similar in Oracle.

Mark Giaconia
  • 3,844
  • 5
  • 20
  • 42
  • another note on this is that by default MYSQL does not store tokens (words) less than three characters, so if you want ALL words, you will need to change that behavior – Mark Giaconia Sep 29 '17 at 13:55
  • Thanks for that. Unfortunately this only works für InnoDB tables. My index is on a column in a MyISAM table. Wasn't able to figure out where the Index is stored for that type yet... – Benvorth Sep 30 '17 at 13:12
1

In theory, you want to divide up the string "shortText" into individual words (IE split string by spaces), then combine all the arrays into one giant list and count the words.. I'm afraid that in MySQL this may be asking too much, however, I can illustrate the principle in postgreSQL below:

 select word,count(*) occurrence
  from
 (select 
   unnest(string_to_array(lower(short_text),' ')) word 
   from shortText) words
 group by words.word
 order by count(*) desc
Joe Love
  • 5,594
  • 2
  • 20
  • 32
0

I've found this one interesting, count the words in specific column:

SELECT SUM(LENGTH(`YourText`) -  LENGTH(REPLACE(`YourText`, ' ', '' )) +1) FROM `table_name` WHERE `ID`='1';
Mohsen TOA
  • 759
  • 10
  • 17