7

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
Elad Meidar
  • 186
  • 1
  • 5
  • Possible duplicate of [Count the frequency of each word](http://stackoverflow.com/questions/22410458/count-the-frequency-of-each-word). The present question is older, but a solution has been accepted in the other question. – RandomSeed May 05 '14 at 16:41

5 Answers5

5

@Elad Meidar, I like your question and I found a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

Community
  • 1
  • 1
Gabriel Rodriguez
  • 1,163
  • 10
  • 23
1

I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.

Alex N.
  • 14,805
  • 10
  • 46
  • 54
  • Really, isn't it smarter to run it as a stored procedure? or a function?. i always thought that offloading such calculations to the DB is better. – Elad Meidar Jul 07 '09 at 22:09
  • 1
    Agreed, there is no way of doing this using regular SQL. You could write a stored procedure to do it using a temporary table and a modified version of the one of the techniques in this article: http://www.sommarskog.se/arrays-in-sql-2000.html (this is for SQL Server 2000 but could be adapted to MySQL's dialect fairly easily) However, the issue still is that you're going outside SQL's "comfort zone" in doing so – Ken Keenan Jul 07 '09 at 22:18
  • You certainly can do that in SP, using multiple passes over data set and extracting each word with regexp, then putting results in temp table and iterating over it and so on. But it is quite complicated, it will be both difficult to write and maintain, and I would said maintenance is even more important then writing it in the first place. DBs are good for what they designed for - storing/reading/writing relational data. Making them compute things are just suboptimal design and will eventually return as a lot of PITA. – Alex N. Jul 08 '09 at 12:50
1

You can try this perverted-a-little way:

SELECT 
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC

This query can be very slow. Also, it looks pretty ugly.

Slava Popov
  • 142
  • 1
  • 2
0

I think you should do it like indexing, with additional table. Whenever u create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.

  • Yeah, but the problem is i need in in a scope of a user (FK on the table with the string i need to parse) using a join table between a word, a phrase and a user seems a bit of over kill no? – Elad Meidar Jul 09 '09 at 01:42
-1

I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.

Troggy
  • 644
  • 2
  • 9
  • 23
  • Really, isn't it smarter to run it as a stored procedure? or a function?. i always thought that offloading such calculations to the DB is better. – Elad Meidar Jul 07 '09 at 22:09