2

I have near about 200 words. I want to see how many times those words occurred in a column of a table.

e.g: say we have table test with column statements which has two rows.

  1. How are you. It's been long since I met you.
  2. I am fine how are you.

Now I want to find the the occurrences of words "you" and "how". Output should be something like:

word          count
you            3
how            2

since "you" has 3 and how has 2 occurrences in the two rows.

How can I do this?

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
Ashish
  • 189
  • 2
  • 9

4 Answers4

0

Here is a solution:

SELECT SUM(total_count) as total, value
FROM (

SELECT count(*) AS total_count, REPLACE(REPLACE(REPLACE(x.value,'?',''),'.',''),'!','') as value
FROM (
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(t.sentence, ' ', n.n), ' ', -1) value
  FROM table_name t CROSS JOIN 
(
   SELECT a.N + b.N * 10 + 1 n
     FROM 
    (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
   ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
    ORDER BY n
) n
 WHERE n.n <= 1 + (LENGTH(t.sentence) - LENGTH(REPLACE(t.sentence, ' ', '')))
 ORDER BY value

) AS x
GROUP BY x.value

) AS y
GROUP BY value

Here is the full working fiddle: http://sqlfiddle.com/#!2/17481a/1

First we do a query to extract all words as explained here by @peterm(follow his instructions if you want to customize the total number of words processed). Then we convert that into a sub-query and then we COUNT and GROUP BY the value of each word, and then make another query on top of that to GROUP BY not grouped words cases where accompanied signs might be present. ie: hello = hello! with a REPLACE

Community
  • 1
  • 1
Gabriel Rodriguez
  • 1,163
  • 10
  • 23
0

You can do it like this:

  1. Split the phrase and put all items in a different table;
  2. Remove all ponctuation;
  3. Make a select using the created table and the words that you want to identify.
aF.
  • 64,980
  • 43
  • 135
  • 198
  • are you suggesting to do this using php? I don't want to make use of php. I want to do this only in php. Is it possible? If you can do this is mysql only, please can you give an example. – Ashish Dec 21 '11 at 17:18
  • 1
    This is impossible with only MySQL as it doesn't have string splitting functions. But it's probably possible to write your own function. Something like this: http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/ – Minras Dec 21 '11 at 17:28
  • 1
    @Minras: That's simply not true. Using a loop and [SUBSTRING_INDEX()](http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_substring-index) can explode a string on a delimiter. It would certainly be *easier* in a language like Perl or PHP, but you can do just about anything purely in MySQL. – Conspicuous Compiler Dec 21 '11 at 17:43
0

The way I would approach this is to write a little user defined function to give me the number of times one string appears in another with some allowances for:

  • upper and lower case
  • common punctuation

I would then create a table with all of the words that I wish to search with i.e. your 200 list. Then use the function to count the number of occurrences of each word in every phrase, put that in a inline view and then sum the results up by search word.

Hence:

User Defined Function

DELIMITER $$

CREATE FUNCTION `get_word_count`(phrase VARCHAR(500),word VARCHAR(255), delimiter VARCHAR(1)) RETURNS int(11)
READS SQL DATA
BEGIN
 DECLARE cur_position INT DEFAULT 1 ; 
 DECLARE remainder TEXT;
 DECLARE cur_string VARCHAR(255);
 DECLARE delimiter_length TINYINT UNSIGNED;
 DECLARE total INT;
 DECLARE result DOUBLE DEFAULT 0;
 DECLARE string2 VARCHAR(255);

 SET remainder = replace(phrase,'!',' ');
 SET remainder = replace(remainder,'.',' ');
 SET remainder = replace(remainder,',',' ');
 SET remainder = replace(remainder,'?',' ');
 SET remainder = replace(remainder,':',' ');
 SET remainder = replace(remainder,'(',' ');

 SET remainder = lower(remainder);

 SET string2 = concat(delimiter,trim(word),delimiter);
 SET delimiter_length = CHAR_LENGTH(delimiter);
 SET cur_position = 1;

 WHILE CHAR_LENGTH(remainder) > 0 AND cur_position > 0 DO
    SET cur_position = INSTR(remainder, delimiter);
    IF cur_position = 0 THEN
        SET cur_string = remainder;
    ELSE
        SET cur_string = concat(delimiter,LEFT(remainder, cur_position - 1),delimiter);
    END IF;
    IF TRIM(cur_string) != '' THEN
        set result = result + (select instr(string2,cur_string) > 0);
    END IF;
    SET remainder = SUBSTRING(remainder, cur_position + delimiter_length);
 END WHILE;

 RETURN result;
END$$

DELIMITER ;

You might have to play with this function a little depending on what allowances you need to make for punctuation and case. Hopefully you get the idea here though!

Populate tables

create table search_word
(id int unsigned primary key auto_increment,
 word varchar(250) not null
);

insert into search_word (word) values ('you');
insert into search_word (word) values ('how');
insert into search_word (word) values ('to');
insert into search_word (word) values ('too');
insert into search_word (word) values ('the');
insert into search_word (word) values ('and');
insert into search_word (word) values ('world');
insert into search_word (word) values ('hello');

create table phrase_to_search
(id int unsigned primary key auto_increment,
phrase varchar(500) not null
);

insert into phrase_to_search (phrase) values ("How are you. It's been long since I met you");
insert into phrase_to_search (phrase) values ("I am fine how are you?");
insert into phrase_to_search (phrase) values ("Oh. Not bad. All is ok with the world, I think");
insert into phrase_to_search (phrase) values ("I think so too!");
insert into phrase_to_search (phrase) values ("You know what? I think so too!");

Run Query

select word,sum(word_count) as total_word_count
from
(
select phrase,word,get_word_count(phrase,word," ") as word_count
from search_word
join phrase_to_search
) t
group by word
order by total_word_count desc;
Tom Mac
  • 9,693
  • 3
  • 25
  • 35
-1

Below is the simple solution for the case when you need to count certain word occurrences, not the complete statistics:

SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%how%';
SELECT COUNT(*) FROM `words` WHERE `row1` LIKE '%you%';
Minras
  • 4,136
  • 4
  • 18
  • 18
  • thanks for reply, but this thing is very basic which I already know. Likewise if I have 200 words to search, I will have to run 200 queries, which I don't want. – Ashish Dec 21 '11 at 17:13
  • @Minras: Looking at the example given above, you can see that your solution gives an incorrect answer. It returns 1 per row that has the searched-for word, not 1 per occurrence of word. Your second query gives 2, but it *should* give 3. – Conspicuous Compiler Dec 21 '11 at 17:44