
I have a MyISAM table comprising over 2 million records, on which there is a FULLTEXT index over multiple columns.

Given a search term, I would like to know how many times it occurs within the indexed fields of each record.

For example, when searching for 'test' within the following table (in which there is a FULLTEXT index over both the FREETEXT and Third_Col columns):

+----+--------------------------------------------+---------------------------+
| ID | FREETEXT                                   | Third_Col                 |
+----+--------------------------------------------+---------------------------+
|  1 | This is first test string in test example. | This is first test Values |
|  2 | This is second test.                       | This is sec col           |
+----+--------------------------------------------+---------------------------+

I expect results like:

+----+-------+
| ID | count |
+----+-------+
|  1 |     3 |
|  2 |     1 |
+----+-------+

I know that in the FULLTEXT index MySQL uses dtf (the number of times the term appears in the document); how can one obtain this?

Sapan
  • The dtf is not available in the fulltext index itself, as it has been [combined with other factors](http://dev.mysql.com/doc/internals/en/full-text-search.html) to create a term weight. You would probably find it easier to obtain qf from the relevance score and known term weight per document, but even doing that will have its problems. Perhaps you could provide a little more background behind what you're trying to accomplish? This is increasingly sounding like an instance of [the XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). – eggyal Dec 16 '12 at 12:27
  • In particular, how static is the data (and indeed, the search term)? What proportion of the 2 million records do you expect may match a search term? What will you do with all the results? Are you looking for a pure SQL solution, or can other tools be used? – eggyal Dec 16 '12 at 12:34
  • I have a list of important terms (more than 10,000, with their synonyms, e.g. test, tested; but not testimonial), and I want to find their frequency in a database of more than 2 million records. The terms can match any number of records. The search terms and the database are static (they will not change). I am looking for a fast solution (I have already built my database with multiple FULLTEXT indexes). Using the FULLTEXT index I can get the IDs of the records containing a term across the complete database in less than a minute per term, but I also want the frequency, which I will use later. – Sapan Dec 16 '12 at 13:18
  • In this example ([link](http://sqlfiddle.com/#!2/1487b/1/0)) it gives a result of 2 for ID 1, but I want to search the term with boundaries. How can I give boundaries to the remove function so that it searches for/removes "\btest\b" and not the "test" in "tested"? – Sapan Dec 16 '12 at 13:28
  • If both "search term and database are static", then you merely need to process each record for the search term once and cache the results. For example, you could use a trigger to do it upon `INSERT` and then store the count and/or stripped data in separate column(s)? – eggyal Dec 16 '12 at 13:53
  • I did not get you. Do I have to create the database again? Can you tell me the command? – Sapan Dec 16 '12 at 16:18
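
Following up on eggyal's caching suggestion: there is no need to recreate the database. Since both the data and the term list are static, the per-record count can be computed once and stored in an ordinary column. A minimal sketch, assuming an illustrative column name test_count on a table called yourtable, using the common REPLACE/CHAR_LENGTH trick to count substring occurrences (note that REPLACE() matches case-sensitively, and that, like the answer below, this also counts the 'test' inside 'tested'):

ALTER TABLE yourtable ADD COLUMN test_count INT NOT NULL DEFAULT 0;

-- Populate once: count how many times 'test' occurs across both indexed
-- columns by measuring how much shorter the text gets when it is removed.
UPDATE yourtable
SET test_count =
      ( CHAR_LENGTH(CONCAT_WS(' ', FREETEXT, Third_Col))
      - CHAR_LENGTH(REPLACE(CONCAT_WS(' ', FREETEXT, Third_Col), 'test', '')) )
    / CHAR_LENGTH('test');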

1 Answer


Create a user-defined function like this (the string parameter is declared as TEXT so that longer documents are not truncated, and the function is marked DETERMINISTIC so it can be created when binary logging is enabled):

DELIMITER $$

CREATE FUNCTION `getCount`(myStr TEXT, myword VARCHAR(100))
    RETURNS INT
    DETERMINISTIC
BEGIN
    DECLARE cnt INT DEFAULT 0;
    DECLARE result INT DEFAULT 1;

    -- Repeatedly locate the word, count the hit, and cut the string off
    -- just past the match until no further occurrence is found.
    WHILE (result > 0) DO
        SET result = INSTR(myStr, myword);
        IF (result > 0) THEN
            SET cnt = cnt + 1;
            SET myStr = SUBSTRING(myStr, result + CHAR_LENGTH(myword));
        END IF;
    END WHILE;

    RETURN cnt;
END$$

DELIMITER ;
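
Note that getCount counts raw substring hits, so searching for 'test' also counts the 'test' inside 'tested', which is the word-boundary issue raised in the comments above. A hypothetical variant (the name getWordCount and the boundary rule are illustrative, not part of the original answer) could check the characters on either side of each match and count whole words only:

DELIMITER $$

CREATE FUNCTION `getWordCount`(myStr TEXT, myword VARCHAR(100))
    RETURNS INT
    DETERMINISTIC
BEGIN
    DECLARE cnt INT DEFAULT 0;
    DECLARE pos INT DEFAULT 1;
    DECLARE hit INT;

    search_loop: LOOP
        SET hit = LOCATE(myword, myStr, pos);
        IF hit = 0 THEN
            LEAVE search_loop;
        END IF;

        -- Count the match only if the characters immediately before and
        -- after it are not word characters (letters, digits, underscore).
        IF (hit = 1
            OR SUBSTRING(myStr, hit - 1, 1) NOT REGEXP '[[:alnum:]_]')
           AND (hit + CHAR_LENGTH(myword) > CHAR_LENGTH(myStr)
            OR SUBSTRING(myStr, hit + CHAR_LENGTH(myword), 1) NOT REGEXP '[[:alnum:]_]')
        THEN
            SET cnt = cnt + 1;
        END IF;

        SET pos = hit + CHAR_LENGTH(myword);
    END LOOP;

    RETURN cnt;
END$$

DELIMITER ;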

Then you can use getCount in your query as follows (CONCAT_WS inserts a space between the two columns so that words do not run together across the column boundary):

SELECT ID, getCount(CONCAT_WS(' ', FREETEXT, Third_Col), 'test') AS count FROM yourtable;
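
With over 2 million rows, it may be worth letting the existing FULLTEXT index narrow the result first, so the counting function only runs on rows that actually contain the term. A sketch, assuming the index covers exactly (FREETEXT, Third_Col) as described in the question; BOOLEAN MODE avoids MyISAM's 50% natural-language threshold:

SELECT ID,
       getCount(CONCAT_WS(' ', FREETEXT, Third_Col), 'test') AS count
FROM yourtable
WHERE MATCH (FREETEXT, Third_Col) AGAINST ('test' IN BOOLEAN MODE);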

Hope it helps

Akhil