Should I choose the fulltext index or locate() position() function?

Question

I have a problem of my sql performance after I received a slow sql warning for 2 weeks.

Firstly I need to mention that there are so many redundancy in the shop_name column. Because it currently records all the history of customers' transaction. This is what it looks like: "（抖音）澳贝母婴旗舰店21/02,（抖音）澳贝母婴旗舰店21/02,（抖音）澳贝母婴旗舰店21/02,（抖音）澳贝母婴旗舰店21/02" or "广东荔枝红文化传媒有限公司,（抖音）澳贝母婴旗舰店21/02".

Maybe I would split this column into a new relational table. It's in a later discussion.

the size of this querying table is around 300k. Data length is about 300MB and index length is 37MB.

the code below searches through the fullText index while executing while I do the explain. It runs for 4s which is a little unacceptable.

SELECT
    * 
FROM
    customer c 
WHERE
    receiver_mobile_decrypted IS NOT NULL 
    AND c.decrypted_flag = 1 
    AND c.updated_flag = 1 
    AND MATCH ( c.shop_name ) against ("*抖音*" IN boolean MODE ) 
ORDER BY
    create_time DESC 
    LIMIT 1200;

the image below is the explaination of the sql executing. The weird thing here is that I don't find the fulltext initialization in my latest navicat client. But I saw this step cost about 99.455% time of the whole in the production env. In the prod env, a navicat runs in an old version. You know, I cannot screenshot the detail in the prod env. This is what I can provide.

[for sql1] https://i.stack.imgur.com/l25yQ.png [for sql1] https://i.stack.imgur.com/G9aOL.png

the other sql below replacing the "match against" into position() or locate() even like function gets a higher performance. It runs for 1.1s.

SELECT
    * 
FROM
    customer c 
WHERE
    receiver_mobile_decrypted IS NOT NULL 
    AND c.decrypted_flag = 1 
    AND c.updated_flag = 1 
    and position("抖音" in c.shop_name) >0
ORDER BY
    create_time DESC 
    LIMIT 1200;

I would like to provide you the details of this execution. [for sql2]https://i.stack.imgur.com/1rsFZ.png [for sql2]https://i.stack.imgur.com/LFLXd.png

Under current situation, I modified my sql into the 2nd which is acceptable now. But if you can provide me any hint or suggestion, I would appreciate that.

UPDATE

I found some not clarified questions when I post this problem.

I did really set the ngram to this fulltext index. Because if I didn't do so, the querying results would be different from the orginal like"%%" function.

I set the ngram_token_size to 2. like the picture below

https://i.stack.imgur.com/go6ts.jpg https://i.stack.imgur.com/tqBNF.jpg

The question really is: "Does your version of MySQL handle Chinese Fulltext?" Please provide the MySQL version number. — Rick James, Aug 18 '21 at 19:39
and https://www.mysqltutorial.org/mysql-ngram-full-text-parser/ — Rick James, Aug 18 '21 at 19:46
@Rick James Thanks for your help, my mysql version is 8.0.18. And to clarify my problem here. I did set the ngram for this table at the beginning. — Eason Du, Aug 19 '21 at 05:59
Please provide `EXPLAIN SELECT ...` and `EXPLAIN FORMAT=JSON SELECT ...` — Rick James, Aug 19 '21 at 06:13
@Rick James https://imgur.com/e9xWh9s https://imgur.com/9ipmmOM I analyzed this execution and found the fulltext initialization costs almost all the time. But I don't have any approaches to fix it., Appreciate your help. — Eason Du, Aug 19 '21 at 06:42
Does that high percentage happen on all runs? I would expect "initialization" to be a one-time task. Can you provide a small test case? — Rick James, Aug 19 '21 at 21:42
Yup. This is also a point where I felt weird. Almost all the cases has a step of fulltext index initilization. — Eason Du, Aug 20 '21 at 04:16
What's more, if I remove the explain and then do the sql, I get the approximitely 27% cost in initilization and 71.5% cost in executing of the full steps. I run it for a couple of times. — Eason Du, Aug 20 '21 at 04:22
I need to mention this: when I do the same on a test db of version 8.0.24. There's not any fulltext initialization step occuring. — Eason Du, Aug 20 '21 at 04:35

score 0 · Answer 1 · answered Aug 18 '21 at 19:50

0

(Partial answer)

Version 5.7.6 introduced a Chinese (CJK) ngram parser for the FULLTEXT builtin.

Once you have installed, configured, and activated the Ngram parser, MATCH is likely to be much faster than LOCATE() or POSITION() or LIKE.

answered Aug 18 '21 at 19:50

Rick James

135,179
13
127
222

Should I choose the fulltext index or locate() position() function?

UPDATE

1 Answers1