1

I get zero results from this query:

SELECT COUNT(1) FROM `myTable` WHERE MATCH(tagline) AGAINST(' +IT professional' IN BOOLEAN MODE)

I get 92 from this:

SELECT COUNT(1) FROM `myTable` WHERE `tagline` LIKE '%IT professional%'

I prefer the first query because MATCH against an index is 80% faster with my setup. Obviously "IT" is a stopword and causing problems. What is the easiest and most expedient way of dealing with this problem if I want the functionality of the second query and the speed of the first one?

Ned Hulton
  • 477
  • 3
  • 12
  • 27

1 Answers1

1

If innodb_ft_min_token_size is the default of 3, then IT is "too short" (as opposed to "a stopword"). Further note that +IT insists on IT, but, since IT is totally ignored, it is never found. Hence the "zero results".

I have had good success with this approach to arbitrary data coming from the user: If the word is 3 or more characters long, prefix it with +, else leave it alone:

... AGAINST('it +professional' IN BOOLEAN MODE)

That has the effect of ignoring "it" and finding all rows with "professional", "professionals", etc.

In some situations (perhaps not yours) this can be useful:

WHERE MATCH(fullname) AGAINST("+Rick +James" IN BOOLEAN MODE)
  AND fullname LIKE "%Rick James%"

That works fast (because the fulltext is quite efficient and the LIKE is performed only against those that match the FT test).

And it avoids

... Rick Smith and James Davis ...

In your case, this might be appropriate:

WHERE MATCH(tagline) AGAINST('IT +professional' IN BOOLEAN MODE)
  AND tagline LIKE '%IT +professional%'

The FT test finds all taglines with "professional" whether "IT" or not; the LIKE further filters.

Your "80% faster" says to me that the table is not very big. The improvement for my tips grows with table size.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Yes, I am combining match with like in the way that you suggested to avoid the full table scan. 80% faster so far, on the small subset of the data that I've indexed so far. Looking good. I am guessing that I don't want to add quote marks because the bigram "information technology" will not be indexed, but those two individual words will be. I just need to make sure each word is preceded with a + symbol. I also need to find the setting in MySQL that deals with stopwords and turn it off. Any guidance on that would be hugely appreciated. – Ned Hulton Apr 01 '22 at 18:26
  • 1
    I don't happen to know the optimal way to deal with a bigram. When searching for details on stopwords, make sure the advice is aimed at InnoDB, not MyISAM; there _are_ differences based on Engine. – Rick James Apr 01 '22 at 20:58
  • Thanks again for all of your help. One more quick question: Can I rebuild the table with this: SET @@SESSION.innodb_ft_enable_stopword = 'OFF'; CREATE TABLE new_table LIKE old_table; – Ned Hulton Apr 01 '22 at 22:11
  • 1
    @NedHulton - Instead of `CREATE`, so `ALTER TABLE table ENGINE=InnoDB;` That has the effect of rebuilding the indexes, but otherwise not really changing anything. – Rick James Apr 01 '22 at 23:42
  • Right, I am adding the data again and starting from scratch with new indexes, I just want to be sure that stopwords are gone forever. I followed the steps here: https://stackoverflow.com/questions/12678920/ignoring-mysql-fulltext-stopwords-in-query. I've edited the my.cnf file I then just need to run this SQL command and I should be good? SET @@SESSION.innodb_ft_enable_stopword = 'OFF'; – Ned Hulton Apr 01 '22 at 23:45
  • 1
    @NedHulton - Correct. The `CREATE ... LIKE ...` gives you an empty table. The `SET` appies _until_ a restart; the change to my.cnf is noticed _afterwards_. (I hope I have not missed some subtle point. But you seem to know the main things.) – Rick James Apr 01 '22 at 23:52
  • Not working... I'm asking a new question. Thanks again. Sincerely appreciated. Determined to defeat this issue... – Ned Hulton Apr 01 '22 at 23:53
  • New question here. https://stackoverflow.com/questions/71713886/mysql-fulltext-index-help-disable-stopwords Help massively appreciated. Driving me nuts. – Ned Hulton Apr 02 '22 at 00:14