0

If I have a table with a MySQL TEXT (or VARCHAR) column with a FULLTEXT index, with the rows possibly containing one or several of any given queried words, and I then run a standard query along the lines of:

SELECT  * 
FROM my_library 
WHERE MATCH (content) 
AGAINST ("cats dogs horse rainbow")

...then, since all of the words to be matched are by definition optional, can I regardless be certain that all rows matching any/several of the query words are returned, with the following exceptions:

  • When a stopword is the only matching term in the query for a given row
  • Any words shorter than minimum indexing length (InnoDB < 3, MyISAM < 4)
  • Any words longer than maximum indexing length (InnoDB > 84)
  • With case mismatch (when using _cs and _bin collations)
  • Any words appearing in more than 50% of results (for MyISAM)
  • If the FULLTEXT index isn't rebuilt after certain configuration changes (reference)
  • When using LIMIT on a table not OPTIMIZEd following row deletion (reference)

I could create test cases, however they would be tedious to analyze, would require massive volume to ensure consistency when searching larger datasets, and would still not be conclusive. (Therefore, no linked SQL fiddle or test data for this question.) I couldn't find a definite statement on this in the reference manual (natural, boolean); which is what I'm looking for.

I have material in need of review, and I need to ensure that all rows matching any queried words are indeed returned for a given query. I assume they are, but I don't have the luxury to assume here, only to have stuff fall through the cracks with quirky behavior. I need to guarantee that no content to be reviewed is omitted from a query expected to match all.

Finally: Whatever the answer may be, does it apply equally to InnoDB and MyISAM; and MySQL and MariaDB (current versions); with natural and boolean mode (without operators)? Are there cases where some potentially matching rows might not be returned?

Markus AO
  • 4,771
  • 2
  • 18
  • 29
  • See last paragraph of the accepted answer to the duplicate question. – Shadow Jan 02 '22 at 11:28
  • @Shadow that would be a MyISAM-specific limitation. I've updated the question with that note, along with a collation-dependent case mismatch possibility. Other than that, the suggested duplicate (from 2012) doesn't provide a thorough and referenced answer, which is what I'm looking for. – Markus AO Jan 02 '22 at 11:41
  • Not really sure what else you are looking for. – Shadow Jan 02 '22 at 13:20
  • As in the OP, I'm primarily looking for _a definitive and thorough statement_ from a _reliable source_ ensuring that _any and all queried words_, with _all the exceptions stated_, will be returned for a fulltext-query; and barring that, any further information on exceptions / edge cases where not all potentially matching rows may be returned. – Markus AO Jan 02 '22 at 17:32
  • The reliable source is the mysql manual, which you seem to have found. – Shadow Jan 02 '22 at 17:43
  • Yes and as noted, there is no definitive statement / listing there. It's even ambiguous if the 50% index saturation limit for MyISAM also applies in boolean mode. Updated OP with further discoveries: There's `innodb_ft_max_token_size` (default 84). Also there are situations where omissions may happen due to configuration changes before index rebuild, or use of `LIMIT` after row deletion but before `OPTIMIZE TABLE`. I'm wondering what all else there is to be discovered that isn't all conclusively stated in one place. Collaboration from the community would be welcome. – Markus AO Jan 02 '22 at 18:10
  • `AGAINST('+Rick +James' IN BOOLEAN MODE)` is like `AND` -- it requires both words to be present. `AGAINST('Rick James')` is like `OR`. See also: `AGAINST('"Rick James"')` – Rick James Jan 03 '22 at 19:33

0 Answers0