0

I have three different MYSQL queries I am attempting to use to search a single column (classlist); the goal is that the column should contain _10_ but should not contain _3_ or _15_ (an example value might be something like _4_8_10_15_ which would fail given the _15_).

I have tried three different approaches and all three give no errors and yet different results:

1) FULLTEXT searching (with the FULLTEXT index enabled on classlist) returned no errors and yet also returned no results.

SELECT classlist, classcount FROM subgroupstest WHERE MATCH (classlist) AGAINST ('+10 -3 -15' IN BOOLEAN MODE)

2) Simple LIKE clauses returned 516 rows.

SELECT classlist, classcount FROM subgroupstest WHERE classlist LIKE "%_10_%" AND classlist NOT LIKE "%_3_%" AND classlist NOT LIKE "%_15_%"

3) A LIKE/REGEXP hybrid returned 1,912 results. One example it returned that the second approach did not was _4_10_18_23_- this is an accurate result, and there are no duplicates in this result so I have to assume the second approach (no less the first!) is missing something somehow.

SELECT classlist, classcount FROM subgroupstest WHERE classlist LIKE '%_10_%' AND classlist NOT REGEXP '_3_|_15_'

At this point I'm left shaking my head and unsure what is going on. There are always deeper issues I could try to look for, but can someone verify if I'm even doing the REGEXP and/or FULLTEXT approaches correctly? It looks like what I've seen on Google after the last few hours of reading and mimicking, but it's my first time using either of those.

Alex Gold
  • 315
  • 2
  • 10
  • 2
    See: [MySQL LIKE query with underscore](https://stackoverflow.com/questions/22167132/mysql-like-query-with-underscore?rq=1) – Paul Spiegel Jun 07 '19 at 20:12
  • 1
    one comment: fullttext indexes are subject to minimum (and maximum) word and token lengths, as well as a list of excluded words https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html, also the underscore character is a wildcard in LIKE unless it's escaped – spencer7593 Jun 07 '19 at 20:13
  • Ah! That explains that part. I replaced the _ bookends with # now (such as `#10`) and replaced the code appropriately; now approaches 2 and 3 are working. I did realize that the FULLTEXT needed the same things, so I've updated it here (still no results though): `SELECT classlist, classcount FROM subgroupstest WHERE MATCH (classlist) AGAINST ('+#10# -#3# -#15#' IN BOOLEAN MODE)` – Alex Gold Jun 07 '19 at 20:19
  • 1
    For FULLTEXT: Try to set [`ft_min_word_len`](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_ft_min_word_len) to 1 and use space as delimiter. – Paul Spiegel Jun 07 '19 at 20:27
  • @PaulSpiegel I was able to set ft_min_word_len to 1 (double-checked on PMA's variable page), and then I replaced the underscore (and later hashtag) with just plain spaced. They all work now! I'll update the main post for future reference. Thanks! – Alex Gold Jun 07 '19 at 20:35
  • 1
    @AlexGold don't write the solution in your question. You can post an answer on your question. – Paul Spiegel Jun 07 '19 at 20:37
  • @PaulSpiegel Ah, thanks for the heads up - I moved it accordingly. – Alex Gold Jun 07 '19 at 20:49

1 Answers1

0

As per the comments, I found a solution. First, I made the MYSQL entries plain spaces (4 8 10 15 in the earlier example).

Then the searches work as follow:

SELECT classlist, classcount FROM subgroupstest WHERE MATCH (classlist) AGAINST ('+10 -3 -15' IN BOOLEAN MODE)

SELECT classlist, classcount FROM subgroupstest WHERE classlist LIKE "% 10 %" AND classlist NOT LIKE "% 3 %" AND classlist NOT LIKE "% 15 %"

SELECT classlist, classcount FROM subgroupstest WHERE classlist LIKE '% 10 %' AND classlist NOT REGEXP ' 3 | 15 '

Interestingly, the simple "Like" version was 2-3 times faster than the other two! Fulltext was slightly slower than REGEXP as well.

Alex Gold
  • 315
  • 2
  • 10
  • 1
    this may be obvious (but just in case it's not) `foo LIKE '% 4 %'` will *not* match a foo value of `'4 8 16'` i.e. when the first entry in the list isn't preceded by a space, and the same with the last entry not followed by a space... this might call for some special "guard" characters at the start and end of the values, e.g. `# 4 8 9 #` to make sure the spaces are there. Personally, I'd tend to avoid this kind of "search a list stored in a string" design. But if I had to, I'd use comma separated values `'4,8,12,16'` and use of FIND_IN_SET() – spencer7593 Jun 07 '19 at 21:34
  • https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_find-in-set e.g. with foo = '4,8,10,15` , then ... `WHERE FIND_IN_SET('10',foo) AND NOT FIND_IN_SET('3',foo)` would evaluate to TRUE – spencer7593 Jun 07 '19 at 21:36