-2

I'm trying to remove a specific character (hyphen / dash) from a mysql match against query since searching for an actual dash throws an error. Here is what I have but I'm getting an error that using match+replace is not proper syntax

SELECT *
FROM table
WHERE
MATCH (replace(search_text,'-','')) 
AGAINST ('+5552400*' IN BOOLEAN MODE)

The column 'search_text' has a value with 555-2400 that I'm trying to match on. The search must include a wildcard.

search_text can contain anything from 4 to 500 characters. It may be phone numbers, email addresses, mailing addresses, first names, last names and much more.

This is for an auto-complete search and the result of 555-2400 should show up if I type any of the following:

5
55
555
555-
555-2
555-24
555-240
555-2400
Will B.
  • 17,883
  • 4
  • 67
  • 69
Cary
  • 278
  • 1
  • 3
  • 17
  • Please provide a sample data set with your desired output. A bit confused as to what you are expecting to happen, but looks like you're wanting `AGAINST ('"555-2400"' IN BOOLEAN MODE)`which would not exclude `2400` as it would without quotes. Since using `REPLACE(search_text)` removes any benefits from the FTS indexing, I think you would be better off using `search_text LIKE '%555-2400%'` instead. – Will B. Apr 05 '20 at 23:21
  • I want it to be wildcard. This is for an auto-complete search. So if a person starts typing 555- it should show 555-2400 as a potential match. – Cary Apr 06 '20 at 02:17
  • If I switch to search_text like '%555-2400%', the query becomes significantly slower. The table has a couple million records. At some point, a search server would probably be best. – Cary Apr 06 '20 at 02:26
  • As I said, using either `LIKE` or `REPLACE(table.search_text, '-', '')` will cause a full-table scan, as the database is not able to guess at the values to match against, before retrieving them from each of the rows. Meaning there is no index utilization that `MATCH` uses to be so much faster than `LIKE`. If you can give us a small data-set to work with, against your desired results, we can try to help more. As there are alternatives by using a combination of FTS filters. We just need to know what that data looks like and what you want it to output, to provide you with a correct answer. – Will B. Apr 06 '20 at 02:43
  • For auto-complete, are you allowing stuff before the '555'? If not, the `LIKE` without a leading wild card will be much better. Please elaborate on what types of numbers are involved -- that is phone numbers? part numbers? something else? – Rick James Apr 06 '20 at 05:55
  • The dataset is anything from 4 characters to 500 characters. It may be phone numbers, email addresses, mailing addresses, first names, last names and much more. – Cary Apr 06 '20 at 15:00
  • 1
    @Cary how are you expecting to handle email address searches, since `@` is also not a valid word character for `FULLTEXT` indexes and represents a distance operator in BOOLEAN MODE? This smells like a key-value store table or a schema-less structure. See my answer for an explanation on the limitations of `FULLTEXT` and 4 approaches to handling the phone number searches. But I strongly urge you to seriously consider refactoring your table schema, to move away from the accepts any value concept, or move that data to a NOSQL database like MongoDB, that can better support what you're wanting. – Will B. Apr 06 '20 at 16:45
  • @fyrye I believe I will use a NOSQL db at some point. Right now, I'm doing the strategy of changing it to '555-2400 ' which seems to work ok for the time being. I'm in an AWS environment so I can use one of their solutions for the storage. – Cary Apr 06 '20 at 20:10
  • @Cary Check out my "generated column approach". It can handle all of the boolean operators that you need, Effectively letting you search the escaped data, and display the unecsaped data in a similar manner, without needing to parse the column output or edit the column with the added text. If you decide to stick with their solution, you should delete your answer and close your question as duplicate; to better direct people and give credit to the author of the solution that worked for you. – Will B. Apr 06 '20 at 20:24
  • 1
    For others the referenced answer from [How to allow fulltext searching with hyphens in the search query](https://stackoverflow.com/a/41925824/1144627) that assisted the OP – Will B. Apr 06 '20 at 20:32
  • I'm trying to figure out how to close the question and point to that answer but I don't know how to do that. I can give you the bounty if you put that in it's own answer. I didn't want to accept your answer because it has too many options in it and I wanted to point people to the exact one that helped me. – Cary Apr 06 '20 at 20:33
  • @Cary updated my answer to reflect the "accepted solution", I would like to leave the other approaches as they are all valid for different needs and not very well detailed in the other question for this specific use-case. The main issue was not knowing the data in your table or the full scope of your desired results *(only phone numbers, or phone numbers and emails, etc)*, to produce a singular answer from, since there are a LOT of limitations with full-text searches. – Will B. Apr 06 '20 at 20:44
  • @Cary As for closing, I would update your question with the link I posted in my comment as an alternative reference. Since there is an active bounty, you will have to wait until after the bounty has expired (unfortunately you will not be refunded the 50 points), and then it can be closed as a duplicate. – Will B. Apr 06 '20 at 20:49

3 Answers3

2

You should not manipulate the column values in the MATCH() clause of your query, as it will result in a full-table scan, defeating the purpose of a FULLTEXT index. This is because MySQL needs to retrieve the data from each row to determine the resulting function value.

Additionally modifying the MATCH clause is not permitted with INNODB storage tables, as the column list supplied must match exactly what is in the FULLTEXT index.

Generated Columns DB-Fiddle

One approach is utilizing the Generated Columns feature of MySQL, to replace the offending boolean operator characters and utilize a separate FULLTEXT index on the generated column.

The generated column will allow your original data to remain unmodified, for use in other unrelated full-text searches and permit using the AGAINST('+5552400*') clause without the boolean operator conflicts. The secondary column will also help reduce false positives and lower the index size that can occur by adding additional text to your source column.

The generated column will cause INSERT and UPDATE operations to be impacted slightly. As each row will cause an additional operation to add the second column values automatically for you.

Desired Result

| id  | search_text           |
| --- | --------------------- |
| 1   | called 555-2400 ext 4 |
| 2   | called 555-2400ext 4  |

Schema

CREATE TABLE table_name (
  `id` INTEGER,
  `search_text` VARCHAR(21),
  FULLTEXT idx (search_text)
);

INSERT INTO table_name
  (`id`, `search_text`)
VALUES
  ('1', 'called 555-2400 ext 4'),
  ('2', 'called 555-2400ext 4'),
  ('3', 'called 555-2432 ext 1'),
  ('4', 'called 555-2432ext 1'),
  ('5', 'called 444-2400 ext 2'),
  ('6', 'called 444-2432 ext 2');

Add Generated Column

ALTER TABLE table_name
ADD COLUMN search_text_parsed TEXT 
    GENERATED ALWAYS AS (REPLACE(search_text, '-', '')) STORED,
ADD FULLTEXT INDEX `idx2` (`search_text_parsed`);

You can add more boolean operators to remove as desired, by changing the generated column expression, (REPLACE(REPLACE(search_text, '-', ''), '@', ''))

Search Query

SELECT *
FROM table_name
WHERE MATCH(search_text_parsed)
      AGAINST('+5552400*' IN BOOLEAN MODE);

Result

| id  | search_text           | search_text_parsed   |
| --- | --------------------- | -------------------- |
| 1   | called 555-2400 ext 4 | called 5552400 ext 4 |
| 2   | called 555-2400ext 4  | called 5552400ext 4  |

Additionally you can automate the term value validation directly in your query, to ensure the text does not contain any of the offending boolean operators by using
REPLACE(search_text, '-', '')

SET @term='555-2400';

SELECT *
FROM table_name
WHERE MATCH(search_text_parsed)
      AGAINST(CONCAT('+', REPLACE(@term, '-', ''), '*') IN BOOLEAN MODE);

Other approaches

As the question has been asked in a different manner, here is the question that closely resembles your underlying issue: How to allow fulltext searching with hyphens in the search query

Will B.
  • 17,883
  • 4
  • 67
  • 69
1

Try changing your SQL-query to match an exact phrase (using double-quotes), this allows you to include the dash in your query.

SELECT *
FROM table
WHERE
MATCH (search_text) 
AGAINST ('+"555-2400"' IN BOOLEAN MODE)

Notice that you will lose the asterisk (*) at the end this way which may be a problem for you. This means you will no longer get results that merely begins with "555-2400" but only those that match exactly.

See more information here: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

  • Definitely need the wildcard as this search is for an auto-complete search functionality. Searching for '555-24' should also yield the 555-2400 result. – Cary Apr 06 '20 at 02:20
0

Based on how you describe your problem, perhaps you are better off using a normal index (not a fulltext index)

ALTER TABLE table_name ADD INDEX (search_text);

Then you can do efficient "starts-with" queries using LIKE '<search-term>%' clauses.

E.g.

SELECT *
FROM table_name
WHERE search_text LIKE '555%'

More information here: https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html