0

I'm using MySQL 8 and I've got a table called records for which I added the following index in order to perform text search on it:

CREATE FULLTEXT INDEX all_records_idx ON records (`title`, `label`, `description`, `catalog_number`,`barcode`);

Now, I've got the following data (copied it as JSON):

[
  {
    "barcode": "79027066021",
    "title": "White Blood Cells",
    "slug": "White-blood-cells",
    "image": null,
    "description": "rev",
    "artist_id": 3,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 2001,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "ff1500",
    "vinyl_color": "ff1500",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "889854036119",
    "title": "Nilsson Schmilsson",
    "slug": "nilsson-schmilsson",
    "image": null,
    "description": null,
    "artist_id": 6,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1971,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "490d57",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644617",
    "title": "Abbey Road",
    "slug": "abbey-road",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1969,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644618",
    "title": "Revolver",
    "slug": "revolver",
    "image": null,
    "description": null,
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1966,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": "46518f",
    "vinyl_color": "000000",
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": "2023-06-11 21:32:17.000",
    "updated_at": null,
    "deleted_at": null
  },
  {
    "barcode": "C10777774644619",
    "title": "arevolat",
    "slug": "aurevolat",
    "image": null,
    "description": "this is a test",
    "artist_id": 2,
    "genre_id": 1,
    "num_of_records": 1,
    "original_release_year": 1999,
    "original_release_date": null,
    "reissue_release_date": null,
    "label": null,
    "sleeve_side_color": null,
    "vinyl_color": null,
    "size": 12,
    "barcode_in_record": null,
    "catalog_number": null,
    "owned_copies": 1,
    "active": 1,
    "created_at": null,
    "updated_at": null,
    "deleted_at": null
  }
]

So, there are currently 5 records in the table. In order to test the search functionality, I ran the following query:

SELECT * FROM records WHERE MATCH(`title`, `label`, `description`, `catalog_number`,`barcode`)
AGAINST ('*rev*' IN BOOLEAN MODE)

I was expecting to get 3 records back:

  • 1 for barcode 79027066021 where it should match to the description "description": "rev".
  • 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".
  • 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".

But I'm only getting back the first two records:

  • 1 for barcode 79027066021 where it should match to the description "description": "rev".
  • 1 for barcode C10777774644618 where it should match to the title "title": "Revolver".

This one is not present:

  • 1 for barcode C10777774644619 where it should match to the title "title": "arevolat".

Even though I'm using * as part of the search.

DDL:

CREATE TABLE `records` (
  `barcode` varchar(48) NOT NULL,
  `title` varchar(160) NOT NULL,
  `slug` varchar(250) NOT NULL,
  `image` varchar(191) DEFAULT NULL,
  `artist_id` bigint unsigned DEFAULT NULL,
  `genre_id` bigint unsigned DEFAULT NULL,
  `num_of_records` bigint NOT NULL,
  `original_release_date` date DEFAULT NULL,
  `reissue_release_date` date DEFAULT NULL,
  `label` longtext,
  `sleeve_side_color` varchar(191) DEFAULT NULL,
  `vinyl_color` longtext,
  `size` bigint DEFAULT NULL,
  `barcode_in_record` tinyint(1) DEFAULT NULL,
  `catalog_number` longtext,
  `owned_copies` bigint DEFAULT '1',
  `active` tinyint(1) DEFAULT '1',
  `created_at` datetime(3) DEFAULT NULL,
  `updated_at` datetime(3) DEFAULT NULL,
  `deleted_at` datetime(3) DEFAULT NULL,
  `original_release_year` bigint NOT NULL,
  `description` text,
  PRIMARY KEY (`barcode`),
  UNIQUE KEY `barcode` (`barcode`),
  UNIQUE KEY `slug` (`slug`),
  KEY `fk_records_artist` (`artist_id`),
  KEY `fk_records_genre` (`genre_id`),
  FULLTEXT KEY `all_records_idx` (`title`,`label`,`description`,`catalog_number`,`barcode`),
  CONSTRAINT `fk_records_artist` FOREIGN KEY (`artist_id`) REFERENCES `artists` (`id`),
  CONSTRAINT `fk_records_genre` FOREIGN KEY (`genre_id`) REFERENCES `genres` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

Any idea what's going on?

Thanks

MrCujo
  • 1,218
  • 3
  • 31
  • 56

3 Answers3

2

"%", "*", and other punctuation marks have specific and different meanings for AGAINST, LIKE, and RLIKE.

MATCH(title, ...) AGAINST('*rev*')  -- ignores the leading punctuation mark
title LIKE '%rev%'  -- finds the desired 3 rows, but only in `title`; and slow
title RLIKE 'rev'  -- finds the desired 3 rows, but only in `title`; and slow
title LIKE '%rev%' OR label LIKE '%rev%' OR ... -- even slower

FULLTEXT, when appropriate, is almost always faster than LIKE or RLIKE; often much faster. I suggest you change the requirements and/or expectations. As you see from above, there is no syntax that is both simple and fast.

Rick James
  • 135,179
  • 13
  • 127
  • 222
1

In MySQL, the wildcard character is % rather than *.

Try AGAINST ('%rev%' IN BOOLEAN MODE)

Sometimes, * acts as a modifier to match 'zero or more instances of the thing preceding it.'

See https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html

However, the above Pattern Matching applies to the extended regular expressions (such as when using RLIKE).

For a Boolean Full-text Search, the * is a truncation operator. The * at the end of your expression means that rev must be present at the start of one or more words. This explains the missing record in your results.

I suspect the first * is ignored.

More info: https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

Zaq
  • 1,071
  • 2
  • 10
  • 19
  • I tried this `AGAINST ('%rev%' IN BOOLEAN MODE)` but didn't work – MrCujo Jun 14 '23 at 14:45
  • The FULLTEXT search works by indexing the start of each word. Imagine looking in a dictionary for words starting with REV, you'll find them all very quickly. However, if you try to find all words containing REV, you'd have to check every page. Here's a previous answer that appears to confirm that you can't use a wildcard prefix in a full-text search. https://stackoverflow.com/a/20680736 – Zaq Jun 15 '23 at 01:15
  • "* acts as a modifier to match 'zero or more instances of the thing preceding it.'" -- That applies to RLIKE, not Fulltext and not LIKE. – Rick James Jun 16 '23 at 00:08
  • Nice pickup, Rick! After four example queries all using LIKE, the docs talk about a different approach (e.g. RLIKE) without so much as a heading. I've learned something today. :) – Zaq Jun 16 '23 at 00:53
1

12.9.5 Full-Text Restrictions

The '%' character is not a supported wildcard character for full-text searches.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-restrictions.html

Asterisk *

The asterisk serves as the truncation (or wildcard) operator. Unlike the other operators, it is appended to the word to be affected. Words match if they begin with the word preceding the * operator. [...]

The wildcarded word is considered as a prefix that must be present at the start of one or more words.[...]

The following examples demonstrate some search strings that use boolean full-text operators: [...]

'apple*'

Find rows that contain words such as “apple”, “apples”, “applesauce”, or “applet”.

https://dev.mysql.com/doc/refman/5.7/en/fulltext-boolean.html

Kostas Nitaf
  • 428
  • 1
  • 2
  • 12