2

I have a small data set stored on MySQL and I would like to avoid using/managing ElasticSearch/Sphinx etc. instances, keeping up the index updated etc. and use MySQL full text indexes for searching them.

But my particular requirement is doing a fuzzy search which can find "Türkçe" when keyword is "Turkce" or vice versa.

I'm not familiar with the terminology so couldn't decide between features. How can I do that with MySQL? Should I use Soundex or any other function?

Thank you very much.

she hates me
  • 1,212
  • 5
  • 25
  • 44

2 Answers2

1

You have a couple of choices.

The easiest, if it works correctly for you, is to set the MySql collation for the columns involved in your FULLTEXT index to utf8mb4_unicode_ci (or utf8_unicode_ci). If your columns already use the unicode encoding, you won't lose data if you do that. The case-insensitive _ci encoding treats many characters with diareses as their unadorned counterparts. But this collation works best for Western European languages like Swedish (the country of MySql's developers). Give it a try. Let us know how well it works for Turkish, please.

If it doesn't work, try storing two columns for each column you want to search. One should contain the actual value, and the other a search slug value. You compute the search slug value from the actual value with a stored function of your own making. Then, you create your FULLTEXT indexes on the search slug columns.

Whenever you search, you convert your user-supplied search term to its search slug equivalent and search for that.

The search slug stored function converts your diaresis-bearing characters like ç to their unadorned equivalents like c. Here are some ideas about creating that stored function. How to remove accents in MySQL?

SOUNDEX is entirely worthless for your purpose. It was invented many decades ago, when computers had thousands of memory locations, for the specific purpose of looking up North American proper names.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
0
create table lang(t varchar(200), FULLTEXT INDEX(t));

insert into lang values ('Türkçe');
insert into lang values ('Turkce');

mysql> select * from lang;
+--------+
| t      |
+--------+
| Türkçe |
| Turkce |
+--------+
2 rows in set (0.00 sec)

mysql> select * from lang where t like '%Turkce%';
+--------+
| t      |
+--------+
| Türkçe |
| Turkce |
+--------+
2 rows in set (0.00 sec)
Luuk
  • 12,245
  • 5
  • 22
  • 33