10

It had been written many times already that Opencart's basic search isn't good enough .. Well, I have came across this issue:

When customer searches product in my country (Slovakia (UTF8)) he probably won't use diacritics. So he/she writes down "cucoriedka" and found nothing.

But, there is product named "čučoriedka" in database and I want it to display too, since that's what he was looking for.

Do you have an idea how to get this work? The simple the better!

  • add another column and then add alternative keywords for the product name. this is just a suggestion! –  Feb 06 '13 at 22:32
  • I've been thinking about it, but there are many possibilities for the word. f.e. if he's finding čučoriedka, he might write čucoriedka, cučoriedka, cucoriedka ... I can't write all possible words into database nor with some script – Jožko Golonka Feb 07 '13 at 07:27
  • akam said the right thing.. you can add one more column for this and in script while searching you have to match both field values. so that your search value ll match either čučoriedka or else other value – jeeva Mar 14 '13 at 13:28
  • Maybe You would like to take a look at this: http://stackoverflow.com/questions/4813620/how-to-remove-accents-in-mysql - if Your collation is aproppriately set, the chcaracters `š`, `č`, `á`, etc. should be the recognized the same as `s`, `c`, `a`, etc. thus the search keywords `cucoriedka`, `čucoriedka` and `cučoriedka`, etc. should all match the original word `čučoriedka`. try and think about it. – shadyyx Mar 20 '13 at 12:13
  • OT, but I guess the real Jozef Golonka wouldn't be pleased You are using his name... – shadyyx Mar 20 '13 at 12:15

2 Answers2

5

I'm ignorant of Slovak, I am sorry. But the Slovak collation utf8_slovak_ci treats the Slovak letter č as distinct from c. (Do the surnames starting with Č all come after those starting with C in your telephone directories? They probably do. The creators of MySQL certainly think they do.)

The collation utf8_general_ci treats č and c the same. Here's a sql fiddle demonstrating all this. http://sqlfiddle.com/#!9/46c0e/1/0

If you change the collation of the column containing your product name to utf8_general_ci, you will get a more search-friendly table. Suppose your table is called product and the column with the name in it is called product_name. Then this SQL data-definition statement will convert the column as you require. You should look up the actual datatype of the column instead of using varchar(nnn) as I have done in this example.

 alter table product modify product_name varchar(nnn) collate utf8_general_ci

If you can't alter the table, then you can change your WHERE clause to work like this, specifying the collation explicitly.

WHERE 'userInput' COLLATE utf8_general_ci = product_name

But this will be slower to search than changing the column collation.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
0

You can use SOUNDEX() or SOUNDS LIKE function of MySQL.

These functions compare phonetics.

Accuracy of soundex is doubtful for other than English. But, it can be improved if we use it like

select soundex('ball')=soundex('boll') from dual

SOUNDS LIKE can also be used.

Using combination of both SOUNDEX() and SOUNDS LIKE will improve accuracy.

Kindly refer MySQL documentation for details OR mysql-sounds-like-and-soundex

seahawk
  • 1,872
  • 12
  • 18
  • 1
    `SOUNDEX` is an ancient algorithm (patented in 1918) tuned especially for locating surnames in English. It may or may not work well for Slovak product names. – O. Jones Aug 23 '15 at 22:13
  • Yes, you are absolutely right. Accuracy of soundex is doubtful for other laguages. but, it can be improved if we use it like `select soundex('ball')=soundex('boll') from dual` – seahawk Aug 23 '15 at 22:19
  • It is a good idea to change column collation? `ALTER TABLE product_description CHANGE name name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;` I read somewhere that changing collation might result in not using indexes anymore. Is this true? – Adrian Aug 23 '15 at 22:34
  • 1
    Changing `WHERE x='...' COLLATE ...` will not be able to use `INDEX(x)`. But it might give you the desired result. If possible, `ALTER` the table to change the collation of `x`. For the OP's case, either `utf8_general_ci` or `utf8_unicode_ci` will work. Possibly others will work, too; see http://mysql.rjweb.org/utf8_collations.html . – Rick James Aug 24 '15 at 00:29