3

I want to search in my database for polish cities which commonly consist of a lot of special characters. I want to migrate those to ASCII, so that even those without a polish keyboard can do a regular search.

In PHP it looks like

$ascii = iconv("UTF-8", "ASCII//TRANSLIT//IGNORE", $text);

but how to implement that in mysql select query?

select * from city where CONVERT(city using ASCII)=CONVERT(? using ASCII)

does not work, because it replaces all unknown characters with a question mark

select CONVERT('Lódź' using ascii );
-- RESULT: L?d?
Édouard Lopez
  • 40,270
  • 28
  • 126
  • 178
MatthiasLaug
  • 2,924
  • 6
  • 28
  • 43

3 Answers3

1

As explained in this answer, if you use the collation utf8_general_ci, string comparations will be case and accent insensitive, so you won't need to specify any conversions when doing the query:

CREATE TABLE test (name varchar(100)) COLLATE 'utf8_general_ci';
INSERT test (name) VALUES ('Lódź');

SELECT * FROM test WHERE name = 'Lodz';

name
----
Lódź

Fiddle here

And if you want to specify another collation for your column, you can even apply this collation in a specific query:

CREATE TABLE test (name varchar(100)) COLLATE 'utf8_polish_ci';
INSERT test (name) VALUES ('Lódź');


SELECT * FROM test WHERE name = 'Lodz';

Empty set (0.00 sec)


SELECT * FROM test WHERE name COLLATE 'utf8_general_ci' = 'Lodz';

name
----
Lódź

Second fiddle here

Community
  • 1
  • 1
rsanchez
  • 14,467
  • 1
  • 35
  • 46
1

Operator Behaviors

Equal Operator =

If you are doing an exact searches using the equal operator = the solution from @rsanchez is correct. For instance, finding row equal to œ can be done using this exact value œ or its transliterated value oe:

SELECT 'œ' = 'oe' COLLATE utf8_unicode_ci ;
-- true
+-------------------------------------+
| 'œ' = 'oe' COLLATE utf8_unicode_ci  |
+-------------------------------------+
|                                   1 |
+-------------------------------------+

Will return rows being equal to œ or oe if user use any of them.

LIKE Operator

However if you want to do fuzzy search using the LIKE operator, this won't work. For instance, searching for words containing œ by typing oe will not return row with œ.

SELECT 'œ' LIKE 'oe' COLLATE utf8_unicode_ci ;
-- false
+----------------------------------------+
| 'œ' LIKE 'oe' COLLATE utf8_unicode_ci  |
+----------------------------------------+
|                                      0 |
+----------------------------------------+

Explanation ?

This is not a bug but the expected behavior by SQL standard (see Alexander Barkov answer) as LIKE is doing comparison on a character basis while = use a more complex approach (e.g. allowing transliteration equality).

Solutions

Provide the transliterated form

A solution would be to generate the transliteration of your search and modify your query to search for each form:

SELECT * FROM `mytable` 
  WHERE `myfield` LIKE 'oe' or `myfield` LIKE 'oe' 
COLLATE  utf8_unicode_ci;

Use FullText search

Since MySQL 5.6, fulltext search is now available for InnoDB (previously it was only available for MyISAM tables).

Édouard Lopez
  • 40,270
  • 28
  • 126
  • 178
0

its late but my answer is:

CREATE FUNCTION TO_ASCII(
    text_to_replace MEDIUMTEXT
) RETURNS MEDIUMTEXT
BEGIN
    DECLARE i INT;
    DECLARE char_from, char_to VARCHAR(50);

    SET char_from := 'ĄĆĘŁŃÓŚŻŹąćęłńóśżź ';
    SET char_to   := 'ASCZZEOLNasczzeoln_';
    SET i := CHAR_LENGTH(char_from);

    WHILE i
        DO
            SET text_to_replace := REPLACE(text_to_replace, SUBSTR(char_from, i, 1), SUBSTR(char_to, i, 1));
            SET i:= i - 1;
        END WHILE;

    RETURN text_to_replace;
END;

and usage:

SELECT TO_ASCII('abcąśćó   ĄĆĘŁŃÓŚŻŹąćęłńóśżź  ĄĆĘŁŃÓŚŻŹąćęłńóśżź ');
Marek Lisiecki
  • 498
  • 6
  • 10