0

I am having a problem with regex searches in MySQL on certain diacritic characters. A search for a specific character returns other diacritic characters.

Table with schema:

CREATE TABLE names (first_name varchar(20)) DEFAULT 
CHARSET=utf8;

and rows:

INSERT INTO names (first_name) VALUES ('Cătălïn');
INSERT INTO names (first_name) VALUES ('Kēlee');

running this query:

SELECT * FROM names WHERE LOWER(first_name) REGEXP '[ē]+';

should return:

first_name
----------
Kēlee

Instead, it returns:

first_name
----------
Cătălïn
Kēlee

Am I missing something here? Running on 5.7

http://sqlfiddle.com/#!9/059827/1/0 (this maxes out at 5.6)

simonds
  • 1
  • 2
  • 1
    `SELECT * FROM names WHERE LOWER(first_name) like '%ē%';` – RomanPerekhrest May 25 '17 at 22:04
  • Strangely, if I change query to `SELECT * FROM names WHERE LOWER(first_name) REGEXP 'ē+';` it works. – simonds May 25 '17 at 22:08
  • There are a lot of strangenesses when you try to use REGEXP on utf8 encoding. The regexp is strange too: You are asking if there is one or more `ē` in a row and checking for a single character in `[]`. Neither of those adds anything to the query (if it worked). For simply checking for `ē`: `WHERE HEX(first_name) REGEXP "C493" – Rick James May 26 '17 at 23:39

0 Answers0