I'm trying to build a MySQL search query. LIKE
would be perfectly fine, but the client wants inputs with "oe" find "ö", "ae" find "ä" and "ue" find "ü" as it's fairly common in Germany.
I tried using REGEXP
after replacing every occurence of "oe" to (oe|ö)
but REGEXP
is strict and doesn't match (for instance) "é" to "e".
Is there a way to make LIKE
match "oe|ue|ae" or maybe some other way I haven't though of?
Thanks,
thomas

- 2,297
- 2
- 22
- 23
-
2Wow that was quick. Downvoted before I was finished ... :( – thomas Jul 28 '16 at 12:22
-
Probably consider investing in a search engine like Solr. – apokryfos Jul 28 '16 at 12:27
-
Tip: set utf8_general in mysql then `e` will find `è` http://stackoverflow.com/questions/2344118/utf-8-general-bin-unicode – JustOnUnderMillions Jul 28 '16 at 12:28
-
Not with Regexp @JustOnUnderMillions – thomas Jul 28 '16 at 12:32
-
I wouldn't use regexp. Mysql's regexp doesn't work well with multibyte characters. `The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets.` http://dev.mysql.com/doc/refman/5.7/en/regexp.html – chris85 Jul 28 '16 at 12:35
2 Answers
At Character Sets and Collations Supported by MySQL I can only spot two German collations:
- latin1_german1_ci
- latin1_german2_ci
It seems that latin1_german2_ci is the one you want, however it expects Latin1:
latin1_german2_ci
(phone-book) rules:
- Ä = AE
- Ö = OE
- Ü = UE
- ß = ss
If your table/column is not already using it, you can force such collation in the query itself, e.g.:
mysql> SELECT _latin1'oe' collate latin1_german2_ci ='ö' AS are_equal;
+-----------+
| are_equal |
+-----------+
| 1 |
+-----------+
1 row in set (0.00 sec)
If your application is using Latin1 this should do the trick. Otherwise, I honestly have no idea :)
Disclaimer: I know nothing about German. There may be another language that uses similar rules.

- 142,137
- 41
- 261
- 360
-
1Thanks! This is really helpful. Unfortunately our application uses UTF8 ... – thomas Jul 28 '16 at 12:58
-
@thomas If you have the chance of testing this on UTF-8 please share your experience. As I said, I don't really know MySQL handles mixed encodings. – Álvaro González Jul 29 '16 at 06:09
-
1I get an "illegal mix of collations" error. As you said, latin_german2_ci expects latin1 characters. – thomas Jul 29 '16 at 08:15
-
But this is really helpful for future projects. For now went for the ugly way of converting my search string and results via `REPLACE()`, something suggested here but later deleted (thanks to the unknown) – thomas Jul 29 '16 at 08:17
-
There's also [CONVERT()](https://dev.mysql.com/doc/refman/5.7/en/charset-convert.html) but I don't have experience with it. – Álvaro González Jul 29 '16 at 08:24
-
1Thanks to your `CONVERT()` comment I *almost* had it working: `SELECT convert(_utf8'ausdruecklich' using latin1) COLLATE latin1_german2_ci = convert('ausdrücklich' using latin1) as equal` but then came this: http://stackoverflow.com/questions/7917367/mysql-why-are-collation-rules-ignored-by-like-operator-for-german-%C3%9F-character I guess I'm outta luck. Thanks again for your help! – thomas Jul 29 '16 at 09:00
If you are using utf8, COLLATE utf8_german2_ci
is required. See collation chart (And, according to that chart, german2
is the only one to satisfy your needs.)
mysql> SELECT "oe" = "ö" COLLATE utf8_german2_ci;
+-------------------------------------+
| "oe" = "ö" COLLATE utf8_german2_ci |
+-------------------------------------+
| 1 |
+-------------------------------------+
However, It is much more efficient to declare the column(s) to be COLLATE utf8_german2_ci
than to use that clause in comparisons.
(If you are using utf8mb4, change the spelling accordingly.)

- 135,179
- 13
- 127
- 222