I'm on Ruby 2.2.0 and Rails 4.2.0.

For a project I have a table called 'Character' where each record is a character. When I search for a record with 'where', the framework confuses characters that look alike.

For example:

Basic.where(:character => 'Í')

returns all records whose character looks like an I, such as character: "Ï", character: "I", character: "i", character: "í", character: "ì",...

My DB uses the utf8_general_ci collation, and when I load my data into the DB I use the 'iso-8859-1:utf-8' encoding.

Bob Gilmore
Gregory Frerot

1 Answer

utf8_general_ci has an issue where it effectively strips the combining marks from accented characters, so accented and unaccented letters compare as equal. In short, use utf8_unicode_ci instead, which uses the Unicode Collation Algorithm. This has already been answered very well in "What are the differences between utf8_general_ci and utf8_unicode_ci?"
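
To picture why all of those rows match, here is a rough Ruby sketch of the kind of folding an accent- and case-insensitive comparison applies. It is only an illustration, not how MySQL implements its collations, and `fold` is a hypothetical helper name. It relies on `String#unicode_normalize`, which is available from Ruby 2.2.

```ruby
# Hypothetical helper: builds a comparison key that ignores accents and case.
# NFD splits "Í" into "I" plus a combining acute accent; the regexp then
# removes the combining marks (Unicode category Mn) and downcase folds case.
def fold(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '').downcase
end

chars = ["Í", "Ï", "I", "i", "í", "ì"]

# Ruby itself compares strings exactly, so these are six distinct values.
distinct = chars.uniq.size

# After folding, every one of them ends up with the same key.
folded = chars.map { |c| fold(c) }
same_key = folded.uniq.size == 1
```

With a comparison like this, a lookup for 'Í' cannot tell the six characters apart, which matches the behaviour described in the question.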

EDIT: It actually seems like not even utf8_unicode_ci handles this correctly.

Here's the code I used to test this:

SET collation_connection = 'utf8_bin';
SELECT 'Ï' = 'I'; -- 0

SET collation_connection = 'utf8_unicode_ci';
SELECT 'Ï' = 'I'; -- 1

SET collation_connection = 'utf8_general_ci';
SELECT 'Ï' = 'I'; -- 1

SET collation_connection = 'utf8mb4_bin';
SELECT 'Ï' = 'I'; -- 0

SET collation_connection = 'utf8mb4_unicode_ci';
SELECT 'Ï' = 'I'; -- 1

SET collation_connection = 'utf8mb4_general_ci';
SELECT 'Ï' = 'I'; -- 1
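
Since only the `_bin` collations report the two characters as different, one possible workaround (an assumption on my side, not something tested against the asker's schema) is to narrow the result down in Ruby, where string comparison is exact. The sketch below uses a plain Struct as a stand-in for the ActiveRecord rows, and `exact_matches` is a hypothetical helper.

```ruby
# Stand-in for ActiveRecord rows. In a Rails app these would come from
# something like Basic.where(:character => 'Í') (assumed, not run here).
row_class = Struct.new(:character)
rows = ["Ï", "I", "i", "í", "ì", "Í"].map { |c| row_class.new(c) }

# Hypothetical helper: keep only the rows whose character matches exactly.
# Both sides are normalized to NFC so that "I" followed by a combining
# acute accent still equals the precomposed "Í".
def exact_matches(rows, wanted)
  target = wanted.unicode_normalize(:nfc)
  rows.select { |row| row.character.unicode_normalize(:nfc) == target }
end

matches = exact_matches(rows, "Í")
```

This costs an extra pass over the rows the database returned, so it only makes sense when the loose match yields a small result set.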

EDIT2:

It looks like Postgres handles this better: http://sqlfiddle.com/#!15/9eecb/797. If you can control the choice of DB, I would suggest using Postgres instead.

Hugo Tunius