5

Rails collection ordering doesn't works as expected with UTF-8 string:

> University.order('abbr asc').map(&:abbr)
=> ["Б", "В", "А"]

It should be

> University.order('abbr asc').map(&:abbr)
=> ["А", "Б", "В"]

What am I missing?

Rails 4.1.8 with ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]

Stepan Kuzmin
  • 1,031
  • 2
  • 11
  • 21
  • Per-query collation specifying could also be possible, see https://stackoverflow.com/a/32653074/3319298 – Epigene Nov 16 '21 at 14:19

3 Answers3

5

This is most likely a collation settings issue with PostgreSQL:

The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.

You can try and fix the collation of a column with a Rails migration. Something like this:

class FixCollationForAbbr < ActiveRecord::Migration
  def up
    execute 'ALTER TABLE universities ALTER COLUMN abbr TYPE varchar COLLATE "ru_RU";' 
  end
end

You should probably also add collation information to your database.yml:

defaults: &defaults
  adapter: postgresql
  encoding: utf8
  collation: ru_RU.utf8
  ctype: ru_RU.utf8

Here is how the database.yml settings affect table creation with PostgreSQL:

def create_database(name, options = {})
  options = { encoding: 'utf8' }.merge!(options.symbolize_keys)

  option_string = options.inject("") do |memo, (key, value)|
    memo += case key
    ...snip...
    when :encoding
      " ENCODING = '#{value}'"
    when :collation
      " LC_COLLATE = '#{value}'"
    when :ctype
      " LC_CTYPE = '#{value}'"
    ...snip...
    end
  end

  execute "CREATE DATABASE #{quote_table_name(name)}#{option_string}"
end
matt
  • 78,533
  • 8
  • 163
  • 197
Casper
  • 33,403
  • 4
  • 84
  • 79
  • 1
    Your answer is better. @Stepan: run `locale -a` on your command line in order to see all locales your system supports, in case the one you need is not `ru_RU.utf8`. – janfoeh May 16 '15 at 10:21
  • Thanks for reply! But unfortunately setting collation `ru_RU.UTF-8` did not help – Stepan Kuzmin May 16 '15 at 11:09
  • Mea culpa, setting collation did help, but only for new created records. How can I update collation for my records? – Stepan Kuzmin May 16 '15 at 11:24
  • @StepanKuzmin Are you sure you completely rebooted your app after you changed the collation for your column with the migration? It should affect all data. However you should be able to specify the collation in the query also like this: `University.order('abbr asc COLLATE "ru_RU"')`. I don't have pgsql right now so I can't test it, but it should work like that. – Casper May 16 '15 at 11:47
  • You could also try to reindex the whole DB to make sure it's not an indexing problem: `REINDEX DATABASE my_rails_db;`. – Casper May 16 '15 at 11:51
  • Just so you know, guys, I've stumbled upon PostgreSQL sorting just being plain broken on OSX (https://github.com/PostgresApp/PostgresApp/issues/216). It works fine on Linux though. Had some hard times debugging this in my specs. – SkyWriter May 16 '15 at 15:52
2

If you use the same locale when sorting, try out the ffi-icu gem. This part of the docs is most relevant to you.

There is also sort_alphabetical gem, but it relies on NFD decomposition so it may not work with all characters.

0

You can use "ru-RU-x-icu" collation in PostgreSQL, and either add it explicitly to every column, or monkey-patch ActiveRecord to add default collation if it's not defined:

module ActiveRecord
  module ConnectionAdapters
    module PostgreSQL
      module ColumnMethods
        def new_column_definition(name, type, **options) # :nodoc:
          if integer_like_primary_key?(type, options)
            type = integer_like_primary_key_type(type, options)
          end
          type = aliased_types(type.to_s, type)
          options[:primary_key] ||= type == :primary_key
          options[:null] = false if options[:primary_key]
          # set default collation if it's not defined explicitly
          options[:collation] = "ru-RU-x-icu" if options[:collation].blank? && (type == :string || type == :text)
          create_column_definition(name, type, options)
        end
      end
    end
  end
end
leshik
  • 33
  • 5