18

How can I modify a where/like condition on a search query in Rails:

find(:all, :conditions => ["lower(name) LIKE ?", "%#{search.downcase}%"])

so that the results are matched irrespective of accents? (eg métro = metro). Because I'm using utf8, I can't use "to_ascii". Production is running on Heroku.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
user1051849
  • 2,307
  • 5
  • 26
  • 43

5 Answers5

35

Proper solution

Since PostgreSQL 9.1 you can just:

CREATE EXTENSION unaccent;

Provides a function unaccent(), doing what you need (except for lower(), just use that additionally if needed). Read the manual about this extension.

More about unaccent and indexes:

Poor man's solution

If you can't install unacccent, but are able to create a function. I compiled the list starting here and added to it over time. It is comprehensive, but hardly complete:

CREATE OR REPLACE FUNCTION lower_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT lower(translate($1
     , '¹²³áàâãäåāăąÀÁÂÃÄÅĀĂĄÆćčç©ĆČÇĐÐèéêёëēĕėęěÈÊËЁĒĔĖĘĚ€ğĞıìíîïìĩīĭÌÍÎÏЇÌĨĪĬłŁńňñŃŇÑòóôõöōŏőøÒÓÔÕÖŌŎŐØŒř®ŘšşșߊŞȘùúûüũūŭůÙÚÛÜŨŪŬŮýÿÝŸžżźŽŻŹ'
     , '123aaaaaaaaaaaaaaaaaaacccccccddeeeeeeeeeeeeeeeeeeeeggiiiiiiiiiiiiiiiiiillnnnnnnooooooooooooooooooorrrsssssssuuuuuuuuuuuuuuuuyyyyzzzzzz'
     ));
$func$;

Your query should work like that:

find(:all, :conditions => ["lower_unaccent(name) LIKE ?", "%#{search.downcase}%"])

For left-anchored searches, you can use an index on the function for very fast results:

CREATE INDEX tbl_name_lower_unaccent_idx
  ON fest (lower_unaccent(name) text_pattern_ops);

For queries like:

SELECT * FROM tbl WHERE (lower_unaccent(name)) LIKE 'bob%';

Or use COLLATE "C". See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • hi Erwin, thanks for this. I'm on 9.1 so CREATE EXTENSION unaccent; seems like the way forward. How would you suggest i activate it through my rails app though (as i need this to happen on heroku as well as my dev environment)... thanks! – user1051849 Feb 14 '12 at 10:20
  • If you're stuck on 9.0, you can still install unaccent if you execute C:\Program Files\PostgreSQL\9.0\share\contrib\unaccent.sql – Edo Oct 21 '14 at 13:59
  • 2
    (3 years later:) Heroku includes `unaccent`: https://devcenter.heroku.com/articles/heroku-postgres-extensions-postgis-full-text-search You can verify by running `echo 'show extwlist.extensions' | heroku pg:psql` – Henrik N Jan 23 '15 at 20:06
  • 2
    it worked beautifully. I inserted it on my rails application with a migration. Here's an awesome example of how you could do that: http://stackoverflow.com/questions/16611226/how-to-install-postgres-extensions-at-database-creation – Hamdan Sep 28 '16 at 05:43
23

For those like me who are having trouble on add the unaccent extension for PostgreSQL and get it working with the Rails application, here is the migration you need to create:

class AddUnaccentExtension < ActiveRecord::Migration
  def up
    execute "create extension unaccent"
  end

  def down
    execute "drop extension unaccent"
  end
end

And, of course, after rake db:migrate you will be able to use the unaccent function in your queries: unaccent(column) similar to ... or unaccent(lower(column)) ...

Edison Machado
  • 1,410
  • 19
  • 29
  • 3
    Verifying if the extension is not already present by doing so will prevent a migration crash: ``` def up execute "create extension if not exist unaccent" end ``` – Loris Jun 04 '20 at 12:52
  • 1
    Slight typo - the correct code is: execute "create extension if not exists unaccent;" - exists is plural. See https://www.postgresql.org/docs/9.1/sql-createextension.html – Dom Eden Sep 27 '21 at 12:10
3

First of all, you install postgresql-contrib. Then you connect to your DB and execute:

CREATE EXTENSION unaccent;

to enable the extension for your DB.

Depending on your language, you might need to create a new rule file (in my case greek.rules, located in /usr/share/postgresql/9.1/tsearch_data), or just append to the existing unaccent.rules (quite straightforward).

In case you create your own .rules file, you need to make it default:

ALTER TEXT SEARCH DICTIONARY unaccent (RULES='greek');

This change is persistent, so you need not redo it.

The next step would be to add a method to a model to make use of this function.

One simple solution would be defining a function in the model. For instance:

class Model < ActiveRecord::Base
    [...]
    def self.unaccent(column,value)
        a=self.where('unaccent(?) LIKE ?', column, "%value%")
        a
    end
    [...]
end

Then, I can simply invoke:

Model.unaccent("name","text")

Invoking the same command without the model definition would be as plain as:

Model.where('unaccent(name) LIKE ?', "%text%"

Note: The above example has been tested and works for postgres9.1, Rails 4.0, Ruby 2.0.

UPDATE INFO
Fixed potential SQLi backdoor thanks to @Henrik N's feedback

Ruby Racer
  • 5,690
  • 1
  • 26
  • 43
  • Danger! If you just string-interpolate the value into the SQL like that, and the value is user-provided, you're opening yourself up to SQL injections. This is safer since Rails will escape things for you: `Model.where("unaccent(name) LIKE unaccent(?)", "%#{value}%")` or just `Model.where("unaccent(name) LIKE ?", "%#{value}%")` if you don't care about unaccenting the value. – Henrik N Jan 23 '15 at 20:29
  • You are right, of course... I would not do this error now, but this is old.. I'll fix it, thanks for noting – Ruby Racer Jan 23 '15 at 21:51
  • No problem. Hm, I suspect using `unaccent(?)` for the column name will treat it as a string rather than a column name, but I'm not sure. – Henrik N Jan 25 '15 at 15:33
  • Not tested, I ended up using lucene solr, but it is OK to use strings as column names. – Ruby Racer Jan 25 '15 at 17:30
  • Anyone tried that on Cloud9? Can't find the usr/share folder... Looking around cloud9 forums as well. Nothing really useful coming up yet. – Tashows Oct 22 '17 at 19:46
  • Figured a workaround for this. 1) Run cd `pg_config --sharedir`/tsearch_data to get to the right folder. 2) Found an updated unaccent.rules file here: https://raw.githubusercontent.com/postgres/postgres/master/contrib/unaccent/unaccent.rules 3) Edited and uploaded somewhere I have access to through a link. 4) Run sudo curl -O http://your-custom-link-here – Tashows Oct 22 '17 at 21:56
2

There are 2 questions related to your search on the StackExchange: https://serverfault.com/questions/266373/postgresql-accent-diacritic-insensitive-search

But as you are on Heroku, I doubt this is a good match (unless you have a dedicated database plan).

There is also this one on SO: Removing accents/diacritics from string while preserving other special chars.

But this assumes that your data is stored without any accent.

I hope it will point you in the right direction.

Community
  • 1
  • 1
Pierre
  • 8,368
  • 4
  • 34
  • 50
0

Assuming Foo is the model you are searching against and name is the column. Combining Postgres translate and ActiveSupport's transliterate. You can do something like:

Foo.where(
  "translate(
    LOWER(name),
    'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
    'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu'
  )
  LIKE ?", "%#{ActiveSupport::Inflector.transliterate("%qué%").downcase}%"
)
Christian Fazzini
  • 19,613
  • 21
  • 110
  • 215