49

In several SO posts OP asked for an efficient way to search text columns in a case insensitive manner.

As much as I could understand the most efficient way is to have a database with a case insensitive collation. In my case I am creating the database from scratch, so I have the perfect control on the DB collation. The only problem is that I have no idea how to define it and could not find any example of it.

Please, show me how to create a database with case insensitive collation.

I am using postgresql 9.2.4.

EDIT 1

The CITEXT extension is a good solution. However, it has some limitations, as explained in the documentation. I will certainly use it, if no better way exists.

I would like to emphasize, that I wish ALL the string operations to be case insensitive. Using CITEXT for every TEXT field is one way. However, using a case insensitive collation would be the best, if at all possible.

Now https://stackoverflow.com/users/562459/mike-sherrill-catcall says that PostgreSQL uses whatever collations the underlying system exposes. I do not mind making the OS expose a case insensitive collation. The only problem I have no idea how to do it.

Community
  • 1
  • 1
mark
  • 59,016
  • 79
  • 296
  • 580

5 Answers5

35

A lot has changed since this question. Native support for case-insensitive collation has been added in PostgreSQL v12. This basically deprecates the citext extension, as mentioned in the other answers.

In PostgreSQL v12, one can do:

    CREATE COLLATION case_insensitive (
      provider = icu,
      locale = 'und-u-ks-level2',
      deterministic = false
    );

    CREATE TABLE names(
      first_name text,
      last_name text
    );

    insert into names values
      ('Anton','Egger'),
      ('Berta','egger'),
      ('Conrad','Egger');

    select * from names
      order by
        last_name collate case_insensitive,
        first_name collate case_insensitive;

See https://www.postgresql.org/docs/current/collation.html for more information.

  • 2
    Note that this depends on the operating system and the ICU version that comes with it. –  Nov 29 '19 at 08:56
  • 10
    Note that as of PostgreSQL v12, non deterministic collations DO NOT support LIKE and LIKE – Stefan Anghel May 19 '20 at 15:20
  • check this if it's not working because of ICU or OS version: https://dba.stackexchange.com/a/255783 – D3Hunter Oct 09 '20 at 09:46
  • 12
    What does "und-u-ks-level2" mean? – user275801 Feb 25 '21 at 09:15
  • Also how is it possible to set a database to a user-defined collation? do i have to create a row in pg_catalog? – user275801 Feb 25 '21 at 09:30
  • This solution is referenced here where a problem was reported: https://dba.stackexchange.com/questions/255780/case-insensitive-collation-still-comparing-case-sensitive – Larry Freeman Jul 07 '22 at 07:04
  • If I specify such custom collation in a where or order-by clause, can the indexes still be used? – Jānis Elmeris May 30 '23 at 12:39
  • @user275801 See the Unicode [Collation Settings](https://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options) table for more info on this syntax. – AndrewL Jun 07 '23 at 17:34
10

There are no case insensitive collations, but there is the citext extension:

http://www.postgresql.org/docs/current/static/citext.html

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • 4
    See my other answer (https://stackoverflow.com/a/59101567/8870331) for PostgreSQL v12 and beyond. –  Nov 29 '19 at 08:54
8

For my purpose the ILIKE keyword did the job.

From the postgres docs:

The key word ILIKE can be used instead of LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

RobDil
  • 4,036
  • 43
  • 52
  • 3
    Unless you escape the pattern, this will produce wrong results for strings containing `_` or `%` if you attempt to use it like `=`. – Craig Ringer Aug 22 '14 at 01:43
  • 2
    Problem is, frameworks like TypeORM has no support for ILIKE in their find conditions. – GeorgiG Mar 07 '19 at 10:58
2

This is not changing collation, but maybe somebody help this type of query, where I was use function lower:

SELECT id, full_name, email FROM nurses WHERE(lower(full_name) LIKE '%bar%' OR lower(email) LIKE '%bar%')

Štefan Bartoš
  • 561
  • 6
  • 8
-3

I believe you need to specify your collation as a command line option to initdb when you create the database cluster. Something like

initdb --lc-collate=en_US.UTF-8 

It also seems that using PostgreSQL 9.3 on Ubuntu and Mac OS X, initdb automatically creates the database cluster using a case-insensitive collation that is default in the current OS locale, in my case, en_US.UTF-8.

Could you be using an older version of PostgreSQL that does not default to the host locale? Or could it be that you are on an operating system that does not provide any case-insensitive collations for PostgreSQL to choose from?

glyphobet
  • 1,564
  • 11
  • 17
  • I am using now PostgreSQL 9.3 on Windows 7 and 8. I have no idea whether they provide a case insensitive collation for PostgreSQL. I know that SQL Server can be configured with such a collation. – mark Nov 30 '13 at 22:21
  • I can't help with Windows... but it sounds like that's the place to start. Find out what case-insensitive collations Windows provides, and see if you can tell PostgreSQL to use one of them when it creates the cluster. – glyphobet Dec 02 '13 at 17:00
  • From my experiments with PostgreSQL 9.6, the --lc-collate=en_US.UTF-8 option does not produce a case-insensitive collation. – Berend de Boer May 09 '17 at 03:24
  • This answer is wrong. The collation is not case insensitive. – Laurenz Albe Jan 17 '22 at 13:07