4

I have a Java EE project using PostgreSQL 9.X and JPA2 (Hibernate implementation). How can I force a like query to be case insensitive and accent insensitive?

I'm able to change the charset of the DB because it's the first project using it.

Arjan Tijms
  • 37,782
  • 12
  • 108
  • 140
user1180339
  • 319
  • 1
  • 6
  • 16
  • Are you using Criteria queries or JPQL? Also, see this question: http://stackoverflow.com/questions/4580285/jpa2-case-insensitive-like-matching-anywhere – Craig Ringer Oct 23 '12 at 08:56
  • See also http://stackoverflow.com/questions/4218780/how-to-do-a-like-case-insensitive-and-accent-insensitive-in-oracle-10gr2-and-jpa – Craig Ringer Oct 23 '12 at 09:01
  • I'm using Criteria queries but I can use JPQL if needed. I know the trick with upper but it's always accent sensitive... – user1180339 Oct 23 '12 at 09:02

3 Answers3

5

In general there is no standard way to write "accent-insensitive" code, or to compare words for equality while ignoring accents. The whole idea makes very little sense, as different accented characters mean different things in different languages/dialects, and their "plain ascii" substitutions/expansions vary by language. Please don't do this; resume and résumé are different words, and the situation gets even worse when considering any language(s) other than English.

For case-insensitivity you can use lower(the_col) like lower('%match_expression') in JPQL. As far as I know ilike isn't supported in JPQL, but I have not checked the standard to verify this. It's fairly readable, so consider just downloading the JPA2 spec and reading it. JPA2 Criteria offers Restrictions.ilike for the purpose. Neither will normalize/strip/ignore accented characters.

For stripping accents, etc, you will probably need to use database-engine specific stored functions or native queries. See, eg this prior answer, or if you intended to substitute accented characters with an unaccented alternative this PostgreSQL wiki entry - but again, please don't do this except for very limited purposes like finding places where words may've been "unaccented" by misguided software or users.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
2

If the unaccent extension is installed:

select unaccent(lower('ãóÊ'));
 unaccent 
----------
 aoe
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
0

I had this issue, and I couldn't use database functions. So instead I used a REGEX restriction in my criteria code:

searchText = unaccent(searchText);
String expression = "firstName ~* '.*" + searchText + ".*'";
Criterion searchCriteria = Restrictions.sqlRestriction(expression);

Then I wrote a function called unaccent to change each character to a or-statement, for example any letter e will become (e|é|è). A query for "hello" will become "h(e|é|è)llo".

Here is the function inspired from this thread Postgres accent insensitive LIKE search in Rails 3.1 on Heroku

private String unaccent(String text) {
    String String charactersProcessed = ""; // To avoid doing a replace multiple times.
    String newText = text.toLowerCase();
    text = newText; // Case statement is expecting lowercase.
    for (int i = 0; i < text.length(); i++) {
        char c = text.charAt(i);
        if (charactersProcessed.contains(c + "")) {
            continue; // We have already processed this character.
        }
        String replacement = "";
        switch (c) {
        case '1': {
            replacement = "¹";
            break;
        }
        case '2': {
            replacement = "²";
            break;
        }
        case '3': {
            replacement = "³";
            break;
        }
        case 'a': {
            replacement = "á|à|â|ã|ä|å|ā|ă|ą|À|Á|Â|Ã|Ä|Å|Ā|Ă|Ą|Æ";
            break;
        }
        case 'c': {
            replacement = "ć|č|ç|©|Ć|Č|Ç";
            break;
        }
        case 'd': {
            replacement = "Đ|Ð";
            break;
        }
        case 'e': {
            replacement = "è|é|ê|ё|ë|ē|ĕ|ė|ę|ě|È|Ê|Ë|Ё|Ē|Ĕ|Ė|Ę|Ě|€";
            break;
        }
        case 'g': {
            replacement = "ğ|Ğ";
            break;
        }
        case 'i': {
            replacement = "ı|ì|í|î|ï|ì|ĩ|ī|ĭ|Ì|Í|Î|Ï|Ї|Ì|Ĩ|Ī|Ĭ";
            break;
        }
        case 'l': {
            replacement = "ł|Ł";
            break;
        }
        case 'n': {
            replacement = "ń|ň|ñ|Ń|Ň|Ñ";
            break;
        }
        case 'o': {
            replacement = "ò|ó|ô|õ|ö|ō|ŏ|ő|ø|Ò|Ó|Ô|Õ|Ö|Ō|Ŏ|Ő|Ø|Œ";
            break;
        }
        case 'r': {
            replacement = "ř|®|Ř";
            break;
        }
        case 's': {
            replacement = "š|ş|ș|ß|Š|Ş|Ș";
            break;
        }
        case 'u': {
            replacement = "ù|ú|û|ü|ũ|ū|ŭ|ů|Ù|Ú|Û|Ü|Ũ|Ū|Ŭ|Ů";
            break;
        }
        case 'y': {
            replacement = "ý|ÿ|Ý|Ÿ";
            break;
        }
        case 'z': {
            replacement = "ž|ż|ź|Ž|Ż|Ź";
            break;
        }
        }
        if (!replacement.isEmpty()) {
            charactersProcessed = charactersProcessed + c;
            newText = newText.replace(c + "", "(" + c + "|" + replacement + ")");
        }
    }

    return newText;
}
Community
  • 1
  • 1
motus
  • 21
  • 3
  • Please, you can use this methode for public static String unaccent(String text) { return Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", ""); } – bilelovitch May 05 '16 at 11:00
  • Thank you, unless I missunderstood the aim of this function `Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "")`, it seems that it normalizes a text in the Java code. The need here is to normalize it in PostgreSQL. The unaccent function I wrote, turn this string `"hello"` to this `"h(è|é|ê|ё|ë|ē|ĕ|ė|ę|ě|È|Ê|Ë|Ё|Ē|Ĕ|Ė|Ę|Ě|€)ll(ò|ó|ô|õ|ö|ō|ŏ|ő|ø|Ò|Ó|Ô|Õ|Ö|Ō|Ŏ|Ő|Ø|Œ)"` – motus May 24 '16 at 15:59
  • The solution is not optimal, except for the use of native SQL. – bilelovitch May 25 '16 at 11:50
  • It is possible to use a similar SQL native function, but in my case I couldn't. – motus May 25 '16 at 19:09
  • you can trickery by, the convert of the String alphabet by alphabet (one by one) and make a UNION between all the results. – bilelovitch May 26 '16 at 10:54
  • definetly, dunno why I didn't do that! – motus May 26 '16 at 19:42
  • Since I can't use SQL native for most of the search functionalities. accent-insensitive is ignored. – bilelovitch May 29 '16 at 07:45