5

In SQLite I want to case-insensitive "SELECT LIKE name" works fine for normal latin names, but when the name is in UTF-8 with non-latin characters then the select becomes case-sensitive, how to make it also case-insensitive like latin characters?

p.s. my sqlite is v3 and I connect with PHP PDO

Timo Huovinen
  • 53,325
  • 33
  • 152
  • 143

3 Answers3

6

For SQLite you have 2 options:

  1. compile it with ICU support: How to compile, Compilation options
  2. override the LIKE function, here is a complete solution (from http://blog.amartynov.ru/?p=675)
$pdo = new PDO("sqlite::memory:");

# BEGIN

function lexa_ci_utf8_like($mask, $value) {
    $mask = str_replace(
        array("%", "_"),
        array(".*?", "."),
        preg_quote($mask, "/")
    );
    $mask = "/^$mask$/ui";
    return preg_match($mask, $value);
}

$pdo->sqliteCreateFunction('like', "lexa_ci_utf8_like", 2);

# END

$pdo->exec("create table t1 (x)");
$pdo->exec("insert into t1 (x) values ('[Привет España Dvořák]')");

header("Content-Type: text/plain; charset=utf8");
$q = $pdo->query("select x from t1 where x like '[_РИ%Ñ%ŘÁ_]'");
print $q->fetchColumn();
amartynov
  • 4,125
  • 2
  • 31
  • 35
  • google does not like me asking how to compile with ICU, provide links please. Otherwise you are the first person to provide a proper answer! – Timo Huovinen Apr 08 '11 at 07:39
  • @YuriKolovsky As for compiling ICU, there's no definitive guide and I haven't tried this way, so you still have to google :) – amartynov Apr 08 '11 at 08:21
  • the provided example does not seem to work with Russian letters, `LIKE '%ОЛЬ%'`. What am I doing wrong? if I manage to solve this, you will be my new programming role model :) – Timo Huovinen Apr 08 '11 at 12:41
  • @YuriKolovsky indeed, my function failed with Cyrillic chars. I updated the blog post - please check the new version and let me know if it works. Thank you for testing :) – amartynov Apr 08 '11 at 14:13
2

An improved version of LIKE overloading via a UDF:

$db->sqliteCreateFunction('like',
    function ($pattern, $data, $escape = null) use ($db)
    {
        static $modifiers = null;

        if (isset($modifiers) !== true)
        {
            $modifiers = ((strncmp($db->query('PRAGMA case_sensitive_like;')->fetchColumn(), '1', 1) === 0) ? '' : 'i') . 'suS';
        }

        if (isset($data) === true)
        {
            if (strpbrk($pattern = preg_quote($pattern, '~'), '%_') !== false)
            {
                $regex = array
                (
                    '~%+~S' => '.*',
                    '~_~S' => '.',
                );

                if (strlen($escape = preg_quote($escape, '~')) > 0)
                {
                    $regex = array
                    (
                        '~(?<!' . $escape . ')%+~S' => '.*',
                        '~(?<!' . $escape . ')_~S' => '.',
                        '~(?:' . preg_quote($escape, '~') . ')([%_])~S' => '$1',
                    );
                }

                $pattern = preg_replace(array_keys($regex), $regex, $pattern);
            }

            return (preg_match(sprintf('~^%s$~%s', $pattern, $modifiers), $data) > 0);
        }

        return false;
    }
);

Respects the case_sensitive_like PRAGMA and correctly handles x LIKE y ESCAPE z syntax.

I also wrote another version that does basic and extended romanization of x and y values, so that an accented character will match it's unaccented counterpart, for instance: SELECT 'Á' LIKE 'à%';.

You can star the gist to keep an eye on occasional updates.

Alix Axel
  • 151,645
  • 95
  • 393
  • 500
2

Use a no-case collation, such as : LIKE name COLLATE NOCASE

If you need specific characters that are not part of ASCII to be compared with case folding, the NOCASE will not work, as such folding is not supported by SQLite - you will have to provide your own collation function using your Unicode library of choice and sqlite3_create_collation().

EDIT: also, this might be interesting:

How to sort text in sqlite3 with specified locale?

Community
  • 1
  • 1
Victor Nicollet
  • 24,361
  • 4
  • 58
  • 89
  • According to the documentation for NOCASE it's only ASCII; it specifically says its excludes UTF-8. – borrible Nov 26 '10 at 12:56
  • What the documentation says is that UTF-8 characters **that are not part of ASCII** will be case-sensitive. So A = a but Œ != œ. Edited to clarify. – Victor Nicollet Nov 26 '10 at 13:04