Difference between LIKE and ~ in Postgres

Question

I've been instructed "not to bother with LIKE" and use ~ instead. What is wrong with LIKE and how is ~ different?

Does ~ have a name in this context or do people say "use the tilde operator"?

Gotcha, thanks. I just meant "where" more than "whom", sorry. — Craig Ringer, Sep 17 '12 at 07:52
Foolish advice. ~ is a regex operator, LIKE isn't . ~ is more powerful but LIKE is (significantly) more performant. — Shayne, Mar 09 '23 at 02:05

asthasr · Accepted Answer · 2018-06-20T20:43:46.513

87

~ is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE, and should be used when that power is needed, but they serve different purposes.

edited Jun 20 '18 at 20:43

answered Sep 17 '12 at 02:16

asthasr

9,125
1
29
43

47

This is very bad advice. Whenever you can use the SQL standard `LIKE` it is generally preferable to a regular expression. It is simpler, much faster and easier to use with appropriate indexes. Regular expressions are more powerful, but slower and non-standard. – Erwin Brandstetter Sep 17 '12 at 11:56
23

That's why I said "when appropriate, but they serve different purposes." – asthasr Sep 17 '12 at 12:02
19

`should be used in preference to the LIKE operator when appropriate` is just wrong. When you have a choice, use LIKE. When you don't have a choice, you don't have a choice anyway. – Erwin Brandstetter Sep 17 '12 at 13:07
5

Have you not seen people trying to shoehorn pattern matching into a query with multiple LIKE clauses? That's necessary in some other systems, but Postgres offers the regex operator. Perhaps I should say: "should be used in preference to multiple LIKE clauses." – asthasr Sep 17 '12 at 13:21
2

Edited to reflect this discussion. – asthasr Sep 17 '12 at 13:29
7

Removed the downvote accordingly as the answer doesn't strike me as wrong any more. – Erwin Brandstetter Sep 17 '12 at 13:45
1

It should be noted that LIKE is should be preferred because "LIKE searches, being much simpler than the other two options, are safer to use with possibly-hostile pattern sources." - https://www.postgresql.org/docs/10/static/functions-matching.html – Abass Sesay Jun 25 '18 at 16:55

Erwin Brandstetter · Answer 2 · 2023-08-01T19:50:03.787

Overview

LIKE, SIMILAR TO and ~ are the basic pattern matching operators in PostgreSQL.

If you can, use LIKE (~~), it's fastest and simplest.
If you can't, use a regular expression (~), it's more powerful.
Never user ~~SIMILAR TO~~. It's pointless. See below.

Installing the additional module pg_trgm adds advanced index options and the similarity operator %.
And there is also text search with its own infrastructure and the @@ operator (among others).

Index support is available for each of these operators - to a varying degree. It regularly trumps the performance of other options. But there is a lot of leeway in the details, even with indexes.

Index support

Without pg_trgm, there is only index support for left anchored search patterns. If your database cluster runs with a non-C locale (typical case), you need an index with a special operator class for that, like text_pattern_ops or varchar_pattern_ops. Basic left-anchored regular expressions are supported by this, too. Example:

CREATE TABLE tbl(string text);

INSERT INTO  tbl(string)
SELECT x::text FROM generate_series(1, 10000) x;

CREATE INDEX tbl_string_text_pattern_idx ON tbl(string text_pattern_ops);

SELECT * FROM tbl WHERE string ~ '^1234';  -- left anchored pattern

db<>fiddle here

With pg_trgm installed, GIN or GiST indexes are possible with the operator classes gist_trgm_ops or gin_trgm_ops. These indexes support any LIKE expression, not just left anchored. And, quoting the manual:

Beginning in PostgreSQL 9.3, these index types also support index searches for regular-expression matches.

Details:

PostgreSQL LIKE query performance variations

SIMILAR TO is a very odd construct. PostgreSQL only implements it because it was defined in early versions of the SQL standard. Internally, every SIMILAR TO expression is rewritten with a regular expression. Therefore, for any given SIMILAR TO expression, there is at least one regular expression doing the same job faster. I never use SIMILAR TO.

Further reading:

Pattern matching with LIKE, SIMILAR TO or regular expressions

I read recently that `SIMILAR TO` might be eliminated from the next version of the ANSI SQL standard (in favor of REGEXP_LIKE) anyway. And I'm not sure that the gains in portability from using `SIMILAR TO` over `~` are worth it since other DBMSs (Oracle, SQL Server, MySQL) don't support the standard. — David Faber, Sep 17 '12 at 13:11

Craig Ringer · Answer 3 · 2012-09-18T00:03:50.163

There is nothing wrong with LIKE and, IMO, no reason to favour ~ over it. Rather the opposite. LIKE is SQL-standard. So is SIMILAR TO, but it isn't widely supported. PostgreSQL's ~ operator (or posix regular expression matching operator) isn't SQL standard.

For that reason, I prefer to use LIKE where it's expressive enough and I only use ~ when I need the power of full regular expressions. If I ever need to port databases it's one less thing that'll hurt. I've tended to use SIMILAR TO when LIKE isn't powerful enough, but after Erwin's comments I think I'll stop doing that and use ~ when LIKE doesn't do the job.

Also, PostgreSQL can use a b-tree index for prefix searches (eg LIKE 'TEST%') with LIKE or SIMILAR TO if the database is in the C locale or the index has text_pattern_ops. Contrary to what I wrote earlier, Pg can also use such an index for a left-anchored posix regex, it just needs an explicit '^TEST.*' so the regex can only match from the beginning. My post earlier incorrectly stated that ~ couldn't use an index for a prefix search. With that difference eliminated it's really down to whether you want to stick to standard compliant features where possible or not.

See this demo SQLFiddle; note the different execution plans. Note the difference between ~ '1234.*' and ~ '^1234.*'.

Given sample data:

create table test (
   blah text
);
insert into test (blah)  select x::text from generate_series(1,10000) x;
create index test_blah_txtpat_idx ON test(blah text_pattern_ops);

note that ~ uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan) because it has no alternative, while LIKE uses the index. However, a corrected ~ with a left anchor uses the index too:

regress=# SET enable_seqscan = 'f';
SET
regress=# explain select 1 from test where blah ~ '12.*';
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Seq Scan on test  (cost=10000000000.00..10000000118.69 rows=2122 width=0)
   Filter: (blah ~ '12.*'::text)
(2 rows)
regress=# explain select 1 from test where blah like '12%';
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=4.55..46.76 rows=29 width=0)
   Filter: (blah ~~ '12%'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..4.54 rows=29 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)
regress=# explain select 1 from test where blah ~ '^12.*';
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=5.28..51.53 rows=101 width=0)
   Filter: (blah ~ '^12.*'::text)
   ->  Bitmap Index Scan on test_blah_txtpat_idx  (cost=0.00..5.25 rows=100 width=0)
         Index Cond: ((blah ~>=~ '12'::text) AND (blah ~<~ '13'::text))
(4 rows)

Personally I disagree on `SIMILAR TO`. I wrote more on that in my answer. Also, it's worth mentioning, that the very common left-anchored search patterns are supported even for the regular expression operator `~`. — Erwin Brandstetter, Sep 17 '12 at 13:03
@ErwinBrandstetter Ah, I forgot to anchor it left. Will correct my post. — Craig Ringer, Sep 17 '12 at 13:06

score 13 · Answer 4 · answered Aug 30 '18 at 15:15

Like is just matching a part of the string either at the beginning or End or Middle And tilt (~) is matching with regex

To explain this further let's create a table and insert some values

# create table users(id serial primary key, name character varying);

Now let's insert some values in the table

# insert into users (name) VALUES ('Alex'), ('Jon Snow'), ('Christopher'), ('Arya'),('Sandip Debnath'), ('Lakshmi'),('alex@gmail.com'),('@sandip5004'), ('lakshmi@gmail.com');

Now your table should look like this

 id |       name        
----+-------------------
  1 | Alex
  2 | Jon Snow
  3 | Christopher
  4 | Arya
  5 | Sandip Debnath
  6 | Lakshmi
  7 | alex@gmail.com
  8 | lakshmi@gmail.com
  9 | @sandip5004

Case LIKE

# select * from users where name like 'A%';
 id | name 
----+------
  1 | Alex
  4 | Arya
(2 rows)

As you can see 'A%' will only get us the values whose name starts with capital A.

# select * from users where name like '%a%';
 id |       name        
----+-------------------
  4 | Arya
  5 | Sandip Debnath
  6 | Lakshmi
  7 | alex@gmail.com
  8 | lakshmi@gmail.com

As you can see '%a%' will only get us the values whose name has a in between the name.

# select * from users where name like '%a';

 id | name 
----+------
  4 | Arya

As you can see '%a' will only get us the values whose name ends with a.

Case ~ (tilt)

# select * from users where name ~* 't';
 id |      name      
----+----------------
  3 | Christopher
  5 | Sandip Debnath

As you can see name ~* 't' will only get us the values whose name has t. ~ means case sensitive and ~* means case insensitive so

# select * from users where name ~ 'T';
 id | name 
----+------
(0 rows)

the above query gave us 0 rows as T was not matching with any entries

Now let's consider a case where we only need to fetch the email ids and we don't know what the mail ids have, but we know the pattern of email i.e there will be some letter or number or _ or . or - and then @ and then some more letter or number or - then . then com or in or org etc and we can create the pattern using regular expression.

now let's try to fetch results using regular expression

# select * from users where name ~* '[a-z0-9\.\-\_]+@[a-z0-9\-]+\.[a-z]{2,5}';
 id |       name        
----+-------------------
  7 | alex@gmail.com
  8 | lakshmi@gmail.com

Similarly we can fetch some names which has a space in between

#select * from users where name ~* '[a-z]+\s[a-z]+';
 id |      name      
----+----------------
  2 | Jon Snow
  5 | Sandip Debnath

[a-z]+ means there can be any letter from a to z and + means it might occur 1 or more times and \s means after that there will be a space in between and then again a set of letters which can occur 1 or more times.

Hope this detailed analysis helps.

score 7 · Answer 5 · answered Sep 17 '12 at 02:15

7

The ~~ operator is equivalent to LIKE. ~, on the other hand, will match using a POSIX regular expression.

answered Sep 17 '12 at 02:15

3

Just an opinion (and thus a comment), but I dislike the ``~~`` operator, because it's Postgres-specific and loses SQL portability for no real gain. – asthasr Sep 17 '12 at 02:17
3

@syrion: Sorry for being unclear, I completely agree—I'm just saying that `~` doesn't do the same thing as `LIKE` does. – Sep 17 '12 at 02:20
@syrion And it's not as easier to search for than `LIKE`, e.g. I had to type `tilde tilde` on Google to get here – Dorian Mar 03 '17 at 19:24

Alex · Answer 6 · 2015-03-06T20:04:54.613

I just did a quick and simple benchmark to look at the performance difference between the two operators when no indexes are involved:

postgres=# \timing
Timing is on.
postgres=# SELECT count(1) FROM (SELECT val from generate_series(1, 10000000) x(val) WHERE val::text LIKE '%5%') AS x;
  count
─────────
 5217031
(1 row)

Time: 5631.662 ms
postgres=# SELECT count(1) FROM (SELECT val from generate_series(1, 10000000) x(val) WHERE val::text ~ '5') AS x;
  count
─────────
 5217031
(1 row)

Time: 10612.406 ms

In this example the LIKE operator is almost twice as fast as the ~ operator. So if speed is of the essence I would lean towards LIKE, though be careful not to optimize prematurely. ~ gives you a lot more flexibility.

For those of you who are interested, here are EXPLAIN plans for the above queries:

postgres=# EXPLAIN ANALYZE SELECT count(1) FROM (SELECT val from generate_series(1, 10000000) x(val) WHERE val::text LIKE '%5%') AS x;
                                                              QUERY PLAN
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=20.00..20.01 rows=1 width=0) (actual time=9967.748..9967.749 rows=1 loops=1)
   ->  Function Scan on generate_series x  (cost=0.00..17.50 rows=1000 width=0) (actual time=1732.084..7404.755 rows=5217031 loops=1)
         Filter: ((val)::text ~~ '%5%'::text)
         Rows Removed by Filter: 4782969
 Total runtime: 9997.587 ms
(5 rows)

postgres=# EXPLAIN ANALYZE SELECT count(1) FROM (SELECT val from generate_series(1, 10000000) x(val) WHERE val::text ~ '5') AS x;
                                                              QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Aggregate  (cost=20.00..20.01 rows=1 width=0) (actual time=15118.061..15118.061 rows=1 loops=1)
   ->  Function Scan on generate_series x  (cost=0.00..17.50 rows=1000 width=0) (actual time=1724.591..12516.996 rows=5217031 loops=1)
         Filter: ((val)::text ~ '5'::text)
         Rows Removed by Filter: 4782969
 Total runtime: 15147.950 ms
(5 rows)

can you provide the explain plan to see if the explanation for the delta is consistent with the explanation Craig Ringer is showing above? — eharik, Mar 06 '15 at 19:42
So with an index-less test data set like what you're presented here your actually just confirming what Erwin is saying. Namely that ~~ is just a faster operator than ~. — eharik, Mar 06 '15 at 20:04

score 3 · Answer 7 · edited Apr 13 '17 at 12:42

Yes, it stands for POSIX regex. Another alternative is to use the SQL standard approach to regular expressions with the "SIMILAR TO" operator,though it provides a more limited set of features,might be easier to understand. I think this is a good reference from dba exchange: https://dba.stackexchange.com/questions/10694/pattern-matching-with-like-similar-to-or-regular-expressions-in-postgresql

Difference between LIKE and ~ in Postgres

7 Answers7

Overview

Index support

Case LIKE

Case ~ (tilt)

Linked