9

Can someone explain the performance difference between these 3 queries?

concat() function:

explain analyze 
select * from person 
where (concat(last_name, ' ', first_name, ' ', middle_name) like '%Ива%');

Seq Scan on person  (cost=0.00..4.86 rows=1 width=15293) (actual time=0.032..0.140 rows=6 loops=1)
  Filter: (pg_catalog.concat(last_name, ' ', first_name, ' ', middle_name) ~~ '%Ива%'::text)
Total runtime: 0.178 ms

SQL standard concatenation with ||:

explain analyze 
select * from person 
where ((last_name || ' ' || first_name || ' ' || middle_name) like '%Ива%');

Seq Scan on person  (cost=0.00..5.28 rows=1 width=15293) (actual time=0.023..0.080 rows=6 loops=1)
  Filter: ((((((last_name)::text || ' '::text) || (first_name)::text) || ' '::text) || (middle_name)::text) ~~ '%Ива%'::text)
Total runtime: 0.121 ms

Search fields separately:

explain analyze 
select * from person 
where (last_name like '%Ива%') or (first_name like '%Ива%') or (middle_name like '%Ива%');

Seq Scan on person  (cost=0.00..5.00 rows=1 width=15293) (actual time=0.018..0.060 rows=6 loops=1)
  Filter: (((last_name)::text ~~ '%Ива%'::text) OR ((first_name)::text ~~ '%Ива%'::text) OR ((middle_name)::text ~~ '%Ива%'::text))
Total runtime: 0.097 ms

Why is concat() slowest one and why are several like conditions faster?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Yegor Koldov
  • 252
  • 1
  • 4
  • 16
  • 1
    What exactly is your question? –  Apr 13 '15 at 06:48
  • Why does `pg.concat` is slowest one and several `like` is faster? – Yegor Koldov Apr 13 '15 at 06:49
  • @ma3a try `last_name||' '||first_name||' '||middle_name`(i guess in your case manual concatenation is better than concat()) – Vivek S. Apr 13 '15 at 06:53
  • @vivek see second query, it's a bit faster than pg.concat, and it's strange, cause first and second query almost the same. It's not production code, i just interested in why this queries running slow – Yegor Koldov Apr 13 '15 at 06:57
  • @ma3a because `concat()` is _function_ when you execute your select postgres call that `concat()` for the operation – Vivek S. Apr 13 '15 at 06:58
  • @vivek, so this is overhead for calling function on each row? – Yegor Koldov Apr 13 '15 at 06:59
  • 1
    @ma3a In your case it's an overhead – Vivek S. Apr 13 '15 at 07:01
  • Re-run those statements in a different order. The differences in the runtimes are small enough to be caused by caching of the underlying data. –  Apr 13 '15 at 07:04
  • @a_horse_with_no_name, i've made rerun in different order(reverse), and there is no changes: 0.096, 0.120, 0.182 – Yegor Koldov Apr 13 '15 at 07:08
  • 2
    A test on a table with only 6 rows doesn't mean a lot. And this query is so fast that any bit fart of your computer could be responsible for the differences in timing. – Frank Heikens Apr 13 '15 at 07:14
  • @Frank Heikens table not 6 rows, `select count(*) from person` produce `43` – Yegor Koldov Apr 13 '15 at 07:28
  • 1
    @ma3a: And does that make any difference? It's still an almost empty table.... Create a million records and run your tests again. A sequential scan that runs within a millisecond can never be "slow", no matter what. – Frank Heikens Apr 13 '15 at 07:30

3 Answers3

8

While not a concrete answer, the following might help you to reach some conclusions:

  1. Calling concat to concatenate the three strings, or using the || operator, results in postgres having to allocate a new buffer to hold the concatenated string, then copy the characters into it. This has to be done for each row. Then the buffer has to be deallocated at the end.

  2. In the case where you are ORing together three conditions, postgres may only have to evaluate only one or maybe two of them to decide if it has to include the row.

  3. It is possible that expression evaluation using the || operator might be more efficient, or perhaps more easily optimizable, compared with a function call to concat. I would not be surprised to find that there is some special case handling for internal operators.

  4. As mentioned in the comments, your sample is too small to make proper conclusions anyway. At the level of a fraction of a millisecond, other noise factors can distort the result.

harmic
  • 28,606
  • 5
  • 67
  • 91
  • "of a few hundred milliseconds", the test cases are even worse, all tests ran within a millisecond, the slowest test took just 0.178 millisecond. – Frank Heikens Apr 13 '15 at 08:25
3

What you have observed so far is interesting but hardly important. Minor cost overhead for concatenating strings.

The much more important difference between these expressions does not show in your minimal test case without indexes.

The first two examples are not sargable (unless you build a tailored expression index):

where concat(last_name, ' ', first_name, ' ', middle_name) like '%Ива%'
where (last_name || ' ' || first_name || ' ' || middle_name) like '%Ива%'

While this one is:

where last_name like '%Ива%' or first_name like '%Ива%' or middle_name like '%Ива%'

I.e., it can use a plain trigram index to great effect (order of columns is unimportant in a GIN index):

CREATE INDEX some_idx ON person USING gin (first_name  gin_trgm_ops
                                         , middle_name gin_trgm_ops
                                         , last_name   gin_trgm_ops);

Instructions:

Incorrect test if NULL is possible

concat() is generally slightly more expensive than simple string concatenation with ||. It is also different: If any of the input strings is NULL, the concatenated result is also NULL in your second case, but not in your first case, since concat() just ignores NULL values - but you'd still get a useless space character in the result.

Detailed explanation:

If you are looking for a clean, elegant expression (about the same cost), use concat_ws() instead:

concat_ws( ' ', last_name, first_name, middle_name)
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

This query has overhead for calling function on each row

explain analyze 
select * from person 
where (concat(last_name, ' ', first_name, ' ', middle_name) like '%Ива%');

this query is faster cause no additional operation executed

explain analyze 
select * from person 
where (last_name like '%Ива%') or (first_name like '%Ива%') or (middle_name like '%Ива%');
Yegor Koldov
  • 252
  • 1
  • 4
  • 16