2

I know a unique index will be faster than a non unique index theoretically if data is unique.

Because a unique index is able to provide more information and let the query optimizer choose more effective execution plan.

I am doing some testing and want to prove that a unique index could be better than a non unique index from the execution plan, but the result show me those are the same...

CREATE TABLE T3(
    ID INT NOT NULL,
    val INT NOT NULL,
    col1 UUID NOT NULL,
    col2 UUID NOT NULL,
    col3 UUID NOT NULL,
    col4 UUID NOT NULL,
    col5 UUID NOT NULL,
    col6 UUID NOT NULL
);

CREATE INDEX IX_ID_T3 ON T3 (ID);
CREATE UNIQUE INDEX UIX_ID_T3 ON T3 (ID);

INSERT INTO T3
SELECT i,
       RANDOM() * 1000000,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid,
       md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1,1000000) i;

vacuum ANALYZE T3;

I have created a table and two indexes (IX_ID_T3 is non unique,UIX_ID_T3 is unique) then inserted 1000000 sample rows.

After I inserted data I ran vacuum ANALYZE T3;

--drop index IX_ID_T3 

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT DISTINCT a1.ID
FROM T3 a1 INNER JOIN T3 a2
ON a1.id = a2.id
WHERE a1.id <= 300000

First query, I had tried to test between UIX_ID_T3 and IX_ID_T3 by Merge-Join

The Buffers: shared hit and execution plan are no different.

Here is my execution plan

-- UIX_ID_T3 
"Unique  (cost=0.85..41457.94 rows=298372 width=4) (actual time=0.030..267.207 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40712.01 rows=298372 width=4) (actual time=0.030..200.412 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using uix_id_t3 on t3 a1  (cost=0.42..8501.93 rows=298372 width=4) (actual time=0.017..49.237 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using uix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.010..40.170 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.171 ms"
"Execution Time: 282.919 ms"

---IX_ID_T3 
"Unique  (cost=0.85..41420.43 rows=297587 width=4) (actual time=0.027..230.256 rows=300000 loops=1)"
"  Buffers: shared hit=1646"
"  ->  Merge Join  (cost=0.85..40676.46 rows=297587 width=4) (actual time=0.027..173.308 rows=300000 loops=1)"
"        Merge Cond: (a1.id = a2.id)"
"        Buffers: shared hit=1646"
"        ->  Index Only Scan using ix_id_t3 on t3 a1  (cost=0.42..8476.20 rows=297587 width=4) (actual time=0.015..41.606 rows=300000 loops=1)"
"              Index Cond: (id <= 300000)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"        ->  Index Only Scan using ix_id_t3 on t3 a2  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.009..34.019 rows=300000 loops=1)"
"              Heap Fetches: 0"
"              Buffers: shared hit=823"
"Planning Time: 0.195 ms"
"Execution Time: 243.711 ms"

There was another question are-unique-indexes-better-for-column-search-performance-pgsql-mysql to discuss with this topic.

I had also tried to test the answer of question query, but the execution plan are no different.

EXPLAIN (ANALYZE,TIMING ON,BUFFERS ON)
SELECT  id
FROM    T3
ORDER BY
        id
LIMIT 10;
-- using IX_ID_T3 
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.034..0.036 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using uix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.033..0.034 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.052 ms"
"Execution Time: 0.047 ms"

-- using IX_ID_T3
"Limit  (cost=0.42..0.68 rows=10 width=4) (actual time=0.026..0.029 rows=10 loops=1)"
"  Buffers: shared hit=4"
"  ->  Index Only Scan using ix_id_t3 on t3  (cost=0.42..25980.42 rows=1000000 width=4) (actual time=0.025..0.027 rows=10 loops=1)"
"        Heap Fetches: 0"
"        Buffers: shared hit=4"
"Planning Time: 0.075 ms"
"Execution Time: 0.043 ms"

I saw a lot of different articles but I can't prove that a unique index could be better than a non unique index through the execution plan.

Postgres unique constraint vs index

Question:

Can anyone prove a unique index could be better than a non unique index from an execution plan and show us the queries and executions plan?

To my knowledge from unique index of sql-server not only be a constraint but also can be better performance than non unique index.

The Many Mysteries of Merge Joins

D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • To my knowledge there is no physical difference (on disk, data structures etc) between a non-unique index that stores unique values and a unique index. You can probably verify that using the [pageinspect](https://www.postgresql.org/docs/current/pageinspect.html#id-1.11.7.32.6) module. –  Jan 17 '22 at 07:49
  • 1
    "*I know a unique index will be faster than a non unique index*" - how do you know that? Can you give a reference where this is stated (especially for Postgres) –  Jan 17 '22 at 07:53
  • @a_horse_with_no_name From this question https://stackoverflow.com/questions/1293499/are-unique-indexes-better-for-column-search-performance-pgsql-mysql `In SQL Server and in PostgreSQL, for instance, if you sort on a UNIQUE key, the optimizer ignores the ORDER BY clauses used after that (since they are irrelevant)` – D-Shih Jan 17 '22 at 07:58
  • 1
    To be honest, I find this a rather strange question. Data must be unique (a business requirement) and you have to use a unique constraint (which is implemented as a unique index) or the data could be anything, unique or not unique. And thus you can't use a unique index. The choice between the two types if indexes depends on your data, not the speed of a query. – Frank Heikens Jan 17 '22 at 09:27
  • @FrankHeikens I agreed that uniqueness is a constraint for business requirements, but I want to know to expect business requirement is there any performance enhancement when I create a unique index? because unique index is not only beneficial for business but also benefit for performance (reduce logic read and friendly to QO) in sql-server which can prove by execution plan and logic read information. – D-Shih Jan 17 '22 at 09:45
  • Data is unique, or it is not. If you want to store only unique values in your database, you have to come up with a data model that supports this. You only have to do some data normalisation and you're done. However, I'm not sure if that would be faster in all cases. – Frank Heikens Jan 17 '22 at 10:31
  • @a_horse_with_no_name I had used `pageinspect` extension to compare non unique index & unique index blocks, they are storge same data in blocks (As you say), so I thought postgresql unique index only create for constraint. – D-Shih Jan 18 '22 at 06:08
  • @D-Shih Postgres will also skip the order by when sorting by a non-unique index: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=12b1ce98e5742996c2461cf4ee46a9fe –  Jan 18 '22 at 06:48
  • 1
    @D-Shih: so if you see that the data blocks are identical, then obviously the performance in retrieving them is identical as well. So you have proven that the unique index can not be faster –  Jan 18 '22 at 06:54

1 Answers1

7

An unique index won't be any faster to scan than a non-unique one. The only potential benefit in query execution speed could be that the optimizer can make certain deductions from the uniqueness and for example remove an unnecessary join.

The primary use of unique indexes is to implement table constraints, not to provide a performance advantage over non-unique indexes.

Here is an example:

CREATE TABLE parent (pid bigint PRIMARY KEY);

CREATE TABLE child (
   cid bigint PRIMARY KEY,
   pid bigint UNIQUE REFERENCES parent
);

EXPLAIN (COSTS OFF)
SELECT parent.pid FROM parent LEFT JOIN child USING (pid);

     QUERY PLAN     
════════════════════
 Seq Scan on parent
(1 row)

Without the unique constraint on child.pid (which is implemented by a unique index) the join could not be removed.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263