Query with UNION or OR?

Question

I have a table friend used to store the relationship between two user.

For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but there we won't store that, making uid1 < uid2 manually:

CREATE TABLE public.friend (
  uid1 INTEGER,
  uid2 INTEGER
);
CREATE INDEX index_uid1 ON friend USING BTREE (uid1);
CREATE INDEX index_uid2 ON friend USING BTREE (uid2);

To find the friends of uid=2, I can use:

Sql1:

select * from friend where uid1=2
union
select * from friend where uid2=2;

Sql2:

select * from friend uid1=2 or uid2=2

What I get is that sql2 is better than sql1 in performance.

But sql1 is recommended. Is that correct?

Why do you say that "SQL1 is recommended"? Why do you say that "SQL2 is beter in performance"? — MatBailie, May 17 '18 at 12:48
Then show us the article. We can't comment on why an article says something without reading the article. — MatBailie, May 17 '18 at 12:52

score 2 · Answer 1 · answered May 17 '18 at 12:49

2

They are not technically the same, the union operator will remove duplicates, while the 2nd example does not.

answered May 17 '18 at 12:49

Andrew

26,629
5
63
86

score 1 · Answer 2 · answered May 17 '18 at 12:50

union incurs overhead for removing duplicates. Perhaps the most efficient method for the query is:

select f.* from friend f where uid1 = 2
union all
select f.* from friend f where uid2 = 2 and uid1 <> 2;

In particular, this can take advantage of indexes on f(uid1) and f(uid2). Your second version is probably doing a full table scan.

Erwin Brandstetter · Accepted Answer · 2018-05-17T14:16:47.443

I have a table friend used to store the relationship between two user.

For example: (1,2) means user1 and user2 are friends. (2,1) means the same, but there we won't store that, making uid1 < uid2 manually:

Typically, you would implement that with have a PRIMARY KEY on (udi1, uid2) and a CHECK constraint enforcing uid1 < uid2.

CREATE TABLE public.friend (
   uid1 integer
 , uid2 integer
 , PRIMARY KEY (uid1, uid2)
 , CONSTRAINT uid2_gt_uid1 CHECK (uid1 < uid2)
);

CREATE INDEX index_uid2 ON friend USING BTREE (uid2);

You don't need the other index, it's covered by the index of the PK;

CREATE INDEX index_uid1 ON friend USING BTREE (uid1);

Then there cannot be duplicates (including switched duplicates) and nobody can be friend with him/her self either, and your query can simply be:

SELECT * FROM friend WHERE 2 IN (uid1, uid2);

... which is short for:

SELECT * FROM friend WHERE uid1 = 2 OR uid2 = 2;

And the UNION variant is now logically identical:

SELECT * FROM friend WHERE uid1=2
UNION
SELECT * FROM friend WHERE uid2=2;

But you would UNION ALL instead of just UNION as there are no duplicates to begin with and UNION ALL is cheaper. But still slightly more expensive than the single SELECT above.

Duplicates?

There are three possible sources of duplicates in the UNION ALL query:

Duplicate rows in the underlying table (ruled out by the PK).
Rows fetched multiple times by more than one SELECT branch.
In your particular case: logical duplicates with switched IDs (ruled out by the CHECK constraint).

Once you understand this, you also understand the implications of the different query techniques. With the suggest setup, only 2. remains as possible issue.

Query with UNION or OR?

3 Answers3

Duplicates?