3

While executing below two queries, I notice serious difference in query plan. Why is that?

select * from table1
where id = 'dummy' or id in (select id from table2 where id = 'dummy')

Query plan

Seq Scan on table1  (cost=8.30..49611.63 rows=254478 width=820) (actual time=535.477..557.431 rows=1 loops=1)
  Filter: (((code)::text = 'dummy'::text) OR (hashed SubPlan 1))
  Rows Removed by Filter: 510467
  SubPlan 1
    ->  Index Scan using idx on table2  (cost=0.29..8.30 rows=1 width=8) (actual time=0.009..0.012 rows=0 loops=1)
          Index Cond: ((id)::text = 'dummy'::text)
Planning Time: 0.165 ms
Execution Time: 557.517 ms
select * from table1
where id = 'dummy' 
union 
select * from table1
where id in (select id from table2 where id = 'dummy')
Unique  (cost=25.22..25.42 rows=2 width=5818) (actual time=0.045..0.047 rows=1 loops=1)
  ->  Sort  (cost=25.22..25.23 rows=2 width=5818) (actual time=0.045..0.046 rows=1 loops=1)
                Sort Method: quicksort  Memory: 25kB
        ->  Append  (cost=0.42..25.21 rows=2 width=5818) (actual time=0.016..0.026 rows=1 loops=1)
              ->  Index Scan using id on table1  (cost=0.42..8.44 rows=1 width=820) (actual time=0.015..0.016 rows=1 loops=1)
                    Index Cond: ((id)::text = 'dummy'::text)
              ->  Nested Loop  (cost=0.71..16.74 rows=1 width=820) (actual time=0.009..0.009 rows=0 loops=1)
                    ->  Index Scan using idx on table2  (cost=0.29..8.30 rows=1 width=8) (actual time=0.008..0.008 rows=0 loops=1)
                          Index Cond: ((id)::text = 'dummy'::text)
                    ->  Index Scan using pkey on table1  (cost=0.42..8.44 rows=1 width=820) (never executed)
                          Index Cond: (id = table2.id)
Planning Time: 0.753 ms
Execution Time: 0.131 ms

So the main difference you can see is the first query returns 254478 rows but the second just returns 2 rows. Why is that?

  • Union does a `distinct` on the data result; unless you use `UNION ALL` OR will return every record including duplicates – xQbert Jul 06 '22 at 16:54
  • @xQbert but why does the first query returns 200,000 rows? How does psql internally run ```where id = 'dummy' or id in (select id from table2 where id = 'dummy')``` – Spring boot progammer Jul 06 '22 at 16:56
  • `mysql`<>`postgresql` – Shmiel Jul 06 '22 at 16:56
  • add UNION ALL to your union statement do you get the same count? or add DISTINCT to your first query do you get 2 records? First I want to make sure the results are in fact the same – xQbert Jul 06 '22 at 16:57
  • @xQbert Still returns 2 rows with union all. – Spring boot progammer Jul 06 '22 at 16:58
  • @Springbootprogammer I understand your confusion now. The result of the subquery is an ID that equals 'dummy' you're already searching for dummy. there are only 2 records with dummy in table 1 as it's Id. so why is or seemingly resulting in a Cartesian product between table1 and table 2? Yeah I'd need a example set of data on a dbfiddle and prove this out on ... whatever engine this is... I can't see how the SQL would do this – xQbert Jul 06 '22 at 17:01
  • yes @xQbert . How does psql internally run the ```condition OR (subplan)``` – Spring boot progammer Jul 06 '22 at 17:03
  • my understanding is it materializes a list of the ID's = 'dummy' so it would return ('dummy','dummy','dummy') if you had 3 in table2. Then each record in table 1 is checked agasint 2 conditions id = dummy or ID in ('dummy','dummy',dummy') So this results is quite odd... i'd love to see sample data and it recreated on a dbfiddle – xQbert Jul 06 '22 at 17:05
  • That is not how optimization on SQL works -- you can't ask how `condition OR (subplan)` works because what isi n the condition or what is in the subplan will change the whole plan – Hogan Jul 06 '22 at 17:05
  • you have both mysql and postgresql in your tags it can't be both so I deleted them. Let us know when you figure out what system you are using. – Hogan Jul 06 '22 at 17:09
  • @Hogan Edited the tags, thought the query plan are mostly same in all system so sql guys might help – Spring boot progammer Jul 06 '22 at 17:10
  • No they are different on every system. If you want all "sql guys" then you can use the SQL tag or ansi-sql tag. But in most cases you are going to get better answers if you tag the platform you are using. – Hogan Jul 06 '22 at 17:11
  • @Springbootprogammer in actually depending upon indexes and data volume the exact same query can generate a different execution plan in the same database on the same RDBMS on different runs. – Belayer Jul 06 '22 at 17:56
  • are you able to recreate this situation on a dbfiddle somewhere? Like: https://dbfiddle.uk/?rdbms=postgres_14 for example? – xQbert Jul 06 '22 at 19:38
  • @Springbootprogammer I'm not able to recreate your scenairo: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=d2e5b9239aa25b09b715b9129cd46e2d which makes me believe it's environmental and not SQL related... what version of postgresql? – xQbert Jul 06 '22 at 19:47
  • 1
    They both return 1 row. 254478 is an *estimate*, apparently a very bad one. – jjanes Jul 06 '22 at 22:28

1 Answers1

4

Please do another test -- run both these queries -- do they give the same results as the queries without my changes?

 select * from table1
 where table1.id = 'dummy' or 
       table1.id in (select table2.id from table2 where table2.id = 'dummy')


 select * from table1
 where table1.id = 'dummy' 
 union 
 select * from table1
 where table1.id in (select table2.id from table2 where table2.id = 'dummy')

I don't think you are sharing your actual code with us -- because as written your code makes little sense -- you are returning a list of ids in the sub-query that equal 'dummy' -- so you will just get a list of dummy multiple times.


Note these comments are not true since they had no impact on the results -- the order of operations was working as expected

What result do you get when when you do this:

select * from table1
where (id = 'dummy') or id in (select id from table2 where id = 'dummy')

The reason your query was giving more results is because it was selecting records from table1 where id equals dummy or id = id. The query in the original post gives you all the records. The OR was being applied to the first expression not splitting two expressions.

Hogan
  • 69,564
  • 10
  • 76
  • 117