1

I am trying to join two big tables; lets call them tableA (150million rows) and tableB (140 million rows). Following query returns 490 rows and takes about 20-25s to execute (which is not acceptable).

select distinct
    a.key_fk
from tableA a 
join tableB b on a.key_fk = b.key_fk 
where customFunc(b.valueX) = 'xyz' 
and customFunc(a.valueY) = 'abc';
  • a.key_fk and b.key_fk are foreign keys referencing another table c (but that is not relevant)
  • both tables have indexes on key_fk
  • tableA has an index on customFunc(valueY) and tableB has an index on customFunc(valueX)
  • select count(key_fk) from tableA where customFunc(valueY)='abc' takes about 7-8s and returns 5million rows
  • select count(key_fk) from tableB where customFunc(valueX)='xyz' takes about 0,5s and returns 80k rows

Is there anything else I can do to improve the performance of mentioned query?

Filburt
  • 17,626
  • 12
  • 64
  • 115
user123454321
  • 1,028
  • 8
  • 26

3 Answers3

0

Possibly, creating indexes that include all the columns that are needed by the query would improve the speed, since it could avoid reading the actual table rows at all. You may need to experiment with the column order in the indexes to see what works best (or it may not help at all).

Given your description, I suspect the query is first accessing rows in tableB using the function index, then looking up rows in tableA based on the join column, then filtering by the last condition. This is just a guess, it would be helpful to see the actual execution plan.

If this guess is correct, creating an index on tableA ( key_fk, customFunc(valueY) ) should allow it to do a more precise index lookup and avoid reading rows from tableA at all.

Dave Costa
  • 47,262
  • 8
  • 56
  • 72
0

I would rewrite the query (no need to join ALL records):

select key_fk
from   tableA a 
where  customFunc(valueY) = 'abc'
intersect
select key_fk
from   tableB
where  customFunc(valueX) = 'xyz'

Another point to speed it up is to follow the Dave Costa recommendations - creating the indexes which include all the required columns. I would create tableA(customFunc(valueY), key_fk) and tableB(customFunc(valueX), key_fk)

Hope that helps.

Maxim Borunov
  • 901
  • 5
  • 9
0

Define on both tables index based on the function value and the FK (assuming the function is deterministic).

create index idxA on tableA (customFunc(valueX), key_fk);
create index idxB on tableB (customFunc(valueY), key_fk);

You will see an execution plan as follows accessing only the index range scan (no table access) and performing a hash join, which should lead to acceptable performace:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |     50 |00:00:00.08 |    1484 |       |       |          |
|   1 |  HASH UNIQUE       |      |      1 |    138K|     50 |00:00:00.08 |    1484 |    11M|  3061K|          |
|*  2 |   HASH JOIN        |      |      1 |    140K|   3155 |00:00:00.01 |    1484 |  2168K|  2168K| 1958K (0)|
|*  3 |    INDEX RANGE SCAN| IDXB |      1 |    140K|   8489 |00:00:00.01 |      27 |       |       |          |
|*  4 |    INDEX RANGE SCAN| IDXA |      1 |    150K|    500K|00:00:00.07 |    1457 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("A"."KEY_FK"="B"."KEY_FK")
   3 - access("B"."SYS_NC00003$"='xyz')
   4 - access("A"."SYS_NC00003$"='abc')
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53