I have a very slow query due to scanning through millions of records. The query counts how many numbers fall within each range.
I have two tables, numbers_in_ranges and person:
Create table numbers_in_ranges
( range_id    number(9,0),
  begin_range number(9,0),
  end_range   number(9,0)
);
Create table person
( id         integer,
  a_number   varchar(9),
  first_name varchar(25),
  last_name  varchar(25)
);
Data for numbers_in_ranges
range_id| begin_range | end_range
--------|------------------------
101 | 100000000 | 200000000
102 | 210000000 | 290000000
103 | 350000000 | 459999999
104 | 461000000 | 569999999
106 | 241000000 | 241999999
etc.
Data for person
id | a_number | first_name | last_name
---|------------|------------|-----------
1 | 100000001 | Maria | Doe
2 | 100000999 | Emily | Davis
3 | 150000000 | Dave | Smith
4 | 461000000 | Jane | Jones
6 | 241000001 | John | Doe
7 | 100000002 | Maria | Doe
8 | 100009999 | Emily | Davis
9 | 150000010 | Dave | Smith
10 | 210000001 | Jane | Jones
11 | 210000010 | John | Doe
12 | 281000000 | Jane | Jones
13 | 241000000 | John | Doe
14 | 460000001 | Maria | Doe
15 | 500000999 | Emily | Davis
16 | 550000010 | Dave | Smith
17 | 461000010 | Jane | Jones
18 | 241000020 | John | Doe
etc.
We are getting the range data from a remote database via a database link and storing it in a materialized view.
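For reference, the local copy is built roughly like this; the link name remote_db and the refresh options here are assumptions, not our exact DDL:

```sql
-- Hypothetical sketch of the local snapshot of the remote range data;
-- the database link name (remote_db) and refresh options are assumed.
create materialized view numbers_in_ranges
  build immediate
  refresh complete on demand
as
select range_id, begin_range, end_range
from   numbers_in_ranges@remote_db;
```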
The query
select nums.range_id, count(p.a_number) as a_count
from numbers_in_ranges nums
left join person p
  on to_number(p.a_number) between nums.begin_range and nums.end_range
group by nums.range_id;
The result looks like
range_id| a_count
--------|------------------------
101 | 6
102 | 5
103 | 2
104 | 3
etc.
As I said, this query is very slow.
Here is the explain plan
Plan hash value: 3785994407
---------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 9352 | 264K| | 42601 (31)| 00:00:02 | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | P->S | QC (RAND) |
| 3 | HASH GROUP BY | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | PCWP | |
| 4 | PX RECEIVE | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,02 | PCWP | |
| 5 | PX SEND HASH | :TQ10001 | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,01 | P->P | HASH |
| 6 | HASH GROUP BY | | 9352 | 264K| | 42601 (31)| 00:00:02 | Q1,01 | PCWP | |
| 7 | MERGE JOIN OUTER | | 2084M| 56G| | 37793 (23)| 00:00:02 | Q1,01 | PCWP | |
| 8 | SORT JOIN | | 9352 | 173K| | 3 (34)| 00:00:01 | Q1,01 | PCWP | |
| 9 | PX BLOCK ITERATOR | | 9352 | 173K| | 2 (0)| 00:00:01 | Q1,01 | PCWC | |
| 10 | MAT_VIEW ACCESS FULL | NUMBERS_IN_RANGES | 9352 | 173K| | 2 (0)| 00:00:01 | Q1,01 | PCWP | |
|* 11 | FILTER | | | | | | | Q1,01 | PCWP | |
|* 12 | SORT JOIN | | 89M| 850M| 2732M| 29681 (1)| 00:00:02 | Q1,01 | PCWP | |
| 13 | BUFFER SORT | | | | | | | Q1,01 | PCWC | |
| 14 | PX RECEIVE | | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,01 | PCWP | |
| 15 | PX SEND BROADCAST | :TQ10000 | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | P->P | BROADCAST |
| 16 | PX BLOCK ITERATOR | | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | PCWC | |
| 17 | INDEX FAST FULL SCAN| PERSON_AN_IDX | 89M| 850M| | 4944 (1)| 00:00:01 | Q1,00 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
11 - filter("NUMS"."END_RANGE">=TO_NUMBER("P"."A_NUMBER"(+)))
12 - access("NUMS"."BEGIN_RANGE"<=TO_NUMBER("P"."A_NUMBER"(+)))
filter("NUMS"."BEGIN_RANGE"<=TO_NUMBER("P"."A_NUMBER"(+)))
Note
-----
- automatic DOP: Computed Degree of Parallelism is 16 because of degree limit
I tried running only the deltas for the month and then appending them to the table: if a new range_id is found, insert it; if an existing range_id is found, update it. That way we don't have to scan the whole table.
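That delta load was essentially an upsert; a minimal sketch of it as a single statement (range_deltas is a made-up name for a staging table holding the month's changed ranges):

```sql
-- Minimal sketch of the monthly delta upsert; range_deltas is a
-- hypothetical staging table, and the target is a local writable copy.
merge into numbers_in_ranges nums
using range_deltas d
  on (nums.range_id = d.range_id)
when matched then
  update set nums.begin_range = d.begin_range,
             nums.end_range   = d.end_range
when not matched then
  insert (range_id, begin_range, end_range)
  values (d.range_id, d.begin_range, d.end_range);
```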
But this solution didn't work because some ranges are updated and splicing happens. For example: we create a new range_id = 110 with a range between 100110000 and 210000001; then range_id = 101 is spliced to run from 100000000 to 100110000, and range_id = 102 is spliced to run from 100110001 to 210000000.
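In SQL terms, one such splice amounts to the following (illustrative only, using the example values above):

```sql
-- Illustration of the splice described above, using the example values.
insert into numbers_in_ranges (range_id, begin_range, end_range)
values (110, 100110000, 210000001);

update numbers_in_ranges
set    end_range = 100110000
where  range_id = 101;

update numbers_in_ranges
set    begin_range = 100110001,
       end_range   = 210000000
where  range_id = 102;
```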
Now I thought of creating a trigger to update that table whenever a range is created or updated; however, that is impossible, since we get this data from a remote database into a read-only materialized view, and we cannot put a trigger on a read-only materialized view.
My question: is there any other way I can do this, or any way to optimize this query?
Thank you!