0

I want some help to optimize this SQL Query. This is working completely fine. I just want to reduce the runtime of this query

select distinct 
o.usrp_order_number,t.* 
from ms_bvoip_order_extension oe
 inner join ms_order o on oe.ms_order_id = o.ms_order_id
 inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
 inner join ms_job j on j.entity_id = so.ms_sub_order_id
  left join mstask t ON t.wf_job_id = j.wf_job_id
  where
  o.order_type = 900
  and o.entered_date between date_sub(current_date(),53) and
 date_sub(current_date(),3)
  and j.entity_type = 5 and t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is not null
  order by
  o.usrp_order_number
leftjoin
  • 36,950
  • 8
  • 57
  • 116

3 Answers3

1

The WHERE conditions are being executed after joins in Hive (though CBO and PPD may change this behavior), better study the EXPLAIN output for both queries. You can move conditions like this: o.order_type = 900 to the join ON clause to reduce rows on join. Only non-equi conditions involving both tables columns are not allowed in the join ON clause in Hive. Also table t is left-joined, but conditions in the where: t.name RLIKE 'Error|Correct|Create AOTS Ticket' and t.wf_job_id is null and t.ORIGINAL_START_DATE is not null transform left join to the inner join. Check do you need INNER or LEFT JOIN

select distinct 
o.usrp_order_number,t.* 
from ms_bvoip_order_extension oe
 inner join ms_order o 
    on oe.ms_order_id = o.ms_order_id
       and o.order_type = 900
       and and o.entered_date between date_sub(current_date(),53) and date_sub(current_date(),3)                 
 inner join ms_sub_order so on so.ms_order_id = o.ms_order_id
 inner join ms_job j on j.entity_id = so.ms_sub_order_id 
                    and j.entity_type = 5
 left join mstask t on t.wf_job_id = j.wf_job_id 
                    and t.name RLIKE 'Error|Correct|Create AOTS Ticket' 
                    and t.wf_job_id is null
                    and t.ORIGINAL_START_DATE is not null 
order by o.usrp_order_number

Also read this answer about configuration settings: https://stackoverflow.com/a/48487306/2700344

leftjoin
  • 36,950
  • 8
  • 57
  • 116
0

be sure you have proper index on

table ms_order a composite index on columns entered_date, order_type, ms_order_id

table ms_job a composite index on columns entity_type, entity_id

table mstask a composite index on columns wf_job_id, ORIGINAL_START_DATE

table ms_sub_order an index on column ms_order_id

table ms_bvoip_order_extension and index on column ms_order_id

ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
0

You will need to add indexes for the columns that you are filtering by.

We do not know how many records each of your tables are holding, but the t.name RLIKE criteria should be evaluated as the very last item. I would rewrite your query based on the following idea:

select ...
from
(
    select ...
    inner join ...
    inner join ...
    inner join ...
    left join ...
    where ...
) temporary
where temporary.somename RLIKE 'Error|Correct|Create AOTS Ticket'
o.usrp_order_number

If the query is not very dynamic, then you can even cache the results for a while.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175