0

I have two tables, called "Orders" and "Holidays"

The "Orders" table

orderId |   orderDate
     1         02-03-2017
     2         02-03-2017
     3         02-03-2017
     4         02-03-2017
     5         04-03-2017
     6         04-03-2017
     7         04-03-2017
     8         04-03-2017



The "Holiday" table

holiId |       holiDate
     1         12-03-2017
     2         02-06-2017
     3         06-09-2017
     4         02-03-2017


Then, I have this query

SELECT orderId, orderDate, 
CASE WEHN hd.holiDate IS NOT NULL THEN 'HOLIDAY'
ELSE to_char(or.Day, 'DD-MON-YYYY') AS 'WEEKDAY'
FROM Orders or
LEFT JOIN Holiday hd
ON or.orderDate = hd.holiDate



This gives me sql results like the follwoings

orderId |   orderDate |       WEEKDAY
     1         02-03-2017        HOLIDAY
     2         02-03-2017        HOLIDAY
     3         02-03-2017        HOLIDAY
     4         02-03-2017        HOLIDAY
     5         04-03-2017        SATURDAY
     6         04-03-2017        SATURDAY
     7         04-03-2017        SATURDAY
     8         04-03-2017        SATURDAY


Basically, what I am doing is that I compare each row of the "Orders" table against the "Holiday" table by the "left join on" clause.
Then I mark each row as 'HOLIDAY' if there is any match. It works as intended but I am wondering if I can improve this query. In real situation, I would have a lot of rows having the same orderDate value.

When the database compares the first row of the "Orders" against the "Holiday" table, it will find that this current row matches one of the rows in "Holiday" table. So the database will mark it as 'HOLIDAY'.

Then, it will move to the next row and then do the same comparison operation against "Holiday" table.
Here is what I think "efficient way" can come in.

Now I already knew that 02-03-2017 is 'HOLIDAY' from the first-row's comparison. So I throught it would be more efficient to firstly compare current row's orderDate to previous one and then if they match, just use the previous row's WEEKDAY column's value. If they don't match, just do the comparison operation against the 'Holiday' table.

Is there any possible way of doing this??

Strawberry
  • 33,750
  • 13
  • 40
  • 57
Michael
  • 63
  • 2
  • 2
  • 9
  • This is oracle, right? – Strawberry May 26 '17 at 06:49
  • @Strawberry yeah I am using oracle – Michael May 26 '17 at 06:52
  • Just stick with what your using, you can compare other rows in the result set using `lag` and `lead` BUT this adds a more complex solution than to do the simple test you are currently using. – Nigel Ren May 26 '17 at 06:55
  • @Nigel Ren can you suggest any query with lag and lead used?? – Michael May 26 '17 at 06:59
  • Easier to google for them - something like https://gerardnico.com/wiki/database/oracle/lag_lead – Nigel Ren May 26 '17 at 07:05
  • @NigelRen yeah but when you say "result set", does it mean that the query already performed the left join query/? – Michael May 26 '17 at 11:13
  • I don't think it will affect the way the query runs - I think it will make the query even slower! It will still do the join, but once it has done this lag and lead can access data in other rows. – Nigel Ren May 26 '17 at 14:01
  • @NigelRen can I ask one thing? when I do the same query in oracle twice, do I get the same ordered results or can't I make sure? – Michael May 27 '17 at 00:54
  • The only way you can ensure that order is fixed is by using an `order by` clause. Without this, the database is free to return the records in whatever order it finds them in. – Nigel Ren May 27 '17 at 06:04

1 Answers1

0

Your analysis would have been right if it was any other database then Oracle. In Oracle fortunately you have Hash join to save the day for you.

When you say it is a big table question is what is definition of big in your case, how many rows/blocks of data and how many columns/rows (projection/selection) are we talking about.

In any case check below code with 30 million rows, with data columns being indexed just to make sure in case it doesn't do hash join and instead go for sort merger or nested loops join.

Your holiday table can't really be big because max you will have 20 to 30 holidays in a year unless counting weekends has holidays as well even then it is less than 100 rows table. So hash join will create the HASH on holidate column first then using those hashes it will scan the whole orders table in ONE go. Keyword here is ONE go, in other words it won't do a comparison as you are thinking line by line each time. Hence it will be pretty fast depending on your PGA memory size for session. Lets see how fast it does it for 30 Million rows orders table consisting of two columns you presented.

 alter session set nls_date_format='DD-MON-RRRR';
drop table orders;
drop table holiday;
create table orders (orderid number, orderdate date);
create table holiday(holiid number, holidate date);

insert into orders 
select level, '03-FEB-2017' from dual connect by level <= 20000000;


insert into holiday values (1,'03-dec-2017');
insert into holiday values (2,'06-feb-2017');
insert into holiday values (3,'09-jun-2017');
insert into holiday values (4,'03-feb-2017');
commit;
select count(*) from orders;
select count(*) from holiday;

alter table orders add constraint pk_orders primary key(orderid);
alter table holiday add constraint pk_holiday primary key (holiid);

create index idx_orders on orders(orderdate);
create index idx_holiid on holiday(holidate);

begin
for i in 20000001..30000000
loop
insert into orders select i,'04-MAR-2017' from dual;
end loop;
commit;
end;

SET TIMING ON;
SELECT orderId, orderDate, 
CASE WHEN hd.holiDate IS NOT NULL THEN 'HOLIDAY'
ELSE to_char("or".orderdate, 'DAY') END AS WEEKDAY
FROM Orders "or"
LEFT JOIN Holiday hd
ON "or".orderDate = hd.holiDate;

This last query elapsed time was >>Query Run In:Query Result 2

Elapsed: 00:00:00.363

Lets see explain Plan for it, enter image description here

BTW, don't use "or" as a table alias it is a reserved word and if you will use it you will have to use it with double quotes which is a not a good practice. Also don't use the begin and end block like I have for quick and dirty insert rather use forall if you don't want 2 Million context switches between PL/SQL ad SQL engines.

Adnan Bhatti
  • 3,410
  • 4
  • 25
  • 38