8

I need to do something really weird, which is to create fake records in a view to fill the gap between posted dates of product prices.

Actually, my scenario is a little bit more complicated than that, but I've simplified to products/dates/prices.

Let's say we have this table:

create table PRICES_TEST
(
   PRICE_DATE    date          not null,
   PRODUCT       varchar2(13) not null,
   PRICE         number
);

alter table PRICES_TEST 
  add constraint PRICES_TEST_PK
    primary key (PRICE_DATE, PRODUCT);

With these records:

insert into PRICES_TEST values (date'2012-04-15', 'Screw Driver', 13);
insert into PRICES_TEST values (date'2012-04-18', 'Screw Driver', 15);

insert into PRICES_TEST values (date'2012-04-13', 'Hammer', 10);
insert into PRICES_TEST values (date'2012-04-16', 'Hammer', 15);
insert into PRICES_TEST values (date'2012-04-19', 'Hammer', 17);

selecting records will return me this:

PRICE_DATE                PRODUCT       PRICE                  
------------------------- ------------- ---------------------- 
13-Apr-2012 00:00:00      Hammer        10                     
16-Apr-2012 00:00:00      Hammer        15                     
19-Apr-2012 00:00:00      Hammer        17                     
15-Apr-2012 00:00:00      Screw Driver  13                     
18-Apr-2012 00:00:00      Screw Driver  15                     

Assuming today is Apr 21 2012, I need a view that shall repeat each price every day until a new price is posted. Like this:

PRICE_DATE                PRODUCT       PRICE                  
------------------------- ------------- ---------------------- 
13-Apr-2012 00:00:00      Hammer        10                     
14-Apr-2012 00:00:00      Hammer        10                     
15-Apr-2012 00:00:00      Hammer        10                     
16-Apr-2012 00:00:00      Hammer        15                     
17-Apr-2012 00:00:00      Hammer        15                     
18-Apr-2012 00:00:00      Hammer        15                     
19-Apr-2012 00:00:00      Hammer        17                     
20-Apr-2012 00:00:00      Hammer        17                     
21-Apr-2012 00:00:00      Hammer        17                     
15-Apr-2012 00:00:00      Screw Driver  13                     
16-Apr-2012 00:00:00      Screw Driver  13                     
17-Apr-2012 00:00:00      Screw Driver  13                     
18-Apr-2012 00:00:00      Screw Driver  15                     
19-Apr-2012 00:00:00      Screw Driver  15                     
20-Apr-2012 00:00:00      Screw Driver  15                     
21-Apr-2012 00:00:00      Screw Driver  15                     

Any ideas how to do that? I cannot really use other auxiliary tables, triggers nor PL/SQL programming, I really need to do this using a view.

I think this can be done using oracle analytics, but I'm not familiar with that. I tried to read this http://www.club-oracle.com/articles/analytic-functions-i-introduction-164/ but I didn't get it at all.

Bonifacio2
  • 3,405
  • 6
  • 34
  • 54
L. Holanda
  • 4,432
  • 1
  • 36
  • 44
  • NVM, i understand it now :) It will be possible to generate data with creative use of the `Dual` table. – mellamokb Apr 20 '12 at 23:46
  • I could assume that oracle analytics has it's own date dimension table to perform such function. Could you just create your own date dimension table? – Chris Hayes Apr 20 '12 at 23:53
  • here is an interesting article on creating a dynamic calandar view in tsql (yes, you want oracle but maybe it can be altered): http://sqlserverpedia.com/blog/sql-server-bloggers/tsql-tuesday-18-using-a-recursive-cte-to-create-a-calendar-table/ – Chris Hayes Apr 20 '12 at 23:58

4 Answers4

6

You can create a row generator statement using the CONNECT BY LEVEL syntax, cross joined with the distinct products in your table, and then outer join that to your prices table. The final touch is to use the LAST_VALUE function and IGNORE NULLS to repeat the price until a new value is encountered, and since you wanted a view, with a CREATE VIEW statement:

create view dense_prices_test as
select
    dp.price_date
  , dp.product
  , last_value(pt.price ignore nulls) over (order by dp.product, dp.price_date) price
from (
      -- Cross join with the distinct product set in prices_test
      select d.price_date, p.product
      from (
            -- Row generator to list all dates from first date in prices_test to today
            with dates as (select min(price_date) beg_date, sysdate end_date from prices_test)
            select dates.beg_date + level - 1 price_date 
            from dual
            cross join dates
            connect by level <= dates.end_date - dates.beg_date + 1
            ) d
      cross join (select distinct product from prices_test) p
     ) dp
left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;
WoMo
  • 7,136
  • 2
  • 29
  • 36
  • Man, you're a master! I never heard of 'with', 'connect by' nor 'ignore nulls' in an analytic function. Thank you very much. It worked perfectly with my more complicated scenario. – L. Holanda Apr 21 '12 at 03:47
  • Hi @Wolf, although your answer was the one that made me learn more than the others, mellamokb's answer is the one that actually produces the desired results, thus, he deserves the "accepted answer" flag. Thank you very much all you guys who helped me with this. – L. Holanda Apr 23 '12 at 18:18
5

I think I have a solution using an incremental approach toward the final result with CTE's:

with mindate as
(
  select min(price_date) as mindate from PRICES_TEST
)
,dates as
(
  select mindate.mindate + row_number() over (order by 1) - 1 as thedate from mindate,
    dual d connect by level <= floor(SYSDATE - mindate.mindate) + 1
)
,productdates as
(
  select p.product, d.thedate
  from (select distinct product from PRICES_TEST) p, dates d
)
,ranges as
(
  select
    pd.product,
    pd.thedate,
    (select max(PRICE_DATE) from PRICES_TEST p2
     where p2.product = pd.product and p2.PRICE_DATE <= pd.thedate) as mindate
    from productdates pd
)
select 
    r.thedate,
    r.product,
    p.price
from ranges r
inner join PRICES_TEST p on r.mindate = p.price_date and r.product = p.product
order by r.product, r.thedate
  • mindate retrieves the earliest possible date in the data set
  • dates generates a calendar of dates from earliest possible date to today.
  • productdates cross joins all possible products with all possible dates
  • ranges determines which price date applied at each date
  • the final query links which price date applied to the actual price and filters out dates for which there are no relevant price dates via the inner join condition

Demo: http://www.sqlfiddle.com/#!4/e528f/126

mellamokb
  • 56,094
  • 12
  • 110
  • 136
  • I just removed the reference to the dual table and the floor rounding in the second 'with' select. They are not really needed. Other than that, it looks great! My real scenario contains 2 more columns in the PK (something like product, brand, model) and several other info columns (something like price, net weight, notes) and it still works perfectly with the appropriate changes. Thank you very much. – L. Holanda Apr 23 '12 at 20:29
4

I made a few changes to Wolf's excellent answer.

I replaced the subquery factoring (WITH) with a regular subquery in the connect by. This makes the code a little simpler. (Although this type of code looks weird at first either way, so there may not be a huge gain here.)

Most significantly, I used a partition outer join instead of a cross join and outer join. Partition outer joins are also kind of strange, but they are meant for exactly this type of situation. This makes the code simpler, and should improve performance.

select
    price_dates.price_date
    ,product
    ,last_value(price ignore nulls) over (order by product, price_dates.price_date) price
from
(
    select trunc(sysdate) - level + 1 price_date
    from dual
    connect by level <= trunc(sysdate) -
        (select min(trunc(price_date)) from prices_test) + 1
) price_dates
left outer join prices_test
    partition by (prices_test.product)
    on price_dates.price_date = prices_test.price_date;
Jon Heller
  • 34,999
  • 6
  • 74
  • 132
  • That's a nice improvement. The WITH clause was an artifact from a bit of other SQL I had, but I keep forgetting about partition outer joins. Nice work. – WoMo Apr 21 '12 at 16:28
1

I just realized that @Wolf and @jonearles improvements do not return the exact results I needed because the row generator to list all dates won't generate the ranges by product. If the first price of product A is later than any price of product B the first listed date of product A still must be the same. But they really helped me to work further and get the expected results:

I started with changing @wolf's date range selector from this:

select min(price_date) beg_date, sysdate end_date from prices_test

to this:

select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT 
from PRICES_TEST group by sysdate, PRODUCT

But, somehow, the number of rows per product is exponentially growing repeatedly for each level. I just added a distinct in the outter query. The finally select was this:

select
  DP.PRICE_DATE,
  DP.PRODUCT,
  LAST_VALUE(PT.PRICE ignore nulls) over (order by DP.PRODUCT, DP.PRICE_DATE) PRICE
from (
  select distinct START_DATE + DAYS as PRICE_DATE, PRODUCT 
  from 
  (
    -- Row generator to list all dates from first date of each product to today
    with DATES as (select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT from PRICES_TEST group by sysdate, PRODUCT)
    select START_DATE, level - 1 as DAYS, PRODUCT
    from DATES
    connect by level < END_DATE - START_DATE + 1
    order by 3, 2
  ) d order by 2, 1
) DP
left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;

@Mellamokb solution is actually what I really need and is certainly better than my noobie solution.

Thank's everyone not only for helping me with this but also for presenting me features such as "with" and "connect by".

L. Holanda
  • 4,432
  • 1
  • 36
  • 44