3

This question is related to this SO post

Rather than using a recursive CTE how do I add in missing data (deemed missing via date) using a DimDates table?

I have the following two tables:

create table the_table 
(
  [Date] datetime,
  Category2 varchar(10),
  Amount INT
)
insert into the_table
values
( '01 jan 2012', 'xx', 10),
( '03 jan 2012', 'yy', 50)


create table DimDate 
(
  [Date] datetime
)
insert into DimDate
values
( '01 jan 2012'),
( '02 jan 2012'),
( '03 jan 2012'),
( '04 jan 2012')

These are the results I'm trying to get to. I've not bothered with a recursive CTE as I, wrongly assumed, it'd be loads easier using our warehouse DimDate table:

enter image description here

ok - i might have stumbled on a possible solution - please poke holes in the following if it's wrong:

select

  coalesce(x.[Date], y.[Date]) AS Date ,
  coalesce(x.Category2, y.Category2) AS Category2 ,
  isnull(Amount,0) as Amount
from the_table x
full outer join 
(
select 
    d.Date
    , t.Category2
from 
        the_table t
        cross join DimDate d 
) y
    on
    x.Category2 = y.Category2
    and 
    x.Date = y.Date

this is what I've ended up with. A combination of the marked answer amd the cte from Aaron's post:

;WITH 
    Dates_cte ([Date]) AS
            (
            SELECT [Date] = DayMarker 
            FROM WHData.dbo.vw_DimDate x
            WHERE
                    x.DayMarker >= (SELECT MIN([Date]) FROM #Data1 WHERE Period = 'Daily') AND
                    x.DayMarker <= GETDATE()
            )   
    ,Categories ([Operator], [Market], [Product], [Measure]) AS 
                ( 
                SELECT DISTINCT 
                        [Operator]
                        , [Market]
                        , [Product]
                        , [Measure] 
                FROM #Data1 
                WHERE [Period] = 'Daily'
                ) 
INSERT INTO #Data1 
    SELECT 
         c.[Operator]
        , c.[Market]
        , c.[Product]
        , [Period] = CONVERT(VARCHAR(100), 'Daily')
        , d.[Date]  
        , c.[Measure]   
        , 0 
    FROM Dates_cte d CROSS JOIN Categories c
    WHERE NOT EXISTS 
            ( 
            SELECT * 
            FROM #Data1 AS T 
            WHERE 
                    t.[Period] = 'Daily' AND
                    t.[Operator] = c.[Operator] AND 
                    t.[Market] = c.[Market] AND 
                    t.[Product] = c.[Product] AND 
                    t.[Measure] = c.[Measure] AND 
                    t.[Date] = d.[Date] 
            ) 
Community
  • 1
  • 1
whytheq
  • 34,466
  • 65
  • 172
  • 267
  • does `CROSS JOIN` do the same thing? - the reson I added that was to get a cartesian product – whytheq Jun 20 '12 at 16:05
  • nah - `CROSS JOIN` isn't something I've used; so your comment was valid - I'll edit the OP – whytheq Jun 20 '12 at 16:11
  • In your question you want to `add in` the missing rows. From your edit I guess you just mean when you `SELECT` rather than inserting the missing rows? If so, this is a much simpler template `SELECT dates.date, categories.category, ISNULL(data.val, 0) FROM dates CROSS JOIN categories LEFT JOIN data ON data.date = dates.date AND data.category = categories.category` ***[Not everything needs a FULL OUTER JOIN.]*** – MatBailie Jun 20 '12 at 16:12
  • In your most recent edit `t CROSS JOIN d` will yield a HUGE table. Much more than just `all the combinations of date and category`. You need to think in the other direction to get a decent query: Make a template of all the combinations that you *should* have, then left join your data on to it. – MatBailie Jun 20 '12 at 16:16
  • @Dems - so does this also apply to the post I've marked as the answer; it includes a `CROSS JOIN` ? – whytheq Jun 20 '12 at 16:45
  • 1
    No. The accepted answer has `categories CROSS JOIN dates`. Yours has `all_data CROSS JOIN dates`. The cross join wasn't the problem, just *what* was being cross joined. – MatBailie Jun 20 '12 at 21:56

4 Answers4

3

Use INSERT INTO ... SELECT FROM DimDate CROSS JOIN categories WHERE NOT EXISTS ....

Try this:

INSERT INTO the_table
([Date], Category2, Amount)
SELECT [Date], category2, 0
FROM DimDate
CROSS JOIN
(
    SELECT DISTINCT category2 FROM the_table
) AS categories
WHERE NOT EXISTS
(
    SELECT *
    FROM thetable AS T
    WHERE T.category2 = categories.Category2
    AND T.[Date] = DimDate.[Date]
)

See it working online: ideone

If you're creating a data warehouse, I'd advise you to put the categories into a dimension table.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
  • +1 : Especially for suggesting the categories dimension table. – MatBailie Jun 20 '12 at 16:12
  • lovely script. I just use the WH - not creating it. – whytheq Jun 20 '12 at 16:24
  • Mark - in the comments of the OP Dems mentions that my possible solution using a CROSS JOIN is inefficient becasue of the explosion of data .... what is different about the CROSS JOIN you use ...is it equally inefficient? – whytheq Jun 20 '12 at 16:57
  • we've got loads of dimension tables - in this particular query I've already pulled in several Categories from different dimensions – whytheq Jun 20 '12 at 16:59
  • You may need to add a where clause to the `categories` derived table. You need to tell us: How are you going to identify only the categories in `the_table` that you're interested in? – Aaron Bertrand Jun 20 '12 at 17:06
  • @AaronBertrand; i've added my production query to the OP...I'm interested in all the categories. – whytheq Jun 21 '12 at 09:28
1

Obviously bad pseudocode that shows a possible solution

insert into table1
    select  from table2 
        where not exists (select from table1 where table1.date = table2.date)

That assumes you are tyring to add the data into table 1.

If you just want it in memory,

select * from table 1
union 
select * from table 2 where not exists (select from table1 where table1.date = table2.date)

or just an outer join

Jason Coyne
  • 6,509
  • 8
  • 40
  • 70
1
;WITH cat AS (SELECT Category2 FROM the_table GROUP BY Category2)
INSERT the_table([Date], Category2, Amount)
SELECT d.[Date], cat.Category2, 0
FROM DimDate AS d CROSS JOIN cat
LEFT OUTER JOIN the_table AS t
ON d.[Date] = t.[Date]
AND cat.Category2 = t.Category2
WHERE t.[Date] IS NULL;
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • off subject: - just been checking [this article out](https://sqlblog.org/2008/10/30/my-stored-procedure-best-practices-checklist) wish I'd read it 3yrs back! – whytheq Jun 20 '12 at 16:15
0

Step 1, insert the missing dates:

select [Date], '', 0 from DimDate
where [Date] not in (select [Date] from the_table)

Step 2, update the Categoriy2 column:

update the_table
set Category2 =
     (select aux.Category from the_table aux where t.Date = 
        (select max(t.Date) from the_table t
         where t.Category2 <> '' and t.Date < aux.Date)
aF.
  • 64,980
  • 43
  • 135
  • 198