Resolving Missing Aggregation Data With Dimensional Sub Queries and Outer Join SQL
You can solve your problem by using a sub query that produces all the distinct values that you need to report on. I am assume that the output is supposed to provide a way to track the sales of each book on each day, even if no sales transactions have been entered into the table. Here is one possible solution and the output:
Problem Analysis:
The request to see a data value for which there is no physical representation in the queried table is possible by constructing a sub query
that represents all possible values whether or not they already exist in the table.
In this case, the two dimensions reported on are: "book" and "date". There must be an entry for total sales for each book for each day of transactions recorded in the book sales table.
Assumptions: It wasn't clear in the example if tblBook
represents already aggregated sales, or if there can be multiple records for a given "book" and "date" combination... as in the style of a transaction register that records each sale as they happen. The solution query aggregates sales amounts to provide a single record for each book on each day.
Note: This example was tested on a MySQL RDBMS. The SQL used in this example is ANSI standard, so it should work on a SQLServer database as well. You may have to find the equivalent to the ifNULL()
function used in this solution
The SQL Code
SELECT REPORTING_DIMENSIONS.book,
sum(ifNULL(sales_data.bookSales,0)) as totalSales,
REPORTING_DIMENSIONS.date
FROM
(SELECT book_list.book, date_list.date
FROM
(SELECT distinct book
FROM tblBook
) book_list
CROSS JOIN
(SELECT distinct date
FROM tblBook
) date_list
) REPORTING_DIMENSIONS
LEFT JOIN tblBook SALES_DATA
ON REPORTING_DIMENSIONS.book = SALES_DATA.book
AND REPORTING_DIMENSIONS.date = SALES_DATA.date
GROUP BY REPORTING_DIMENSIONS.book,
REPORTING_DIMENSIONS.date
The Output:
| BOOK | TOTALSALES | DATE |
|-------|------------|------------------------------|
| BookA | 2 | March, 04 2013 00:00:00+0000 |
| BookA | 6 | March, 05 2013 00:00:00+0000 |
| BookB | 4 | March, 04 2013 00:00:00+0000 |
| BookB | 0 | March, 05 2013 00:00:00+0000 |
Discussion of Results
The value of zero for BookB on 2013-03-15 actually was a NULL value. When an OUTER (i.e., LEFT) join is made for which there is no match on the joining values, a null is returned.
The only fancy step was the CROSS JOIN
designation. Also known as a CARTESIAN PRODUCT
, this generates a record for each possible "book" and "date" combination recorded in the existing table.
The sub query aliased as REPORTING DIMENSIONS
is a fixed, dynamic feature of the query. It allows the query to adjust its output as new books or dates are added to its scope.
Additional Considerations:
It is possible to have NO SALES for NONE of the products on one, all or many of the calendar days contained in the range of your report query. This again will affect the output of the original report query because there will be holes in the output of results.
This can be fixed by creating a fixed dimension table just for date values. Pre populate it with a record for each discrete calendar day for a few years before and after the scope of your sales data. A simple T/SQL block with a looping insert operation can easily populate this table. Once it's been populated, it is very unlikely it needs to be touched.