1

I have a table that stores the amount of errors according to what alarm-id it is. The table looks something like this:

|----DATE----|---ALARM_ID---|---COUNTER---|
| 2012-01-01 |      1       |      32     |
| 2012-01-01 |      2       |      28     |
| 2012-01-02 |      1       |      12     |
| 2012-01-02 |      2       |      23     |
| 2012-01-03 |      1       |      3      |
| 2012-01-03 |      2       |      9      |
| 2012-01-05 |      1       |      8      |
| 2012-01-05 |      2       |      1      |
| 2012-01-07 |      1       |      102    |
| 2012-01-07 |      2       |      78     |

Notice the gap between date (2012-01-03 - 2012-01-05) and (2012-01-05 - 2012-01-07). On these dates there isn't any data because the system, that my program is monitoring, haven't reported any errors at that date. What I'm looking for is a SQL SELECT query that returns the total amount of errors on each date, for example:

|----DATE----|---COUNTER---|
| 2012-01-01 |      60     |
| 2012-01-02 |      35     |
| 2012-01-03 |      12     |
| 2012-01-04 |      0      |
| 2012-01-05 |      9      |
| 2012-01-06 |      0      |
| 2012-01-07 |      180    |

I have a query that returns ID's even if they doesn't exist in the table, and if the ID doesn't exist, return the ID anyway with the COUNTER value 0. As such:

        BEFORE                                     AFTER

|---ID---|---COUNTER---|                  |---ID---|---COUNTER---|
|   1    |      2      |                  |   1    |      2      |
|   2    |      6      |                  |   2    |      6      |
|   3    |      1      |       -->        |   3    |      1      |
|   5    |      9      |                  |   4    |      0      |
|   6    |      10     |                  |   5    |      9      |
                                          |   6    |      10     |
                                          |   7    |      0      |
                                          |   8    |      0      |

The query goes like this:

select t.num as ID, coalesce(yt.COUNTER, 0)
from all_stats yt right join 
( select t1.num + t2.num * 10 + t3.num * 100 + t4.num * 1000 as num 
from ( select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0 ) t1 cross join 
( select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0 ) t2 cross join 
( select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0 ) t3 cross join 
( select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0 ) t4 ) 
t on yt.ID = t.num 
where (t.num between (select min(ID) from all_stats) and (select max(ID) from all_stats)) order by ID

I can't figure out how I can change this query when it's regarding dates. Can someone please help me on this issue?

I'm using MySQL

Thanks in advance, Steve-O

ClydeFrog
  • 912
  • 1
  • 14
  • 38

3 Answers3

4

The exact details will depend on the DBMS, and on the nature of the database (e.g., OLAP-oriented vs. OLTP-oriented), but one common general approach is to create an auxiliary calendar table that represents dates as a dimension. Then you can use regular JOINs, rather than having to use complex logic to generate missing dates.

The answers to this StackOverflow question describe how to apply this approach on MySQL.

You can use a similar approach for numbers, by the way, by having a numbers tables; I've never done that myself for numbers, but it seems to be a popular idea; see this dba.stackexchange.com question.

Community
  • 1
  • 1
ruakh
  • 175,680
  • 26
  • 273
  • 307
1

If you're using SQL Server 2005 or above you can use a CTE (if not, a loop or other sql technique to populate a table with the dates in the range). Note also there is a limit to the levels of recursion within a CTE.

declare @dateRange table
(
  dateBegin datetime,
  dateEnd datetime
)

insert into @dateRange (dateBegin, dateEnd) 
values ('2012-01-01', '2012-01-07')

;with cte (d)
as (select dateBegin as d
    from @dateRange tbl
    where datediff(day, tbl.dateBegin, tbl.dateEnd) <= 100
    union all
    select dateadd(day, 1, cte.d) as d
    from cte
      inner join @dateRange tbl on cte.d < tbl.dateEnd)

Then get the full results either using the CTE or a temporary table that contains the set of dates in the range:

select cte.d, sum(isnull(e.errorCounter, 0))
from cte
  left outer join @errors e on e.errorDate = cte.d
group by cte.d
order by cte.d
kaj
  • 5,133
  • 2
  • 21
  • 18
0

You really should handle this at the application layer (ie iterate over the known date range and pull the non-zero vals from the resultset) or fix your table to always include the dates needed if you MUST have a database-centered solution. There's no really good way to generate, on the fly, a set of dates to use in building a continuous date range query.

You can see this for some examples of DB scripting solutions:

Return temp table of continuous dates

But I think you're posing the wrong question. Fix the database to include what you need, or fix how you're generating your report. Databases aren't meant to do interpolation and data generation.

Community
  • 1
  • 1
Irongaze.com
  • 1,639
  • 16
  • 22
  • There is really a very good way to produce, on the fly, a set of dates: a pre-filled Calendar table. – ypercubeᵀᴹ Feb 10 '12 at 13:18
  • Sure, but that's not "on the fly" then. :-) That's the approach recommended in the page I linked, and it will work fine, but still... code smell around the whole approach to the problem in my opinion. – Irongaze.com Feb 10 '12 at 13:21
  • It's not generation on the fly, it's producing on the fly. And "code smell"? Really? Why? If you need these range queries often, there is no reason not to have the calculations done before. Having a calendar table for 2-3 centuries will be using about 1-20 MB (depending on the details) which is peanuts for a database. – ypercubeᵀᴹ Feb 10 '12 at 13:26
  • And even without a base table, generating such a date range table "on the fly" (whatever that be) for small ranges can be done quite efficiently. – ypercubeᵀᴹ Feb 10 '12 at 13:32
  • 1
    We are vigorously agreeing with each other. :-) The initial question asked how to tweak a SQL query, and my reply is you shouldn't, but could do other approaches including pre-generated date tables as in the linked answer. If the original poster prefers everything kept in db, then that is a great answer. In my personal experience from solving similar problems however, this type of issue is often easier solved at the application layer, without the futzy outer joins and whatnot, as it's easy to detect a missing value and substitute a zero while iterating through a date range. Cheers! – Irongaze.com Feb 10 '12 at 13:34
  • 1
    I have to agree that for this simple query, a solution at the application level would work just fine, too. For more complex problems (who do appear quite often), a Calendar table is ideal. – ypercubeᵀᴹ Feb 10 '12 at 13:48