
Consider a table representing a queue, with a datetime timestamp for when an item has been added, and one for when it is completed.

The question: how can I efficiently query, for any given time, how many items were in the queue, i.e. added but not completed?

A sample of the table:

id value added                     completed
6  1     2016-01-01 00:00:12.345   2016-01-01 00:01:12.345
7  500   2016-01-01 01:12:12.345   2016-01-01 01:15:12.345
8  1     2016-01-01 01:12:12.345   2016-01-01 02:16:12.345
9  2     2016-01-01 01:33:12.345   NULL
10 2     2016-01-01 01:33:12.345   NULL
11 2     2016-01-01 01:33:12.345   NULL

Items can be added at any time, but it takes time for them to be completed.

In the example above, 9, 10 and 11 are under process, so I can easily query to find that 3 items are in the queue right now. But how do I query to find, for example, how many items were in the queue and not completed at any given past time?

I am looking for a result that looks something like this:

date        time          count       sum value
2016-01-01  00:00:00.000  1           1
2016-01-01  01:12:00.000  2           501
2016-01-01  01:13:00.000  2           501
2016-01-01  01:14:00.000  2           501
2016-01-01  01:15:00.000  1           1
2016-01-01  01:33:00.000  3           6

My goal is to find the times with the max number of items in the queue. From here I would be able to say that the queued item count was highest at 01:33, and that the queued value was highest from 01:12 to 01:14.

What I've tried: I have experimented with WITH (a CTE) as suggested in this answer. It works fine with only one date criterion, but when I use both the hs.added >= DATEADD(... and the hs.completed >= DATEADD(... criteria, the execution that took 0 seconds now seems never to complete. I do not completely grasp the execution process here.

This large table is in production, and I do not want to keep a query executing for too long.

Edit: stats:

COLUMN_NAME DATA_TYPE CHARACTER_MAXIMUM_LENGTH IS_NULLABLE
ID          int       NULL                     NO
added       datetime  NULL                     NO
completed   datetime  NULL                     YES
value       int       NULL                     NO

CONSTRAINT_NAME
PK_Queue

name                type_desc     is_unique is_primary_key
PK_Queue            CLUSTERED     1         1
IX_Queue_completed  NONCLUSTERED  0         0
IX_Queue_added      NONCLUSTERED  0         0

rows       data
6 000 000  15 000 000 KB
JOG
  • You can always find the exact state of the queue by just locking it? Get the details you want and release it. Show the results. You are done? What is the issue you have with these accurate results? – Ryan Vincent Jun 30 '16 at 00:40
  • I want to get an overview of how the queue has behaved over time. – JOG Jun 30 '16 at 00:41
  • Yes, statistics and samples may be stored, but they are not. I am analyzing past data and looking for peaks that already happened. I don't have any admin roles on this production db. – JOG Jun 30 '16 at 00:58
  • I need the table definition (I am not going to guess what the indexes are and the column definitions). You know what information you need to collect, how often, and when it needs to be reported. I mention these things so that you know what information you need to provide a useful answer to the client. Once you have all the information and what you need to provide, your question will be a lot more precise (focused). imo, there is a really good question here but you haven't defined it to yourself accurately yet. – Ryan Vincent Jun 30 '16 at 01:18
  • You are right. I have no particular client question here but my own idea of getting an overview of when we had long queues the last month. I set out to try to study a relation between `value size` in queue to long queue _times_ (the time that a particular item spends waiting) that I am already monitoring (instead of looking for `item size`). I got stuck and realized there is something to learn here. – JOG Jun 30 '16 at 01:35
  • @RyanVincent are you saying that the answer below by Gordon should be efficient enough? Then I will just let the query run longer under controlled circumstances. – JOG Jun 30 '16 at 01:55
  • Then run the answer you already have. It will be useful to you. Seriously, I respect the author a lot. – Ryan Vincent Jun 30 '16 at 02:01
  • Thanks for listening and replying. I found it interesting and useful. In positive ways. Thanks for your time. – Ryan Vincent Jun 30 '16 at 02:03
  • Me too, thank you! – JOG Jun 30 '16 at 02:14

2 Answers


The basic query looks like this for a given time:

select count(*), sum(q.value)
from queue q
where @datetime >= q.added and
      (@datetime < q.completed or q.completed is null);

For all times, you can just put them together in a subquery and join them in:

select dt.dt, count(q.id), sum(q.value)
from (select q.added as dt from queue q union select q.completed from queue q
     ) dt left join
     queue q
     on dt.dt >= q.added and (dt.dt < q.completed or q.completed is null)
group by dt.dt
order by dt.dt;

To get the maximum value, add `top 1` and order by `count(q.id) desc`.
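As an illustration only (the question is SQL Server, but this SQL is portable), the query above can be checked against the sample rows on SQLite; table and column names follow the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table queue (id int, value int, added text, completed text);
insert into queue values
  (6,  1,   '2016-01-01 00:00:12.345', '2016-01-01 00:01:12.345'),
  (7,  500, '2016-01-01 01:12:12.345', '2016-01-01 01:15:12.345'),
  (8,  1,   '2016-01-01 01:12:12.345', '2016-01-01 02:16:12.345'),
  (9,  2,   '2016-01-01 01:33:12.345', NULL),
  (10, 2,   '2016-01-01 01:33:12.345', NULL),
  (11, 2,   '2016-01-01 01:33:12.345', NULL);
""")

# The answer's query; one extra predicate drops the NULL timestamp that
# the union picks up from the open items' completed column.
rows = con.execute("""
select dt.dt, count(q.id), sum(q.value)
from (select added as dt from queue
      union
      select completed from queue) dt
left join queue q
  on dt.dt >= q.added and (dt.dt < q.completed or q.completed is null)
where dt.dt is not null
group by dt.dt
order by dt.dt
""").fetchall()

for r in rows:
    print(r)
```

With the sample data as posted (item 8 completing at 02:16), the peak is 4 items at 01:33:12.345, which is the row the `top 1` variant would return.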

Gordon Linoff
  • Thanks, but still, the execution does not complete in a timely manner. This table is huge. I am updating the question. – JOG Jun 30 '16 at 00:49
  • I updated the question with stats and date precision. Are there any performance issues here that should be considered in the query, given the large number of rows? Otherwise I will have a result table with as many rows as the source table. – JOG Jun 30 '16 at 02:23
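On the scale concern in these comments: the join above compares every candidate time against every row, which can get expensive at 6 million rows. A common alternative (not part of this answer; sketched here on SQLite with window functions) is to turn each added into a +1/+value event and each completed into a -1/-value event, then take a running sum over event time, so a single ordered pass yields the queue depth at every change point:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table queue (id int, value int, added text, completed text);
insert into queue values
  (6,  1,   '2016-01-01 00:00:12.345', '2016-01-01 00:01:12.345'),
  (7,  500, '2016-01-01 01:12:12.345', '2016-01-01 01:15:12.345'),
  (8,  1,   '2016-01-01 01:12:12.345', '2016-01-01 02:16:12.345'),
  (9,  2,   '2016-01-01 01:33:12.345', NULL),
  (10, 2,   '2016-01-01 01:33:12.345', NULL),
  (11, 2,   '2016-01-01 01:33:12.345', NULL);
""")

# Each row contributes two events; grouping by timestamp first keeps one
# row per change point, then a window running sum accumulates the deltas.
rows = con.execute("""
with events as (
  select added as ts, 1 as dc, value as dv from queue
  union all
  select completed, -1, -value from queue where completed is not null
),
deltas as (
  select ts, sum(dc) as dc, sum(dv) as dv from events group by ts
)
select ts,
       sum(dc) over (order by ts) as items_in_queue,
       sum(dv) over (order by ts) as value_in_queue
from deltas
order by ts
""").fetchall()

for r in rows:
    print(r)
```

SQL Server 2012+ supports the same `SUM(...) OVER (ORDER BY ...)` pattern, so the sketch should port (with DATEADD-style literals), though I have not benchmarked it against the 6-million-row table.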

For your consideration:

I use a UDF to generate dynamic date ranges (listed below).

Just a quick note, on ID 8, I'm assuming you had a typo on the complete date (1:16 vs 2:16).

Declare @Table table (id int, value int,Added datetime,complete datetime)
Insert into @Table values
(6,  1,  '2016-01-01 00:00:12.345','2016-01-01 00:01:12.345'),
(7,  500,'2016-01-01 01:12:12.345','2016-01-01 01:15:12.345'),
(8,  1  ,'2016-01-01 01:12:12.345','2016-01-01 01:16:12.345'),
(9,  2  ,'2016-01-01 01:33:12.345',NULL),
(10, 2  ,'2016-01-01 01:33:12.345',NULL),
(11, 2  ,'2016-01-01 01:33:12.345',NULL)


Declare @DateR1 DateTime = '2016-01-01 00:00'
Declare @DateR2 DateTime = '2016-01-01 01:35'
Declare @DatePart varchar(25) = 'MI'
Declare @DateIncr int = 1


Select KeyDate
      ,Count = sum(isnull(Sign(B.Value),0))
      ,Value = isnull(sum(Value),0)
   From (Select KeyDate = RetVal From  [dbo].[udf-Create-Range-Date](@DateR1,@DateR2,@DatePart,@DateIncr)) A
   Left Join @Table  B
   on KeyDate between added and IsNull(complete,@DateR2)
  Group By KeyDate
  Having sum(value)>0  -- Optional for zero suppression
  Order By KeyDate

Returns

KeyDate                 Count  Value
2016-01-01 00:01:00.000 1      1
2016-01-01 01:13:00.000 2      501
2016-01-01 01:14:00.000 2      501
2016-01-01 01:15:00.000 2      501
2016-01-01 01:16:00.000 1      1
2016-01-01 01:34:00.000 3      6
2016-01-01 01:35:00.000 3      6

The UDF is listed below. There are many options out there, or you could even use a Date or Tally table.

CREATE FUNCTION [dbo].[udf-Create-Range-Date] (@DateFrom datetime,@DateTo datetime,@DatePart varchar(10),@Incr int)

Returns 
@ReturnVal Table (RetVal datetime)

As
Begin
    With DateTable As (
        Select DateFrom = @DateFrom
        Union All
        Select Case @DatePart
               When 'YY' then DateAdd(YY, @Incr, df.dateFrom)
               When 'QQ' then DateAdd(QQ, @Incr, df.dateFrom)
               When 'MM' then DateAdd(MM, @Incr, df.dateFrom)
               When 'WK' then DateAdd(WK, @Incr, df.dateFrom)
               When 'DD' then DateAdd(DD, @Incr, df.dateFrom)
               When 'HH' then DateAdd(HH, @Incr, df.dateFrom)
               When 'MI' then DateAdd(MI, @Incr, df.dateFrom)
               When 'SS' then DateAdd(SS, @Incr, df.dateFrom)
               End
        From DateTable DF
        Where DF.DateFrom < @DateTo
    )

    Insert into @ReturnVal(RetVal) Select DateFrom From DateTable option (maxrecursion 32767)

    Return
End

-- Syntax Select * from [dbo].[udf-Create-Range-Date]('2016-10-01','2020-10-01','YY',1) 
-- Syntax Select * from [dbo].[udf-Create-Range-Date]('2016-10-01','2020-10-01','DD',1) 
-- Syntax Select * from [dbo].[udf-Create-Range-Date]('2016-10-01','2016-10-31','MI',15) 
-- Syntax Select * from [dbo].[udf-Create-Range-Date]('2016-10-01','2016-10-02','SS',1) 
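The same kind of minute grid can also be produced inline with a recursive CTE, no UDF required. A small sketch on SQLite (SQL Server's recursive CTE syntax is close, with DATEADD in place of strftime):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Recursive CTE: start at the range's first minute and keep adding one
# minute until the end of the range is reached.
rows = con.execute("""
with recursive minutes(dt) as (
  select '2016-01-01 00:00'
  union all
  select strftime('%Y-%m-%d %H:%M', dt, '+1 minute')
  from minutes
  where dt < '2016-01-01 01:35'
)
select dt from minutes
""").fetchall()

print(len(rows))  # 96 grid points, 00:00 through 01:35 inclusive
```

The grid rows then left join to the queue table exactly as in the answer above, in place of the UDF call.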
John Cappelletti