0

I have an activity table that looks like this:

activity_id         int
activity_client_id  int
activity_type_id    int
activity_start_date datetime
activity_end_date   datetime

One record for each instance of a client performing an activity. For example, today they may have an activity record that has an activity_start_date of '03-JUN-2015 09:00:00' and an activity_end_date of '03-JUN-2015 11:30:00'

So using this table, we can total up how many minutes activity an client has performed by doing this:

select client_id,
sum(DATEDIFF(MINUTE, activity_start_date , activity_end_date)) as total_minutes
from activity
group by client_id

Here's the tricky bit : I need to run a query that will show the date that each client reached a total of 300 minutes activity.

I am considering a creating a stored procedure that loops through each client, keeping a running count of the activity minutes. When it reaches 300 minutes, the procedure updates a column on another table, recording the date.

However, I'm wondering if I can achieve the results on the fly with a select query instead of having to run the stored procedure ?

Edit: As requested, I will give you some sample data to clarify what I am trying to achieve. Below we can see the records for client_id 112. The question is, on what date did the client complete 300 minutes (5 hours) of total activity ? For this client the answer would be 03/06/2015

+-------------+----------------------+---------------------+-------------------+
| activity_id | activity_client_id   | activity_start_date | activity_end_date |
+-------------+----------------------+---------------------+-------------------+
|      112434 |                  112 | 01/06/2015 09:00    | 01/06/2015 11:00  |
|      112490 |                  112 | 02/06/2015 12:00    | 02/06/2015 16:00  |
|      112688 |                  112 | 03/06/2015 09:00    | 03/06/2015 16:00  |
|      112998 |                  112 | 04/06/2015 09:00    | 04/06/2015 10:00  |
+-------------+----------------------+---------------------+-------------------+

So using a select statement, how could we have worked this out ? I can work out how to do this in a stored procedure, using a loop, but I would prefer to just have a select statement.

Brian
  • 101
  • 9
  • Can you provide sample data and desired results, to be sure that we understand the question correctly? – Gordon Linoff Jun 03 '15 at 15:48
  • I have added sample data to my question now. – Brian Jun 03 '15 at 22:06
  • What's your SQL Server version? – dnoeth Jun 03 '15 at 22:13
  • It looks like you are after simple sum of activity durations grouped by day, rather than a cumulative sum, but it is just a guess. If you add few more rows to your sample data to illustrate various edge cases, you would likely get correct answers. Another unclear moment is whether you want to treat each client independently, or you need to find days when all clients simultaneously had high activity. – Vladimir Baranov Jun 03 '15 at 23:49

3 Answers3

1

How about this:

select client_id,
sum(DATEDIFF(MINUTE, activity_start_date , activity_end_date)) as total_minutes
from activity
group by client_id
having sum(DATEDIFF(MINUTE, activity_start_date , activity_end_date)) > 300

That will give you all customers whose total activity is >300 minutes. Send that list to your stored procedure to do the updating.

Code Different
  • 90,614
  • 16
  • 144
  • 163
  • Yes - that is a good start to cut down the number of records I need to process. But the question is not just what clients have completed 300 or more minutes - it's on what date did each client complete 300 minutes activity ? – Brian Jun 03 '15 at 22:10
1

What you need is a cumulative sum. This is best done in SQL Server 2012+, but you can implement it (less efficiently) in earlier versions:

select a.*
from (select a.*, a2.cume_minutes,
             (a2.cume_minutes - datediff(minute, a2.activity_start_date, 
                                         a2.activity_end_date)
                                        ) as prev_cume_minutes
      from activity a cross apply
           (select sum(datediff(minute, a2.activity_start_date, a2.activity_end_date)) as cume_minutes
            from activity a2
            where c2.client_id = a.client_id and
                  a2.activity_start_date <= a.activity_start_date
           ) a2
      ) a
where prev_cume_minutes < 300 and cum3_minutes >= 300;

This assumes that the periods of activity are not overlapping, which seems to be the case base on your sample data.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks for this - but unfortunately in my actual data there will be overlapping. I think there's no getting away from me writing a stored procedure. – Brian Jun 03 '15 at 23:32
  • @Brian . . . Ask another question about overlapping data. Sample data and results is really helpful (and I also suggest using YYYY-MM-DD for the date format; this is an international forum). – Gordon Linoff Jun 03 '15 at 23:44
  • Thanks @Gordon, yes I see that I have over simplified my data and I've not been clear enough on what I am trying to achieve here. I'll post a new question, thanks for you suggestion of cumulative sum - it's definitely getting me in the right direction I think. – Brian Jun 04 '15 at 08:51
  • New question posted: http://stackoverflow.com/questions/30641259/sql-server-cumulative-sum-on-overlapping-data-getting-date-that-sum-reaches – Brian Jun 04 '15 at 10:24
0

Thanks for all your input, I've not explained this scenario clearly enough and I've over simplified my sample data. I have posted a new question that explains this much better I hope: SQL Server - cumulative sum on overlapping data - getting date that sum reaches a given value

Community
  • 1
  • 1
Brian
  • 101
  • 9