
I'm asking because I'm not sure what to google for - attempts that seemed obvious to me returned nothing useful.

I have sales records coming into the database, each with a particular datetime and $ value. I want to get all groups of sales records that a) fall within any 3-hour time frame (not just "on the hour" windows like 1am-4am), and b) total >= $1000.

The table looks like:

Sales
SaleId int primary key
Item varchar
SaleAmount money
SaleDate datetime

Even just a suggestion on what I should be googling for would be appreciated lol!

EDIT:

Ok after trying the cross apply solution - it's close but not quite there. To illustrate, consider the following sample data:

-- table & data script

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Sales](
    [pkid] [int] IDENTITY(1,1) NOT NULL,
    [item] [int] NULL,
    [amount] [money] NULL,
    [saledate] [datetime] NULL,
 CONSTRAINT [PK_Sales] PRIMARY KEY CLUSTERED 
(
    [pkid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (1, 649.3800, CAST(N'2017-12-31T21:46:19.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (1, 830.6700, CAST(N'2018-01-01T08:38:58.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (1, 321.0400, CAST(N'2018-01-01T09:08:04.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (3, 762.0300, CAST(N'2018-01-01T07:26:30.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (2, 733.5100, CAST(N'2017-12-31T12:04:07.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (3, 854.5700, CAST(N'2018-01-01T08:32:11.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (2, 644.1700, CAST(N'2017-12-31T17:49:59.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (1, 304.7700, CAST(N'2018-01-01T08:01:50.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (2, 415.1200, CAST(N'2017-12-31T20:27:28.000' AS DateTime))
INSERT [dbo].[Sales] ([item], [amount], [saledate]) VALUES (3, 698.1700, CAST(N'2018-01-01T02:39:28.000' AS DateTime))

A simple adaptation of the cross apply solution from the comments, to go item by item:

select s.*
    , s2.saleamount_sum
from Sales s cross apply
     (select sum(s_in.amount) as saleamount_sum
      from Sales s_in
      where s.item = s_in.item
        and s.saledate >= s_in.saledate and s_in.saledate < dateadd(hour, 3, s.saledate)
     ) s2
where s2.saleamount_sum > 1000
order by s.item, s.saledate

So the actual data (sorted by item/time) looks like:

pkid    item    amount    saledate
1       1       649.38    2017-12-31 21:46:19.000
8       1       304.77    2018-01-01 08:01:50.000
2       1       830.67    2018-01-01 08:38:58.000
3       1       321.04    2018-01-01 09:08:04.000
5       2       733.51    2017-12-31 12:04:07.000
7       2       644.17    2017-12-31 17:49:59.000
9       2       415.12    2017-12-31 20:27:28.000
10      3       698.17    2018-01-01 02:39:28.000
4       3       762.03    2018-01-01 07:26:30.000
6       3       854.57    2018-01-01 08:32:11.000

and the result of the cross apply method:

pkid    item    amount    saledate                    saleamount_sum
2       1       830.67    2018-01-01 08:38:58.000     1784.82
3       1       321.04    2018-01-01 09:08:04.000     2105.86
7       2       644.17    2017-12-31 17:49:59.000     1377.68
9       2       415.12    2017-12-31 20:27:28.000     1792.80
4       3       762.03    2018-01-01 07:26:30.000     1460.20
6       3       854.57    2018-01-01 08:32:11.000     2314.77

The issue can be seen in the method's handling of Item 1. From the data, the FIRST sale of item 1 does not participate in any 3-hour window totaling over $1000. The second, third, and fourth Item 1 sales do participate, and rows are correctly picked out for pkid = 2 and 3. But their sums aren't right - both sums include the very FIRST sale of Item 1, which does not belong to the qualifying window. I would have expected the saleamount_sum for pkid 2 to be 1135.44 and for pkid 3 to be 1456.48, i.e. their reported sums minus the first, non-participating sale (1784.82 - 649.38 and 2105.86 - 649.38).

Hopefully that makes sense. I'll try fiddling with the cross apply query to get it. Anyone who can quickly see how to get what I'm after, please feel free to chime in.

thanks, -sff

sherifffruitfly
  • Look up WHERE and HAVING. That should get you going. – dfundako Feb 13 '18 at 16:35
  • Sample data and desired results would really help. – Gordon Linoff Feb 13 '18 at 16:36
  • Read [this SO post on group by time interval](https://stackoverflow.com/questions/41908362/in-sql-server-2014-how-do-you-group-by-any-time-based-interval/41918973#41918973) – Zohar Peled Feb 13 '18 at 16:37
  • You want to get sales from all days in a specified interval of hours, with amount >= 1000$? Or only for today in the specified interval? – vpalade Feb 13 '18 at 16:43
  • @vpalade - if i'm understanding you correctly, i want the condition met within the specified interval, regardless of whether or not that interval crosses a day boundary. if simpler, it's helpful to me to have it for only sales in the same day. – sherifffruitfly Feb 13 '18 at 18:15

4 Answers


Here is one method using apply:

select t.*, tt.saleamount_sum
from t cross apply
     (select sum(t2.saleamount) as saleamount_sum
      from t t2
      where t2.saledate >= t.saledate and t2.saledate < dateadd(hour, 3, t.saledate)
     ) tt
where tt.saleamount_sum > 1000;

Edit:

If you want this per item (which is not specified in the question), then you need a condition to that effect:

select t.*, tt.saleamount_sum
from t cross apply
     (select sum(t2.saleamount) as saleamount_sum
      from t t2
      where t2.item = t.item and t2.saledate >= t.saledate and t2.saledate < dateadd(hour, 3, t.saledate)
     ) tt
where tt.saleamount_sum > 1000;
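
(In both queries above, t presumably stands for the Sales table from the question and saleamount for its SaleAmount column; the answer keeps the names generic.)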
Gordon Linoff
  • Thanks! I'm not certain this is giving me exactly the right answers, but the concept is deffo what i'm looking for. I'll whip up a sterile example dataset later & explore it in more detail. – sherifffruitfly Feb 13 '18 at 18:29
  • Ok your suggestion is close - the issue appears to be that it doesn't "forget" rows that shouldn't count. I'll illustrate. oh #@$@ can't put enough chars in comments for example table/data code - I'll edit the question itself & hope you see it. – sherifffruitfly Feb 14 '18 at 01:39

Your query had one wrong comparison (s.saledate >= s_in.saledate) instead of s_in.saledate >= s.saledate. The inner query below looks for the next 3 hours for each row of the outer query.

Sample data

DECLARE @Sales TABLE (
    [pkid] [int] IDENTITY(1,1) NOT NULL,
    [item] [int] NULL,
    [amount] [money] NULL,
    [saledate] [datetime] NULL
);

INSERT INTO @Sales (item, amount, saledate) VALUES (1, 649.3800, CAST(N'2017-12-31T21:46:19.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (1, 830.6700, CAST(N'2018-01-01T08:38:58.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (1, 321.0400, CAST(N'2018-01-01T09:08:04.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (3, 762.0300, CAST(N'2018-01-01T07:26:30.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (2, 733.5100, CAST(N'2017-12-31T12:04:07.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (3, 854.5700, CAST(N'2018-01-01T08:32:11.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (2, 644.1700, CAST(N'2017-12-31T17:49:59.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (1, 304.7700, CAST(N'2018-01-01T08:01:50.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (2, 415.1200, CAST(N'2017-12-31T20:27:28.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (3, 698.1700, CAST(N'2018-01-01T02:39:28.000' AS DateTime))

INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:01.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:02.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:03.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:04.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:05.000' AS DateTime))
INSERT INTO @Sales (item, amount, saledate) VALUES (4, 600, CAST(N'2018-01-01T02:39:06.000' AS DateTime))

Query

select
    s.*
    , s2.saleamount_sum
from
    @Sales AS s
    cross apply
        (
            select sum(s_in.amount) as saleamount_sum
            from @Sales AS s_in
            where 
                s.item = s_in.item
                and s_in.saledate >= s.saledate
                and s_in.saledate < dateadd(hour, 3, s.saledate)
        ) AS s2
where s2.saleamount_sum > 1000
order by s.item, s.saledate
;

Result

+------+------+--------+-------------------------+----------------+
| pkid | item | amount |        saledate         | saleamount_sum |
+------+------+--------+-------------------------+----------------+
|    8 |    1 | 304.77 | 2018-01-01 08:01:50.000 | 1456.48        |
|    2 |    1 | 830.67 | 2018-01-01 08:38:58.000 | 1151.71        |
|    7 |    2 | 644.17 | 2017-12-31 17:49:59.000 | 1059.29        |
|    4 |    3 | 762.03 | 2018-01-01 07:26:30.000 | 1616.60        |
|   11 |    4 | 600.00 | 2018-01-01 02:39:01.000 | 3600.00        |
|   12 |    4 | 600.00 | 2018-01-01 02:39:02.000 | 3000.00        |
|   13 |    4 | 600.00 | 2018-01-01 02:39:03.000 | 2400.00        |
|   14 |    4 | 600.00 | 2018-01-01 02:39:04.000 | 1800.00        |
|   15 |    4 | 600.00 | 2018-01-01 02:39:05.000 | 1200.00        |
+------+------+--------+-------------------------+----------------+

I added 6 rows with item=4 to the sample data. They all fall within the same 3 hours, and 5 of the 6 windows that start at one of these rows sum to more than 1000, so 5 of the 6 rows appear in the result. Technically this result is correct, but do you really want this kind of result?

Vladimir Baranov
  • thanks for setting me straight re comparison! the issue you raise also there in the original Item 1's. The query takes all of the "tails" that meet the condition as well as the whole. I guess I would like only the "maximal" result to be returned for each item-timespan found. But then I have to think about a) your item 4 having ANOTHER hot-selling time window, and b) what if a hot-selling time period lasts longer than 3 hours (say item 4 sells 900 units every hour for 6 hours?). This is enough to get me going tho, so thanks! – sherifffruitfly Feb 14 '18 at 14:19
  • @sherifffruitfly, "say item 4 sells 900 units every hour for 6 hours" - exactly. You need to decide what kind of result you want to see in such cases. The first thing that comes to mind is: the hot-selling time window has to be **at least** 3 hours, but if it lasts longer, it should continue. Look through each row in chronological order. If the sum of amounts for **previous** 3 hours starting from the current row is more than 1000, then the current row belongs to the hot period. Once all rows have been flagged as hot/non-hot, use gaps-and-islands to group consecutive hot rows into hot periods. – Vladimir Baranov Feb 14 '18 at 22:44
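
As a rough sketch of the idea in that last comment (flag each row as hot or not, then use gaps-and-islands to collapse consecutive hot rows into periods), something along the following lines could work against the @Sales table variable from the answer above, run in the same batch as the sample data. The trailing-3-hour flag, the per-item partitioning, and the names is_hot and grp are assumptions based on that comment, not code from the original thread:

with flagged as (
    -- flag each sale as "hot" when the trailing 3 hours for the same item total more than 1000
    select s.pkid, s.item, s.amount, s.saledate,
           case when (select sum(s_in.amount)
                      from @Sales as s_in
                      where s_in.item = s.item
                        and s_in.saledate <= s.saledate
                        and s_in.saledate > dateadd(hour, -3, s.saledate)) > 1000
                then 1 else 0 end as is_hot
    from @Sales as s
), islands as (
    -- classic gaps-and-islands: consecutive rows with the same is_hot value share a grp id
    select f.*,
           row_number() over (partition by f.item order by f.saledate)
         - row_number() over (partition by f.item, f.is_hot order by f.saledate) as grp
    from flagged as f
)
select item,
       min(saledate) as period_start,
       max(saledate) as period_end,
       sum(amount)   as period_amount
from islands
where is_hot = 1
group by item, grp
order by item, period_start;

Exactly what period_amount should include (only the flagged rows, as here, or every sale inside the window) is one of the decisions the comment says still has to be made.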

To get all sales within a specified hours interval:

SELECT SaleId, SUM(SaleAmount) AS amount FROM Sales WHERE DATEPART(HOUR, SaleDate) BETWEEN 1 AND 4 GROUP BY SaleId HAVING SUM(SaleAmount) >= 1000;

You can add other conditions in the WHERE clause.
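
For example, restricting the same query to a single day could look like this (the date range here is a hypothetical extra filter, not part of the original answer):

SELECT SaleId, SUM(SaleAmount) AS amount
FROM Sales
WHERE DATEPART(HOUR, SaleDate) BETWEEN 1 AND 4
  AND SaleDate >= '2018-01-01' AND SaleDate < '2018-01-02'  -- hypothetical same-day restriction
GROUP BY SaleId
HAVING SUM(SaleAmount) >= 1000;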

vpalade

If you're looking for periods like 0:00-3:00, 3:00-6:00, you can group by those intervals. The following query rounds the hour to multiples of three, combines it with the date, and groups on that:

select  format(saledate, 'yyyy-MM-dd') + ' ' + 
            cast(datepart(hour, saledate) / 3 * 3 as varchar(2)) as period
,       sum(amount) as total
from    sales
group by
        format(saledate, 'yyyy-MM-dd') + ' ' + 
            cast(datepart(hour, saledate) / 3 * 3 as varchar(2))
having  sum(amount) > 1000
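
For instance, a sale at 2018-01-01 07:26:30 lands in the bucket '2018-01-01 6', since integer division gives 7 / 3 * 3 = 6.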

Working example at rextester.

If you're looking for any 3 hour period, like 0:33-3:33 or 12:01-15:01, see Gordon Linoff's answer.

Andomar