Date Range for set of same data

Question

I am trying to build a SQL query which will give me the date range for the dates with same prices. If there is a break in the prices, I expect to see it in a new line. Even if sometime during the month there are same prices, if there is change in the prices sometime in between I want to see it as two separate rows with the specific date range.

Sample Data:

Date    Price
1-Jan   3.2
2-Jan   3.2
3-Jan   3.2
4-Jan   3.2
5-Jan   3.2
6-Jan   3.2
7-Jan   3.2
8-Jan   3.2
9-Jan   3.5
10-Jan  3.5
11-Jan  3.5
12-Jan  3.5
13-Jan  3.5
14-Jan  4.2
15-Jan  4.2
16-Jan  4.2
17-Jan  3.2
18-Jan  3.2
19-Jan  3.2
20-Jan  3.2
21-Jan  3.2
22-Jan  3
23-Jan  3
24-Jan  3
25-Jan  3
26-Jan  3
27-Jan  3
28-Jan  3
29-Jan  3.5
30-Jan  3.5
31-Jan  3.5

Desired Result :

Price   Date Range
3.2     1-8
3.5     9-13
4.2     14-16
3.2     17-22
3       22-28
3.5     29-31

What have you tried so far? are your dates always consecutive or can you have missing dates in your table? — Zohar Peled, May 26 '15 at 09:59
I need the result in a view, it will act as a dataset for reporting. This is part of the code where i am having trouble building, I was planning on having a function, which would compare the prices to the price of the previous data record.For all same set of data i would have it return a single integer value. ie all consecutive dates with same pricing will have the function return 1 and then 2 and so on. I intended to call this function in the select statement of the view, but since SQL does not support user defined Global variables, i could not make reference to an integer value maintained — Dfangs, May 26 '15 at 10:01
@ Ajay2707 SSMS 2012 , @Pravin Deshmukh : Date is dateime, and price is decimal. — Dfangs, May 26 '15 at 10:06
possible duplicate of [How do I group on continuous ranges](http://stackoverflow.com/questions/5662545/how-do-i-group-on-continuous-ranges) — Eric, May 26 '15 at 10:19

PerformanceDBA · Accepted Answer · 2015-05-28T13:20:56.663

Non-relational Solution

I don't think any of other answers are correct.

GROUP BY won't work
Using ROW_NUMBER() forces the data into a Record Filing System structure, which is physical, and then processes it as physical records. At a massive performance cost. Of course, in order to write such code, it forces you to think in terms of RFS instead of thinking in Relational terms.
Using CTEs is the same. Iterating through the data, especially data that does not change. At a slightly different massive cost.
Cursors are definitely the wrong thing for a different set of reasons. (a) Cursors require code, and you have requested a View (b) Cursors abandon the set-processing engine, and revert to row-by-row processing. Again, not required. If a developer on any of my teams uses cursors or temp tables on a Relational Database (ie. not a Record Filing System), I shoot them.

Relational Solution

Your data is Relational, logical, the two given data columns are all that is necessary.
Sure, we have to form a View (derived Relation), to obtain the desired report, but that consists of pure SELECTs, which is quite different to processing (converting it to a file, which is physical, and then processing the file; or temp tables; or worktables; or CTEs; or ROW_Number(); etc).
Contrary to the lamentations of "theoreticians", who have an agenda, SQL handles Relational data perfectly well. And you data is Relational.

Therefore, maintain a Relational mindset, a Relational view of the data, and a set-processing mentality. Every report requirement over a Relational Database can be fulfilled using a single SELECT. There is no need to regress to pre-1970 ISAM File handling methods.

I will assume the Primary Key (the set of columns that give a Relational row uniqueness) is Date, and based on the example data given, the Datatype is DATE.

Try this:

    CREATE VIEW MyTable_Base_V          -- Foundation View
    AS
        SELECT  Date,
                Date_Next,
                Price
            FROM (
            -- Derived Table: project rows with what we need
            SELECT  Date,
                    [Date_Next] = DATEADD( DD, 1, O.Date ),
                    Price,
                    [Price_Next] = (

                SELECT Price            -- NULL if not exists
                    FROM MyTable
                    WHERE Date = DATEADD( DD, 1, O.Date )
                    )

                FROM MyTable MT

                ) AS X
            WHERE Price != Price_Next   -- exclude unchanging rows
    GO

    CREATE VIEW MyTable_V               -- Requested View
    AS
        SELECT  [Date_From] = (
            --  Date of the previous row
            SELECT MAX( Date_Next )     -- previous row
                FROM MyTable_V
                WHERE Date_Next < MT.Date
                ),

                [Date_To] = Date,       -- this row
                Price
            FROM MyTable_Base_V MT
    GO

    SELECT  *
        FROM MyTable_V
    GO

Method, Generic

Of course this is a method, therefore it is generic, it can be used to determine the From_ and To_ of any data range (here, a Date range), based on any data change (here, a change in Price).

Here, your Dates are consecutive, so the determination of Date_Next is simple: increment the Date by 1 day. If the PK is increasing but not consecutive (eg. DateTime or TimeStamp or some other Key), change the Derived Table X to:

    -- Derived Table: project rows with what we need
    SELECT  DateTime,
            [DateTime_Next] = (
            -- first row > this row
        SELECT  TOP 1
                DateTime                -- NULL if not exists
            FROM MyTable
            WHERE DateTime > MT.DateTime
            ),

            Price,
            [Price_Next] = (
            -- first row > this row
        SELECT  TOP 1
                Price                   -- NULL if not exists
            FROM MyTable
            WHERE DateTime > MT.DateTime
            )

        FROM MyTable MT

Enjoy.

Please feel free to comment, ask questions, etc.

This does seems to be best method for SQL Server. I have tried your two views solution and mine CTE based, given below. Your two views solutions did not returned results till 4 minutes and then I cancelled the query. But CTE is returning 972762 rows out of total 2989355 rows within 18 seconds. — Anil, Jun 03 '15 at 06:39
I have compared both plans, in Views based solution hash mach (Inner join) is costing 79% and two time Clustered Index scan is costing 18%. In CTE hash match has only 9% and sort 24% . Please suggest where am i lacking in suggesting a CTE based solution for this problem. As per my knowledge CTE can be slower when there are very very huge data and it will be written to files as memory will not be enough. But here one row for one day does not seems to be this much huge data. — Anil, Jun 03 '15 at 06:39
Another issue i noticed with two views solution is that first from date is null! — Anil, Jun 03 '15 at 06:44
@AnilKumar. (a) If the first FromDate is null, you are **evidently** not using the code I have given. (b) Date is unique here. (c) this "works perfectly" for Dfangs, and I have over 120 such code segments in production, which are working perfectly millions of times per day. (d) This method is known to be much faster than CTEs because CTEs have a massive resource cost, which you seem to agree with. (e) if you have code that you wish to examine, I suggest you open a new question & post all your code (DDL & DML). — PerformanceDBA, Jun 07 '15 at 00:47

score 0 · Answer 2 · answered May 26 '15 at 10:43

0

You can do this by adding a grouping column. A neat trick for this is the difference of two sequences of numbers -- when the difference is constant, then the price is the same.

select price, min(date), max(date)
from (select s.*,
             (row_number() over (order by date) - 
              row_number() over (partition by price order by date)
             ) as grp
      from sample s
     ) grp
group by grp, price;

Note: be careful that price is stored as a fixed decimal rather than a floating decimal. Otherwise, values that look the same might not actually be the same.

answered May 26 '15 at 10:43

Gordon Linoff

1,242,037
58
646
786

1

Could you please confirm, does this handle the fact that a price can occur in more than one group, eg 3.5 ? I can't try it, as I do not have `ROW_NUMBER()` on my platform. Besides, I don't have access to my platform right now anyway. – PerformanceDBA May 27 '15 at 12:49
@PerformanceDBA . . . Try running it. It definitely works on the same price appearing in multiple groups. Did you downvote the answer without testing it? – Gordon Linoff May 27 '15 at 12:59
1

*Test* I just explained that I couldn't test, and the reasons why. Which is why I asked the question. To understand your code/platform, which has functions for Record Filing systems. Particularly the "neat trick". Given your other answers, I was pretty sure it worked. *Downvote* No! – PerformanceDBA May 27 '15 at 13:15

ThePravinDeshmukh · Answer 3 · 2015-05-26T10:32:48.093

-1

This is what you are looking for

declare @temptbl table (price decimal(18,2), mindate date, maxdate date)

declare @price as decimal(18,2), @date as date

declare tempcur cursor for 
select price, date
from YourTable

open tempcur

fetch next from tempcur
into @price, @date

while (@@fetch_status = 0)
begin
    if (isnull((select price from @temptbl where maxdate = (select max(maxdate)from @temptbl)),0) <> @price)
        insert into @temptbl (price,mindate,maxdate) values (@price,@date,@date)
    else
        update @temptbl
        set maxdate = @date
        where maxdate = (select max(maxdate)from @temptbl)


    fetch next from tempcur
    into @price, @date
end

deallocate tempcur

select price, convert(nvarchar(50), mindate) + ' to ' + convert(nvarchar(50), maxdate) as [date range] from @temptbl

edited May 26 '15 at 10:32

answered May 26 '15 at 10:06

ThePravinDeshmukh

1,823
12
21

He needs a view, I need the result in a view! – Anil May 26 '15 at 11:02
This can not be done without using a temporary table or CTE to my knowledge, so need to do this in a procedure – ThePravinDeshmukh May 26 '15 at 12:09
Yes, CTE is the solution, a partitioned view, i have added my answer below. – Anil May 26 '15 at 12:15

score -2 · Answer 4 · answered May 26 '15 at 11:00

-2

Use CTE, below is working code.

WITH grouped AS (
SELECT 
Pricedate, price,
grp1= ROW_NUMBER() OVER (ORDER BY Pricedate) -
ROW_NUMBER() OVER (Partition by price ORDER BY Pricedate)
FROM yourTablewithDateAndPrice 
)
SELECT
  DtFrom = MIN(Pricedate),
  DtTo = MAX(Pricedate),
  Price = price

FROM grouped
GROUP BY Price,grp1 
order by DtFrom;

The internal query will created same group till the time it find same price, else group will be incremented by one. in Final group by you will have required result.

answered May 26 '15 at 11:00

Anil

3,722
2
24
49

1

(a) I didn't vote you down *before*, but please be advised, most people here at SO are not that polite, they won't explain why they did it. [**Comments Help**](http://stackoverflow.com/help/privileges/comment). Few do. Here is an example (b) I am voting you down *now* because while your code produces the desired result, [read my answer for details] it (i) fails to understand the data. (ii) reduces Relational data to a Record Filing System, and then (iii) uses a RFS method that is (iv) massively inefficient (v) instead of using the Relational data as is. – PerformanceDBA May 27 '15 at 12:23
This Relational data solution does not seems to be best method for SQL Server. I have tried your two views solution and mine CTE based, given above. Your two views solutions did not returned results till 4 minutes and then I cancelled the query. But CTE is returning 972762 rows out of total 2989355 rows within 18 seconds – Anil Jun 03 '15 at 06:41
I have compared both plans, in Views based solution hash mach (Inner join) is costing 79% and two time Clustered Index scan is costing 18%. In CTE hash match has only 9% and sort 24% . Please suggest where am i lacking in suggesting a CTE based solution for this problem. As per my knowledge CTE can be slower when there are very very huge data and it will be written to files as memory will not be enough. But here one row for one day does not seems to be this much huge data. – Anil Jun 03 '15 at 06:41

Date Range for set of same data

4 Answers4

Non-relational Solution

Relational Solution

Method, Generic

Linked