1

I am trying to find the first value and the last value in a group. Something like First([Open]), Max([High]), Min([Low]), Last([Close])

One of the queries is below (it is currently missing the logic for the open/close columns. The dataset is extremely large (over 150 million records), so performance of query may become an issue.

Select 'AUDCHF' AS CURRENCY,
    Datepart(year, Datekey) AS [YEAR],  
    Datepart(month, Datekey) AS [MONTH], 
    Datepart(day, Datekey) AS [DAY], 
    Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
    Case 
        When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
        When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
        Else '6th 4 Hours'
    End AS [4 Hours], 
    Datepart(hour, Datekey) AS [HOUR], 
    max(High) AS HIGH, 
    min(Low) AS LOW
From AUDCHF
    Group by Datepart(year, Datekey), Datepart(month, Datekey), Datepart(day, Datekey), 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End,
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End, 
        Datepart(hour, Datekey)
    Order by Datepart(year, Datekey),  Datepart(month, Datekey), Datepart(day, Datekey), 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End,
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End, 
        Datepart(hour, Datekey)
har07
  • 88,338
  • 12
  • 84
  • 137
Takkelete
  • 11
  • 2

2 Answers2

1

The ORDER BY can use aliases defined in the SELECT expression list, because it is evaluated after the SELECT part (this is not the case for the GROUP BY part).

In your query the order by clause could be :

Order by [YEAR],  [MONTH], [DAY], [4 Hours],[HOUR]

Since you are grouping on year/month/day/4hour/4 hour, I think you can remove the 4hour part.

I would use window functions and do an outer select with GROUP BY to remove duplicates.

select [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours],
    max([HIGH]) as HIGH, min([LOW]) as LOW,
    max([Open]) as [Open], max([Close]) as [Close]
from (
    select 
        Datepart(year, Datekey) AS [YEAR],  
        Datepart(month, Datekey) AS [MONTH], 
        Datepart(day, Datekey) AS [DAY], 
        Datepart(hour, Datekey) AS [HOUR], 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End AS [4 Hours], 
        max(High) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey) , 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [HIGH], 
        min(Low) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [LOW],
        first_value([Open]) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            order by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [Open],
        last_value([Close]) over( 
            partition by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            order by
                Datepart(year, Datekey) ,  
                Datepart(month, Datekey), 
                Datepart(day, Datekey), 
                Datepart(hour, Datekey) 
            ) as [Close]


    from AUDCHF ) T
group by [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours]

The outer max(High) , Min(Low) etc is here just to make GROUP BY happy, they are not really meaningfull here since they have been processed in the inner select (I don't know what Open and Close are so I just put first and last values with same partitionning).

If this query have to run on a big table and because there is no WHERE clause to reduce selected rows, I would create an index on Datekey, including High and Low columns (and other columns not in the query: Close etc) to avoid a complete table scan. It will result in a complete index scan, and that could be a lot faster:

create  nonclustered index IxAudchf on AUDCHF(Datekey) include( [High], [Low], [Open], [Close]) ;

For Sql Window functions you can find presentations here and here

Note: FIRST_VALUE and LAST_VALUE is Sql2012 only, not 2008.

If you are running SQL 2005 or 2008, the following should do the same (probably less efficient). I took Low and Close in the last row and I'm not sure that it is what you want, change it to follow your logic if I misunderstood.

; WITH 
WAUDCHF1 as
(   select 
        row_number() over( 
            partition by
                Datepart(year, Datekey), Datepart(month, Datekey) , 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            order by Datepart(year, Datekey) , Datepart(month, Datekey) , 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            ) as [Rownum], 
        Datepart(year, Datekey) AS [YEAR],  
        Datepart(month, Datekey) AS [MONTH], 
        Datepart(day, Datekey) AS [DAY], 
        Datepart(hour, Datekey) AS [HOUR], 
        Case When Datepart(hour, Datekey) BETWEEN 0 AND 11 Then 'AM' Else 'PM' End AS [12 Hour], 
        Case 
            When Datepart(hour, Datekey) BETWEEN 0 AND 3 Then '1st 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 4 AND 7 Then '2nd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 8 AND 11 Then '3rd 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 12 AND 15 Then '4th 4 Hours'
            When Datepart(hour, Datekey) BETWEEN 16 AND 19 Then '5th 4 Hours'
            Else '6th 4 Hours'
        End AS [4 Hours], 
        max(High) over( 
            partition by
                Datepart(year, Datekey) , Datepart(month, Datekey) , 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            ) as [HIGH], 
        min(Low) over( 
            partition by
                Datepart(year, Datekey) ,  Datepart(month, Datekey), 
                Datepart(day, Datekey), Datepart(hour, Datekey) 
            ) as [LOW],
        [Open],
        [Close]
    from AUDCHF ),
LASTROWNUM as (
    select [YEAR], [MONTH], [DAY], [HOUR], max(rownum) as [Rownum]
    from WAUDCHF1
    group by [YEAR], [MONTH], [DAY], [HOUR], [12 Hour], [4 Hours]
     )
select W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], 
    max(W1.[High]) as [High], min(W2.[Low]) as [Low], 
    max(W1.[Open]) as [Open], max(w2.[Close]) as [Close]
from LASTROWNUM M
inner join WAUDCHF1 W1 on M.[YEAR] = W1.[YEAR]
            and  M.[MONTH]= W1.[MONTH]
            and  M.[DAY] =  W1.[DAY]  
            and  M.[HOUR] = W1.[HOUR]           
inner join WAUDCHF1 W2 on W2.[YEAR] = M.[YEAR]
            and  W2.[MONTH]= M.[MONTH]
            and  W2.[DAY] =  M.[DAY]  
            and  W2.[HOUR] = M.[HOUR] 

            and  W2.Rownum = M.Rownum
Where W1.Rownum = 1 
group by W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], w1.[12 Hour], W1.[4 Hours]
order by W1.[YEAR], W1.[MONTH], W1.[DAY], W1.[HOUR], w1.[12 Hour], W1.[4 Hours]
ARA
  • 1,296
  • 10
  • 18
  • Thank you very much, was almost exactly what I needed. I was able to get it working off of this. I knew about the first/last values but didn't know how to implement it. The index helped, but processing is still slow; however, this query only needs to be run once over the 1.1 billion records so it shouldn't be a huge issue. I'm letting it spin now. – Takkelete Dec 18 '13 at 07:26
0

The query:

SELECT 'AUDCHF' AS CURRENCY,
    Datepart(year, Datekey) AS [YEAR], Datepart(month, Datekey) AS [MONTH], 
    Datepart(day, Datekey) AS [DAY], [12 Hour], [4 Hours],
    Datepart(hour, Datekey) AS [HOUR], High AS HIGH, Low AS LOW,
    (SELECT High FROM Rate AS R WHERE R.Datekey = (SELECT MIN(Datekey) 
            FROM Rate WHERE DATEADD(hour, DATEDIFF(hour, 0, Rate.Datekey), 0) =
                AUDCHF.Datekey AND Rate.Base = 'AUD' AND Rate.Target = 'CHF') 
            AND R.Base = 'AUD' AND R.Target = 'CHF') AS [Open],
    (SELECT Low FROM Rate AS R WHERE R.Datekey = (SELECT MAX(Datekey) 
            FROM Rate WHERE DATEADD(hour, DATEDIFF(hour, 0, Rate.Datekey), 0) =
                AUDCHF.Datekey AND Rate.Base = 'AUD' AND Rate.Target = 'CHF') 
            AND R.Base = 'AUD' AND R.Target = 'CHF') AS [Close]
FROM AUDCHF, Segment
    WHERE Segment.Hour = Datekey
    ORDER BY Datepart(year, Datekey),  Datepart(month, Datekey), 
        Datepart(day, Datekey), Datepart(hour, Datekey);

will return the results you expect. I have also extracted the case statements into a support table that you can see at SQLFiddle. The extract also the results of your query on some test data. This uses the answer in T-SQL datetime rounded to nearest minute and nearest hours with using functions to truncate the times to hours.

Basically, the view AUDCHF converts truncates the Datekey and performs the grouping. The query then joins this with the Segment table to extract the constant strings and calculate the initial and final values. These need to be in sub-queries as they are not related to the aggregates.

You will, of course, need to include an index on the table to maintain performance. If you keep no other data in the primary table, or create a custom index, then a large proportion should be cached.

Since the data is all historical, you could also prepare a materialised view for fast reference.

The currency pair treatment is partial and would be better handled in the top level view to avoid the repeated constants. It shows how you could consolidate the rates into a single table to simplify adding new rate pairs.

Community
  • 1
  • 1
Pekka
  • 3,529
  • 27
  • 45