
I'm trying to do an OHLC SQL query (SQL Server 2012) on a StockData table. Thousands of rows are added to the table each day, and I want to get the Open, High, Low and Close data for each day.

The create table sql is below:

  CREATE TABLE [dbo].[StockData](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [StockExchangeID] [bigint] NOT NULL,
    [DataDateTime] [datetime] NOT NULL,
    [ExportCost] [bigint] NOT NULL,
 CONSTRAINT [PK_StockData] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Using the query below I can get the High, Low and Average but I'm having difficulty getting the Open (earliest ExportCost based on DataDateTime) and Close (latest ExportCost based on DataDateTime) of the ExportCost for the day.

My query so far is below. If there is a more efficient way to do it then any suggestions welcome.

DECLARE @IntervalDays INT, @StartDate DATETIME, @EndDate DATETIME;
SET @IntervalDays = 1;
SET @StartDate = '01/01/2017'
SET @EndDate = '01/01/2018'

-- Aggregate the data per day
SELECT 
    StockExchangeID,
    DATEADD(DAY, DATEDIFF(DAY,0,[DataDateTime]) / @IntervalDays * @IntervalDays, 0) AS [DateDay],
    MAX(ExportCost) AS DayHigh, 
    MIN(ExportCost) AS DayLow, 
    AVG(ExportCost) AS DayAverage,
    '??????????' AS DayOpen,  -- Not sure how to get the start price for the day, i.e. the ExportCost of the first record for that day based on [DataDateTime]
    '??????????' AS DayClose  -- ...or the end price, i.e. the ExportCost of the last record for that day
FROM StockData
WHERE [DataDateTime] >= @StartDate AND [DataDateTime] <= @EndDate
GROUP BY DATEADD(DAY, DATEDIFF(DAY,0,[DataDateTime]) / @IntervalDays * @IntervalDays, 0), StockExchangeID
ORDER BY [DateDay]
gisWeeper

2 Answers


For older versions of SQL Server you can use a correlated subquery:

SELECT 
    StockExchangeID,
    DATEADD(DAY, DATEDIFF(DAY,0,[DataDateTime]) / @IntervalDays * @IntervalDays, 0) AS [DateDay],
    MAX(ExportCost) AS DayHigh, 
    MIN(ExportCost) AS DayLow, 
    AVG(ExportCost) AS DayAverage,
    (SELECT TOP 1 DO.ExportCost FROM StockData AS DO WHERE DO.StockExchangeID = A.StockExchangeID AND DO.[DataDateTime] = MIN( A.[DataDateTime] ) ORDER BY DO.ID ASC ) AS DayOpen, 
    (SELECT TOP 1 DC.ExportCost FROM StockData AS DC WHERE DC.StockExchangeID = A.StockExchangeID AND DC.[DataDateTime] = MAX( A.[DataDateTime] ) ORDER BY DC.ID DESC ) AS DayClose
FROM StockData AS A
WHERE [DataDateTime] >= @StartDate AND [DataDateTime] <= @EndDate
GROUP BY DATEADD(DAY, DATEDIFF(DAY,0,[DataDateTime]) / @IntervalDays * @IntervalDays, 0), StockExchangeID
ORDER BY [DateDay]

Note: ORDER BY ID is used just in case you have more than one record with the same datetime.

For newer versions, take a look at @Serg's comment about FIRST_VALUE.
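For reference, a sketch of what that window-function approach might look like (untested against your data; FIRST_VALUE/LAST_VALUE were introduced in SQL Server 2012, so it should work on the version in the question):

SELECT DISTINCT
    StockExchangeID,
    CAST([DataDateTime] AS DATE) AS [DateDay],
    MAX(ExportCost) OVER (PARTITION BY StockExchangeID, CAST([DataDateTime] AS DATE)) AS DayHigh,
    MIN(ExportCost) OVER (PARTITION BY StockExchangeID, CAST([DataDateTime] AS DATE)) AS DayLow,
    AVG(ExportCost) OVER (PARTITION BY StockExchangeID, CAST([DataDateTime] AS DATE)) AS DayAverage,
    -- ID is used as a tie-breaker for rows sharing the same datetime
    FIRST_VALUE(ExportCost) OVER (PARTITION BY StockExchangeID, CAST([DataDateTime] AS DATE)
                                  ORDER BY [DataDateTime], ID) AS DayOpen,
    -- LAST_VALUE needs an explicit frame; the default frame stops at the current row
    LAST_VALUE(ExportCost) OVER (PARTITION BY StockExchangeID, CAST([DataDateTime] AS DATE)
                                 ORDER BY [DataDateTime], ID
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DayClose
FROM StockData
WHERE [DataDateTime] >= @StartDate AND [DataDateTime] <= @EndDate
ORDER BY [DateDay]

DISTINCT collapses the per-row window results down to one row per exchange per day; this assumes a fixed one-day interval rather than the @IntervalDays arithmetic from the question.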

Alex
  • This provided the correct result - thanks. If there are a lot of rows in the table, would a CTE or some other alternative query make it more efficient? – gisWeeper Nov 15 '17 at 15:19
  • Your biggest performance offender would be `GROUP BY DATEADD(DAY, DATEDIFF(DAY,0,[DataDateTime]) / @IntervalDays * @IntervalDays, 0)`. You still have not replied to my original comment about why you multiply and then divide by the same variable. – Alex Nov 15 '17 at 21:28
  • Alex - I'd copied that piece of code from another query I had that split and grouped the stock times into 15-minute intervals for each day. As it is in my current query it's doing nothing and I will remove it. – gisWeeper Nov 17 '17 at 08:51

Trust me - you might want to keep data at this level of detail, but you don't want to use it for general reporting purposes. You need to aggregate the pricing to the open/high/low/close values for each day and store them, because they don't change once a trading day has finished (well, sometimes they do, but it is extremely rare and doesn't change the general approach).

A step further - your choice of primary key and clustered index might not be particularly useful. You almost always search this table for either a given security or a given date (or date range), so clustering on an identity column is likely wasteful. I would also consider splitting the date and the time into separate columns: the date part is extremely important, while the time part probably isn't used much. If you were to separate the two, your queries become much simpler, since you simply group by date to find min/max. Finding open and close could be simplified with additional schema changes (or better - with the post-trading aggregated table).
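As a sketch, the aggregated table might look something like this (table and column names are illustrative, not from the question):

  CREATE TABLE [dbo].[StockDataDaily](
    [StockExchangeID] [bigint] NOT NULL,
    [TradingDate] [date] NOT NULL,
    [DayOpen] [bigint] NOT NULL,
    [DayHigh] [bigint] NOT NULL,
    [DayLow] [bigint] NOT NULL,
    [DayClose] [bigint] NOT NULL,
    [DayAverage] [bigint] NOT NULL,
 CONSTRAINT [PK_StockDataDaily] PRIMARY KEY CLUSTERED ([StockExchangeID], [TradingDate])
) ON [PRIMARY]

Clustering on (StockExchangeID, TradingDate) matches the search patterns described above, and a scheduled job can fill the table once per trading day.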

SMor
  • I realise this, but for now I'm working on a prototype and I need something quick and dirty. We do envisage a scheduled job/service pushing aggregated data to a history table for better querying. Either way I still need the query solved. Thanks everyone! – gisWeeper Nov 15 '17 at 15:22