0

I have a timeseries in a 15min format with four different variables. I need to convert this into 10min timeseries format. Due to some constraints, I need to do this in SQL which I agree is probably the worse choice language to do this in. I am convinced however that it is doable.

What I believe needs to be done is to divide each 15min data point into three 5min data point and then add every two 5min data point into a 10min data point. Done manually this gives me the correct results.

Below some sample DDL SQL that shows the initial table:

IF OBJECT_ID('tempdb..#Temp15min') IS NOT NULL DROP TABLE #Temp15min

CREATE TABLE #Temp15min
(
    Project_id INT,
    Project_name VARCHAR(25),
    Date_results DATETIME,
    Pplus INT,
    Pminus INT,
    Qplus INT,
    Qminus INT,
    Zone VARCHAR(2)
)

INSERT INTO #Temp15min (Project_id, Project_name, Date_results, Pplus, Pminus, Qplus, Qminus, Zone)
VALUES 
(1, 'USA_Project', '2020-01-09 00:00',9879, 7766, 3886, 6628, 'US'),
(1, 'USA_Project', '2020-01-09 00:15', 15420, 10969, 5743, 9568, 'US'),
(1, 'USA_Project', '2020-01-09 00:30', 8763, 6602, 5475, 512, 'US'),
(1, 'USA_Project', '2020-01-09 00:45', 6998, 1083, 277, 13360, 'US'),
(1, 'USA_Project', '2020-01-09 01:00', 14356, 275, 1580, 14184, 'US'),
(1, 'USA_Project', '2020-01-09 01:15', 22983, 6161, 7815, 6917, 'US'),
(1, 'USA_Project', '2020-01-09 01:30', 7653, 5080, 2742, 8086, 'US'),
(1, 'USA_Project', '2020-01-09 01:45', 276, 14977, 10318, 3485, 'US')

The end result table:

IF OBJECT_ID('tempdb..#Temp10min') IS NOT NULL DROP TABLE #Temp10min

CREATE TABLE #Temp10min
(
    Project_id INT,
    Project_name VARCHAR(25),
    Date_results DATETIME,
    Pplus INT,
    Pminus INT,
    Qplus INT,
    Qminus INT,
    Zone VARCHAR(2)
)

INSERT INTO #Temp10min (Project_id, Project_name, Date_results, Pplus, Pminus, Qplus, Qminus, Zone)
VALUES 
(1, 'USA_Project', '2020-01-09 00:00', 6586, 5177, 2591, 4419, 'US'),
(1, 'USA_Project', '2020-01-09 00:10', 8433, 6245, 3210, 5399, 'US'),
(1, 'USA_Project', '2020-01-09 00:20', 10280, 7313, 3829, 6379, 'US'),
(1, 'USA_Project', '2020-01-09 00:30', 5842, 4401, 3650, 341, 'US'),
(1, 'USA_Project', '2020-01-09 00:40', 5254, 2562, 1917, 4624, 'US'),
(1, 'USA_Project', '2020-01-09 00:50', 4665, 722, 185, 8907, 'US'),
(1, 'USA_Project', '2020-01-09 00:00', 9571, 183, 1053, 9456, 'US'),
(1, 'USA_Project', '2020-01-09 00:10', 12446, 2145, 3132, 7034, 'US'),
(1, 'USA_Project', '2020-01-09 00:20', 15322, 4107, 5210, 4611, 'US'),
(1, 'USA_Project', '2020-01-09 00:30', 5102, 3387, 1828, 5391, 'US'),
(1, 'USA_Project', '2020-01-09 00:40', 2643, 6686, 4353, 3857, 'US'),
(1, 'USA_Project', '2020-01-09 00:50', 184, 9985, 6879, 2323, 'US')

A sanity check to make sure numbers are correct from start to finish:

SELECT Project_id, Project_name, SUM(Pplus) PplusTotal, SUM(Pminus) PminusTotal, Sum(Qplus) QplusTotal, Sum(Qminus) QminusTotal, Zone FROM #Temp15min GROUP BY Project_id, Project_name, Zone
UNION
SELECT Project_id, Project_name, SUM(Pplus), SUM(Pminus), Sum(Qplus), Sum(Qminus), Zone FROM #Temp10min GROUP BY Project_id, Project_name, Zone

Graphically the conversion goes something like this:

  • Left table is the 15min data points
  • Middle table is the 5min data points
  • Right table is the 10min data points

15min to 5min to 10min

I understand the complexity of the request so any help is greatly appreciated !

amphinomos
  • 37
  • 6

2 Answers2

1

Based on what you describe, you can "unpivot" for each of the minutes and then reaggregate:

select Project_id, Project_name, min(dateadd(minute, v.minutes, t.dateresults)) as dt,
       t.zone,
       sum(Pplus / 3.0),
       sum(Pminus / 3.0),
       sum(Qplus / 3.0),
       sum(Qminus / 3.0)
from #Temp15min t cross join
     (values (0), (5), (10)) v(minutes)
group by convert(date, dateadd(minute, v.minutes, t.dateresults)),
         datepart(hour, dateadd(minute, v.minutes, t.dateresults)),
         datepart(minute, dateadd(minute, v.minutes, t.dateresults)) / 6,
         Project_id, Project_name, zone;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I should've specified I'm using SQL Server so I can't use the date function as is. I assumed I could convert it to `group by CONVERT(datetime,(dateadd(minute, v.minutes, t.dateresults)))` but I'm getting the following error : 'v' has more columns than were specified in the column list. – amphinomos Sep 22 '20 at 07:31
  • @amphinomos . . . The date functions are fine. There was a typo in the `value` clause. – Gordon Linoff Sep 22 '20 at 10:37
  • I get a 'date' is not a recognized built-in function name. I should also maybe add that I'm using 2008 R2...we should be upgrading soon. – amphinomos Sep 22 '20 at 12:49
  • 1
    @amphinomos . . . Oops, I missed that. I fixed that to a `convert()`. – Gordon Linoff Sep 22 '20 at 13:08
  • Still not getting exactly the result I'm looking for. Thanks for the help so far. While rerunning the exercise again, I realized there's an added constraint due to the nature of the data in kWh. When converting 15min data into 5 min data and 10min data, the actual data needs to be massaged. The first 15min data point 9879 becomes the first 5min data point (9879/3) + (9879/3) = 6586 which becomes the first 10min data point (6586+6586)-(6586+6586)/4 = 9879. This convoluted calculation allows for the raw total to be different but when divided by 4, 8 and 6 respectively to be equal. – amphinomos Sep 22 '20 at 13:56
  • I would suggest that you ask a *new* question with improved details on this split. This question has been answered. – Gordon Linoff Sep 22 '20 at 15:20
  • Done. Cleaned up this post. Other post is here: https://stackoverflow.com/questions/64013194/convert-15min-to-10min-timeseries-with-hour-constraint-in-sql-server – amphinomos Sep 22 '20 at 15:52
0

Based on Gordon Linoff's answer as well as this other stackoverflow post and adapted to SQL Server, I used this code :


IF OBJECT_ID('tempdb..#Tempminutes') IS NOT NULL DROP TABLE #Tempminutes

CREATE TABLE #Tempminutes
(
    Minutes INT
)

INSERT INTO #Tempminutes VALUES (0),(5),(10)



select Project_id, Project_name, min(dateadd(minute, v.minutes, t.date_results)) as dt,
       t.zone,
       sum(Pplus / 3.0),
       sum(Pminus / 3.0),
       sum(Qplus / 3.0),
       sum(Qminus / 3.0)
from #Temp15min t cross join
     #Tempminutes v
GROUP BY
        DATEPART(YEAR, dateadd(minute, v.minutes, t.date_results)),
        DATEPART(MONTH, dateadd(minute, v.minutes, t.date_results)),
        DATEPART(DAY, dateadd(minute, v.minutes, t.date_results)),
        DATEPART(HOUR, dateadd(minute, v.minutes, t.date_results)),
        (DATEPART(MINUTE, dateadd(minute, v.minutes, t.date_results)) / 10),
        Project_id, Project_name, zone
amphinomos
  • 37
  • 6