
I have a table with historical FX data at the 1-minute level, and I would like to aggregate it into 10-minute data based on the 1-minute rows.

My table looks like this:

    ccy     Time                open     high       low      close
    AUDUSD  2018.03.26 00:00    0.77282  0.77283    0.77265  0.77265
    AUDUSD  2018.03.26 00:01    0.77264  0.77266    0.77261  0.77266
    AUDUSD  2018.03.26 00:02    0.77264  0.77265    0.77259  0.77264
    AUDUSD  2018.03.26 00:03    0.77262  0.77262    0.7726   0.77262
    AUDUSD  2018.03.26 00:04    0.77262  0.77262    0.77262  0.77262
    AUDUSD  2018.03.26 00:05    0.7726   0.7726     0.7726   0.7726
    AUDUSD  2018.03.26 00:06    0.77259  0.77262    0.77259  0.77261
    AUDUSD  2018.03.26 00:07    0.77262  0.77265    0.77262  0.77264
    AUDUSD  2018.03.26 00:08    0.77263  0.77272    0.77262  0.77272
    AUDUSD  2018.03.26 00:09    0.77273  0.77278    0.77271  0.77274
    AUDUSD  2018.03.26 00:10    0.77273  0.77282    0.77271  0.77279
    AUDUSD  2018.03.26 00:11    0.77282  0.77293    0.77281  0.77291
    AUDUSD  2018.03.26 00:12    0.77291  0.77293    0.77287  0.77287
    AUDUSD  2018.03.26 00:13    0.77288  0.77288    0.77288  0.77288
    AUDUSD  2018.03.26 00:14    0.77288  0.77288    0.77277  0.77279
    AUDUSD  2018.03.26 00:15    0.77278  0.77279    0.77255  0.77262
    AUDUSD  2018.03.26 00:16    0.77261  0.77271    0.77261  0.77271
    AUDUSD  2018.03.26 00:17    0.77271  0.77273    0.77264  0.77271
    AUDUSD  2018.03.26 00:18    0.77273  0.77282    0.77273  0.77281
    AUDUSD  2018.03.26 00:19    0.77281  0.77285    0.77281  0.77283

As you can see, the timestamps are 1 minute apart; the sample above covers 20 minutes of data.

My expected result is:

    ccy     Time                open     high       low      close
    AUDUSD  2018.03.26 00:00    0.77282  0.77283    0.77259  0.77274
    AUDUSD  2018.03.26 00:10    0.77273  0.77293    0.77255  0.77283

  • The open needs to be the open price at the start point of each time span. For example, for the span from 2018.03.26 00:00 to 2018.03.26 00:09, the start point is 2018.03.26 00:00, so the expected open is the open of 2018.03.26 00:00, which is 0.77282.

  • The close needs to be the close price at the end point of each time span. For example, for the span from 2018.03.26 00:00 to 2018.03.26 00:09, the end point is 2018.03.26 00:09, so the expected close is the close of 2018.03.26 00:09, which is 0.77274.

  • The high needs to be the max value of the time span. In the high column, from 2018.03.26 00:00 to 2018.03.26 00:09, the max value is 0.77283, so my expected high is 0.77283.

  • The low needs to be the minimum value of the time span. In the low column, from 2018.03.26 00:00 to 2018.03.26 00:09, the minimum value is 0.77259, so my expected low is 0.77259.

And this should keep working as I get more data. For example, if I have 1 hour of data, I expect 60/10 = 6 rows when I aggregate it into 10-minute data.
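To make the four rules concrete, they can be sketched in a few lines of plain Python (values copied from the first ten sample rows; `minute // 10` plays the role of the 10-minute bucket key):

```python
# Values copied from the first 10 rows of the sample table:
# (minute, open, high, low, close); minute // 10 gives the 10-minute bucket.
rows = [
    (0, 0.77282, 0.77283, 0.77265, 0.77265),
    (1, 0.77264, 0.77266, 0.77261, 0.77266),
    (2, 0.77264, 0.77265, 0.77259, 0.77264),
    (3, 0.77262, 0.77262, 0.7726,  0.77262),
    (4, 0.77262, 0.77262, 0.77262, 0.77262),
    (5, 0.7726,  0.7726,  0.7726,  0.7726),
    (6, 0.77259, 0.77262, 0.77259, 0.77261),
    (7, 0.77262, 0.77265, 0.77262, 0.77264),
    (8, 0.77263, 0.77272, 0.77262, 0.77272),
    (9, 0.77273, 0.77278, 0.77271, 0.77274),
]

window = sorted(r for r in rows if r[0] // 10 == 0)  # the 00:00-00:09 bucket
bar = (
    window[0][1],                 # open  = open of the first minute
    max(r[2] for r in window),    # high  = max of the highs
    min(r[3] for r in window),    # low   = min of the lows
    window[-1][4],                # close = close of the last minute
)
print(bar)  # (0.77282, 0.77283, 0.77259, 0.77274)
```

This reproduces the first row of the expected result above.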

I have been working on this for a while but cannot find a proper solution. Can someone please kindly help? Thank you so much!

RRRookie
  • So you need to see what was min/max/last/first values in specific data window in the past? Could you provide expected result which would be returned by query? – Edgars T. Jun 27 '18 at 07:22
  • The expected result for this would be: 2018.03.26 00:00, 0.77282, 0.77279, 0.77283, 0.77259. If there are 20 mins of data, the result should be two rows; the second row should be: 2018.03.26 00:10, 0.77xxx, 0.77xxx, 0.77xxx, 0.77xxx. – RRRookie Jun 27 '18 at 07:25
  • which dbms is used? – kiran gadhe Jun 27 '18 at 07:26
  • I use mysql.... – RRRookie Jun 27 '18 at 07:29
  • What have you tried before? If you're using MS Sql, you could use Row_Number to select the 10th record (or every 10th record) – JonTout Jun 27 '18 at 07:30
  • @JonTout what if there is a data point missing. 00:05 for instance? – Jaques Jun 27 '18 at 07:56
  • @Jacques - well the row_number would have to be divisible by 10 and constructed around the timestamp, but I would have to understand how the timestamp failed and whether the "tenth" record would suffice? – JonTout Jun 27 '18 at 08:38
  • Are the 10 minute intervals at fixed points in the day or from the first transaction? And I don't see the relevance of the 1 minute interval does it really matter if you have 10 transactions in the period or 2 or 100? – P.Salmon Jun 27 '18 at 10:32
  • I updated my answer, check it out. The result is exactly what you are looking for =) – Gus Jun 27 '18 at 10:44

3 Answers


There are a number of things which are not clear in the question. This looks like currency-conversion data, but that isn't stated (ccy?); it looks like you wish to aggregate by ccy, but that isn't stated either; and there is no indication of what you wish to pass to the query (interval range, interval type, etc.). But as a start, here's an example of an approach. The key issue I see is getting the first and last values.

drop table if exists t;
create table t(ccy varchar(6),Tm datetime,open decimal(10,5),high decimal(10,5), low decimal(10,5), close decimal(10,5));
insert into t values
(    'AUDUSD',  '2018-03-26 00:00'  ,  0.77282 , 0.77283  ,  0.77265 , 0.77265),
(    'AUDUSD',  '2018-03-26 00:01'  ,  0.77264 , 0.77266  ,  0.77261 , 0.77266),
(    'AUDUSD',  '2018-03-26 00:02'  ,  0.77264 , 0.77265  ,  0.77259 , 0.77264),
(    'AUDUSD',  '2018-03-26 00:03'  ,  0.77262 , 0.77262  ,  0.7726  , 0.77262),
(    'AUDUSD',  '2018-03-26 00:04'  ,  0.77262 , 0.77262  ,  0.77262 , 0.77262),
(    'AUDUSD',  '2018-03-26 00:05'  ,  0.7726  , 0.7726   ,  0.7726  , 0.7726),
(    'AUDUSD',  '2018-03-26 00:06'  ,  0.77259 , 0.77262  ,  0.77259 , 0.77261),
(    'AUDUSD',  '2018-03-26 00:07'  ,  0.77262 , 0.77265  ,  0.77262 , 0.77264),
(    'AUDUSD',  '2018-03-26 00:08'  ,  0.77263 , 0.77272  ,  0.77262 , 0.77272),
(    'AUDUSD',  '2018-03-26 00:09'  ,  0.77273 , 0.77278  ,  0.77271 , 0.77274),
(    'AUDUSD',  '2018-03-26 00:10'  ,  0.77273 , 0.77282  ,  0.77271 , 0.77279);

select ccy, (select open from t t1 where t1.ccy = t.ccy order by t1.tm asc limit 1) periodopen,
                (select close from t t1 where t1.ccy = t.ccy order by t1.tm desc limit 1) period_close,
                max(high) period_high,
                min(low) period_low 
from t
where t.tm between '2018-03-26 00:00' and '2018-03-26 00:10:00'
group by ccy;

+--------+------------+--------------+-------------+------------+
| ccy    | periodopen | period_close | period_high | period_low |
+--------+------------+--------------+-------------+------------+
| AUDUSD |    0.77282 |      0.77279 |     0.77283 |    0.77259 |
+--------+------------+--------------+-------------+------------+
1 row in set (0.00 sec)
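The query above hard-codes a single 10-minute window. One way to generalize it — a sketch with my own illustrative names, not part of the tested answer — is to assign every row a bucket key by integer-dividing its epoch timestamp by 600 seconds (in MySQL that would be `FLOOR(UNIX_TIMESTAMP(tm)/600)`) and aggregate per bucket. The same logic, simulated here with Python's sqlite3 so it is runnable anywhere:

```python
import sqlite3

# Bucket each row by epoch-seconds // 600 (a 10-minute window), then take
# first open / last close via correlated subqueries, plus MAX/MIN per bucket.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t (ccy TEXT, tm TEXT, open REAL, high REAL, low REAL, close REAL);
INSERT INTO t VALUES
  ('AUDUSD', '2018-03-26 00:00', 0.77282, 0.77283, 0.77265, 0.77265),
  ('AUDUSD', '2018-03-26 00:05', 0.7726,  0.7726,  0.7726,  0.7726),
  ('AUDUSD', '2018-03-26 00:09', 0.77273, 0.77278, 0.77271, 0.77274),
  ('AUDUSD', '2018-03-26 00:10', 0.77273, 0.77282, 0.77271, 0.77279),
  ('AUDUSD', '2018-03-26 00:19', 0.77281, 0.77285, 0.77281, 0.77283);
""")

query = """
SELECT ccy,
       MIN(tm) AS bucket_start,
       (SELECT t1.open FROM t t1
         WHERE t1.ccy = b.ccy
           AND CAST(strftime('%s', t1.tm) AS INTEGER) / 600 = b.bucket
         ORDER BY t1.tm ASC LIMIT 1) AS period_open,
       MAX(high) AS period_high,
       MIN(low)  AS period_low,
       (SELECT t1.close FROM t t1
         WHERE t1.ccy = b.ccy
           AND CAST(strftime('%s', t1.tm) AS INTEGER) / 600 = b.bucket
         ORDER BY t1.tm DESC LIMIT 1) AS period_close
FROM (SELECT *, CAST(strftime('%s', tm) AS INTEGER) / 600 AS bucket FROM t) b
GROUP BY b.ccy, b.bucket
ORDER BY bucket_start;
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

With rows missing in a bucket (only 3 of 10 minutes present in the first bucket here), the open/close still come from the earliest and latest rows that do exist in that bucket.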
P.Salmon
  • Thank you for pointing out the unclear part of my question, I have updated it. Instead of trying to aggregate by currency, I think I am trying to aggregate by time stamp. – RRRookie Jun 27 '18 at 08:47

One way to go is to group the records and use aggregate functions to retrieve info about each group. The problem is that you don't have any column to group by.

In this case, to create groups of records within the same 10 minutes, you can convert the Time column to a string and truncate the last digit. For example, there are 10 records in the table that share the value '2018.03.26 00:0' if you ignore the last digit.

Regarding the open and close columns, you need to get the values from the first and last rows inside each group. MySQL doesn't have a FIRST or LAST aggregate function, but I learned from this answer that you can use GROUP_CONCAT to simulate them.

select
    ccy,
    SUBSTRING(DATE_FORMAT(Time, '%Y.%m.%d %H:%i'),1,15) as time_group, 
    SUBSTRING_INDEX(GROUP_CONCAT(CAST(open AS CHAR) ORDER BY Time), ',', 1 ) as open,
    MAX(high) as high, 
    MIN(low) as low,
    SUBSTRING_INDEX(GROUP_CONCAT(CAST(close AS CHAR) ORDER BY Time DESC), ',', 1 ) as close
from table_name
group by ccy, time_group;

This is the result:

+--------+-----------------+---------+---------+---------+---------+
| ccy    | time_group      | open    | high    | low     | close   |
+--------+-----------------+---------+---------+---------+---------+
| AUDUSD | 2018.03.26 00:0 | 0.77282 | 0.77283 | 0.77259 | 0.77274 |
| AUDUSD | 2018.03.26 00:1 | 0.77273 | 0.77293 | 0.77255 | 0.77283 |
+--------+-----------------+---------+---------+---------+---------+
Gus

This one does the job. It was tested on MS SQL Server (I do not have MySQL), but maybe it is helpful for you. In MySQL you would only need to change TOP 1, DATEPART, and the "+" symbol used for string concatenation.

CREATE TABLE [dbo].[T1](
    [F_CCY] [varchar](20) NULL,
    [F_TIME] [datetime] NULL,
    [F_OPEN] [numeric](10, 8) NULL,
    [F_HIGH] [numeric](10, 8) NULL,
    [F_LOW] [numeric](10, 8) NULL,
    [F_CLOSE] [numeric](10, 8) NULL
);

INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26', 0.77282, 0.77283, 0.77265, 0.77265);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:01:00', 0.77264, 0.77266, 0.77261, 0.77266);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:02:00', 0.77264, 0.77265, 0.77259, 0.77264);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:03:00', 0.77262, 0.77262, 0.7726, 0.77262);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:04:00', 0.77262, 0.77262, 0.77262, 0.77262);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:05:00', 0.7726, 0.7726, 0.7726, 0.7726);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:06:00', 0.77259, 0.77262, 0.77259, 0.77261);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:07:00', 0.77262, 0.77265, 0.77262, 0.77264);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:08:00', 0.77263, 0.77272, 0.77262, 0.77272);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:09:00', 0.77273, 0.77278, 0.77271, 0.77274);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:10:00', 0.77273, 0.77282, 0.77271, 0.77279);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:11:00', 0.77282, 0.77283, 0.77265, 0.77265);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:12:00', 0.77264, 0.77266, 0.77261, 0.77266);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:13:00', 0.77264, 0.77265, 0.77259, 0.77264);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:14:00', 0.77262, 0.77262, 0.7726, 0.77262);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:15:00', 0.77262, 0.77262, 0.77262, 0.77262);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:16:00', 0.7726, 0.7726, 0.7726, 0.7726);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:17:00', 0.77259, 0.77262, 0.77259, 0.77261);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:18:00', 0.77262, 0.77265, 0.77262, 0.77264);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:19:00', 0.77263, 0.77272, 0.77262, 0.77272);
INSERT INTO T1 (F_CCY, F_TIME, F_OPEN, F_HIGH, F_LOW, F_CLOSE) VALUES('AUDUSD', '2018-03-26 00:20:00', 0.77273, 0.77278, 0.77271, 0.77274);


SELECT
    T.F_ccy as "ccy"
    , MIN(T.F_TIME) as "Time"
    ,
    (SELECT
        TOP 1 FIRST_VALUE(X.F_OPEN) OVER(ORDER BY X.F_TIME)
    FROM
        T1 X
    WHERE
        X.F_ccy=T.F_ccy
        AND X.F_TIME>=CAST(CAST(CAST(T.F_TIME AS DATE) AS VARCHAR(10)) + ' ' + CAST(DATEPART(HOUR, T.F_TIME) AS VARCHAR(2)) + ':' + CAST((DATEPART(MINUTE, T.F_TIME)/10)*10 AS VARCHAR(2)) AS DATETIME)
        AND X.F_TIME<DATEADD(MINUTE, 10, CAST(CAST(CAST(T.F_TIME AS DATE) AS VARCHAR(10)) + ' ' + CAST(DATEPART(HOUR, T.F_TIME) AS VARCHAR(2)) + ':' + CAST((DATEPART(MINUTE, T.F_TIME)/10)*10 AS VARCHAR(2)) AS DATETIME)) -- DATEADD avoids an invalid HH:60 upper bound for minutes 50-59
    ) AS "open"
    ,
    (SELECT
        TOP 1 LAST_VALUE(X.F_CLOSE) OVER(ORDER BY X.F_TIME ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) -- F_CLOSE, not F_OPEN; explicit frame so LAST_VALUE sees the whole window
    FROM
        T1 X
    WHERE
        X.F_ccy=T.F_ccy
        AND X.F_TIME>=CAST(CAST(CAST(T.F_TIME AS DATE) AS VARCHAR(10)) + ' ' + CAST(DATEPART(HOUR, T.F_TIME) AS VARCHAR(2)) + ':' + CAST((DATEPART(MINUTE, T.F_TIME)/10)*10 AS VARCHAR(2)) AS DATETIME)
        AND X.F_TIME<DATEADD(MINUTE, 10, CAST(CAST(CAST(T.F_TIME AS DATE) AS VARCHAR(10)) + ' ' + CAST(DATEPART(HOUR, T.F_TIME) AS VARCHAR(2)) + ':' + CAST((DATEPART(MINUTE, T.F_TIME)/10)*10 AS VARCHAR(2)) AS DATETIME)) -- DATEADD avoids an invalid HH:60 upper bound for minutes 50-59
        ) AS "close"
    , MAX(T.F_HIGH) as "high"
    , MIN(T.F_LOW) as "low"
FROM
    T1 T
GROUP BY
    T.F_ccy
    , CAST(T.F_TIME AS DATE)
    , DATEPART(HOUR, T.F_TIME)
    , DATEPART(MINUTE, T.F_TIME)/10
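
As a sanity check on the grouping key used above — (date, hour, minute/10) — here is a small Python sketch (not part of the answer) confirming that this composite key partitions timestamps exactly the same way as flooring epoch time to 600-second boundaries:

```python
from datetime import datetime, timedelta, timezone

# Compare the composite bucket key (date, hour, minute // 10) — the GROUP BY
# used in the query above — with a single integer key: epoch seconds // 600.
start = datetime(2018, 3, 26, tzinfo=timezone.utc)
stamps = [start + timedelta(minutes=m) for m in range(120)]  # two hours

composite_to_epoch = {}
for t in stamps:
    composite = (t.date(), t.hour, t.minute // 10)
    epoch_bucket = int(t.timestamp()) // 600
    composite_to_epoch.setdefault(composite, set()).add(epoch_bucket)

# Every composite key maps to exactly one epoch bucket, and 120 minutes
# of 1-minute stamps yield 120 / 10 = 12 distinct buckets.
assert all(len(v) == 1 for v in composite_to_epoch.values())
print(len(composite_to_epoch))  # 12
```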
Livius