I've got some financial time series in one-minute intervals ("bars"). Because I'll be dealing with some large series, I'm learning to use `data.table`, and I'm not quite used to thinking in the `data.table` way yet.
In the data I've imported into a `data.table` (let's call it `DT` here for simplicity), among other things not relevant to this question, I've got a column called `Date` in `YYYYMMDD` format, a column called `Time` in `HH:MM` format, and a column called `Price` (let's say it's the final price for that minute) that's an integer. I'm simplifying a little, but if anyone can help me with the question I'm asking here, I can adapt it to the details of my data. Here's an example:
Date Time Price
20151028 09:00 47675
20151028 09:01 47650
20151028 09:02 47670
20151028 09:03 47685
20151028 09:04 47690
...
What I'd like to do is calculate, for each minute, the maximum and minimum values of `Price` in the day up to that minute. Let's go ahead and add the things I want to calculate as additional columns, which I'll call `DayMax` and `DayMin` here. The result would then look like this:
Date Time Price DayMax DayMin
20151028 09:00 47675 47675 47675
20151028 09:01 47650 47675 47650
20151028 09:02 47670 47675 47650
20151028 09:03 47685 47685 47650
20151028 09:04 47690 47690 47650
...
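In case it helps anyone reproduce this, here is a minimal sketch that builds the sample rows above and the result I'm after. The column types are my assumption for the example (`Date` as an integer, `Time` as a character string, `Price` as an integer), and `expected` is just a name I made up for the hand-typed target.

```r
library(data.table)

# Sample rows from above; column types are an assumption for this example
# (Date as integer YYYYMMDD, Time as "HH:MM" character, Price as integer).
DT <- data.table(
  Date  = rep(20151028L, 5),
  Time  = c("09:00", "09:01", "09:02", "09:03", "09:04"),
  Price = c(47675L, 47650L, 47670L, 47685L, 47690L)
)

# Desired result for those rows, typed in by hand from the table above.
# copy() keeps DT itself unchanged.
expected <- copy(DT)[, `:=`(
  DayMax = c(47675L, 47675L, 47675L, 47685L, 47690L),
  DayMin = c(47675L, 47650L, 47650L, 47650L, 47650L)
)]
```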
I'm pretty sure my `by` (which I understand as being like a `GROUP BY` in SQL) should be `Date`, and I think what I'm having trouble understanding is how to set up my `i` (like a `WHERE` in SQL), basically because I'm not sure how to use the same column two different ways in the syntax. For each row, I want to calculate the maximum where (and by "where," I think I mean `WHERE`) `Time` is less than or equal to the value of `Time` in that specific row, and where I'm `GROUP`(ing) `BY` `Date`, so it's up to that line's time in the specific day. How can I do this in a quick and memory-efficient way using `data.table`?
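To pin down the intent, here is a naive sketch of the calculation I'm describing. It is deliberately slow (quadratic in the number of rows per day), so it is not what I want to run on large series; it only shows the "same column used two ways" logic. It assumes `Time` is a zero-padded `HH:MM` string, so that comparing it as text matches comparing it as a time.

```r
library(data.table)

# Same sample rows as in the example above (types assumed for illustration).
DT <- data.table(
  Date  = rep(20151028L, 5),
  Time  = c("09:00", "09:01", "09:02", "09:03", "09:04"),
  Price = c(47675L, 47650L, 47670L, 47685L, 47690L)
)

# Naive version: within each Date, for row k take the max/min of Price
# over all rows of that day whose Time is <= Time[k].
DT[, `:=`(
  DayMax = sapply(seq_len(.N), function(k) max(Price[Time <= Time[k]])),
  DayMin = sapply(seq_len(.N), function(k) min(Price[Time <= Time[k]]))
), by = Date]
```

On the five sample rows this reproduces the `DayMax` and `DayMin` columns shown above, but I'd expect it to scale badly, which is why I'm asking for the idiomatic approach.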
By the way, speaking of "quick and memory-efficient," I'm assuming I should use the `:=` operator to create the new columns. Please correct me if I'm wrong about that.