1

I downloaded some EURUSD tick data (source) and already prepared it, so that I now have a dataframe with the bid and ask prize as well as the datetime. The datetime has been created with

options(digits.secs=3)
data$datetime <- as.POSIXct(strptime(as.character(data$datetime), format="%Y-%m-%d %H:%M:%OS"))

As you may see from the data, each minute can consist of a different number of prize points (ticks).

Now I'd like to calculate a moving average as if the data would have been compressed into a 1-minute chart, i.e. for each tick prize I'd like to calculate the average over the last x minutes, whereas only the current prize as well as the (x-1) last prizes of the (x-1) last minutes should be used. Maybe it is easier to explain with an example. Let's consider this data:

     bid       datetime
[1]  1.21332   2016-03-01 03:21:32.138
[2]  1.21333   2016-03-01 03:21:58.213
[3]  1.21328   2016-03-01 03:22:01.283
[4]  1.21325   2016-03-01 03:22:47.018
[5]  1.21330   2016-03-01 03:23:22.221
[6]  1.21331   2016-03-01 03:24:01.112

For the last data point [6] and a period (x) of 4 I now like to calculate the average of the last elapsed 3 minutes (using the last bid prize for each of these minutes) as well as the current value, i.e.

mean = ([2] + [4] + [5] + [6]) / 4

My approach would be using stacked loops (one to go over each data point and one to determine the last prize changes during the last elapsed (x-1) minutes). However, I'd bet that there is as faster and more efficient way to do this with R.

Thanks a lot in advance for your help!

Anti
  • 365
  • 1
  • 14
  • 1
    what if there are no data in the last 3 minutes? do you use the latest available? for e.g. on the first minute of a Monday where there are no prices over the weekend, do you use prices from close of Friday? – chinsoon12 Aug 16 '18 at 01:52
  • Good question. Thanks for asking! I think it would be wise to use the latest available. And yes, for Monday morning the prizes from the last week should be used. – Anti Aug 16 '18 at 05:47

1 Answers1

1

Assuming that there are always prices for the last 3 minutes (i.e. there are no minutes without any price data), you can first extract the latest price for each minute and do a rolling sum with window of size, x-1. Then, join this table with the original table using a non-equi join while selecting the nearest last 3 minute sum before calculating the desired result for each row in the data.

library(data.table)
setDT(dat)

#find last price for each min
minlastprice <- dat[, c(.(DateTime=DateTime[.N]), .SD[.N]), 
    by=.(as.Date(DateTime), hour(DateTime), minute(DateTime))]

winsize <- 4

#calculate the price in last 3 min
library(RcppRoll)
minlastprice[, SumPrevMin := roll_sumr(Bid, winsize - 1)]

#perform a non-equi join to pick the sum of the last 3 minutes' prices
minlastprice[dat, on=.(DateTime < DateTime), mult="last",
    (SumPrevMin + i.Bid) / winsize, by=.EACHI]
chinsoon12
  • 25,005
  • 4
  • 25
  • 35
  • Could you please show me how I can change the code so that I can calculate the average on tick base, i. e. I'm looking for an extraction of all ticks that occured within the last 3 minutes and calculate the moving average tick-by-tick. Thanks a lot in advance!!! – Anti Jul 29 '19 at 20:45
  • Would like to post another qn? – chinsoon12 Jul 29 '19 at 20:47
  • Ok - just thought it could be added here. I started a new question: https://stackoverflow.com/questions/57261394/r-fast-and-efficient-way-to-calculate-moving-averages-for-last-x-whole-minutes. I would be delighted if you'd answer. Thank you so much in advance! – Anti Jul 29 '19 at 21:06