
I have 1-minute intraday price data with missing data points, and I want to fill them in.

I read through the suggestions in the following post and tried a similar procedure: R: Filling missing dates in a time series?

In my case the missing data point is the first trade of the day, i.e. the one at 09:31:00.

> head(s)
                    AMR.Open AMR.High AMR.Low AMR.Close AMR.Volume AMR.WAP AMR.hasGaps AMR.Count
2010-09-10 09:32:00     6.08     6.10    6.07      6.10        298   6.087           0        39
2010-09-10 09:33:00     6.10     6.14    6.10      6.14        274   6.122           0        70
2010-09-10 09:34:00     6.14     6.15    6.13      6.13        472   6.133           0        96
2010-09-10 09:35:00     6.13     6.14    6.13      6.13        291   6.133           0        68
2010-09-10 09:36:00     6.13     6.13    6.11      6.11        548   6.123           0        97
2010-09-10 09:37:00     6.11     6.11    6.11      6.11         67   6.110           0        26

> na.locf(s, xout=seq(as.POSIXct(head(index(s), 1) - 60), as.POSIXct(tail(index(s), 1)), by="1 min")) -> ss

> head(ss)
                    AMR.Open AMR.High AMR.Low AMR.Close AMR.Volume AMR.WAP AMR.hasGaps AMR.Count
2010-09-10 09:32:00     6.08     6.10    6.07      6.10        298   6.087           0        39
2010-09-10 09:33:00     6.10     6.14    6.10      6.14        274   6.122           0        70
2010-09-10 09:34:00     6.14     6.15    6.13      6.13        472   6.133           0        96
2010-09-10 09:35:00     6.13     6.14    6.13      6.13        291   6.133           0        68
2010-09-10 09:36:00     6.13     6.13    6.11      6.11        548   6.123           0        97
2010-09-10 09:37:00     6.11     6.11    6.11      6.11         67   6.110           0        26

As you can see above, the returned object is not filled as desired: it still starts at 09:32:00.

Below you can see that I correctly specified the start and end times.

> as.POSIXct(head(index(s), 1) - 60)
[1] "2010-09-10 09:31:00 EDT"

> as.POSIXct(tail(index(s), 1))
[1] "2010-09-10 16:00:00 EDT"
> 

Could this be because the date range has a time zone specified whereas the original POSIX index does not? I tried to remove the time zone by specifying tz="", but that does not remove it. That said, the time zone may just be a red herring.

I saved the data in rda (binary) format if anyone is interested in testing:

http://www.speedyshare.com/files/28576853/test.rda

Appreciate the help.

codingknob
  • I'm really not so sure what you are asking for here...I downloaded your data and looked at the index and every time is there at every consecutive interval. 389 values for the `AMR` columns and 389 for the `attr(,"index")`. I see no missing data points. Now, this "missing first point at 9:31", do you have data for it? I mean all you would have to do is shift the indices of index and the attributes by one and fill in the first one. If that's the case I can help you with that no problem. Otherwise I really don't see what your issue is.... – msikd65 May 22 '11 at 06:35
  • Hi msikd65 - thank you for the response. The missing data is the one that occurs at 9:31. Since I don't have the data for that time I would like to copy the data as it exists at 9:32 and insert it into the 9:31 time slot. This way I can make the time series regular and consistent with every other non-half day trading day and across stock prices. I want to perform various analytics that require regular time series. Appreciate the help. – codingknob May 22 '11 at 17:07
  • what form is your data in? I'm not very familiar with this .rda I assume that there exists a data frame (maybe called 's'?) with all the AMR data? – msikd65 May 22 '11 at 17:24
  • well now that I look at it you are using zoo and xts which I am not familiar with at all. I could propose a very ugly work around but it depends on the data being in data frames.... – msikd65 May 22 '11 at 17:35
  • Hi msikd65 - thanks for taking the time with this. I'm sorry for the inconvenience. I am using xts objects because I use functions from various packages that require the data to be of type xts. The problem at hand seems fairly straightforward on the surface, and the link I supplied in the body of my post should do the trick for xts objects, but for some reason I can't get it to work with my object. It seems like this is a common problem for people working with financial time series data, as data quality is generally an issue. As such, there must be a way to insert an element into an xts object. – codingknob May 23 '11 at 21:00
  • Any thoughts on this from anyone who does work with xts objects? – codingknob May 25 '11 at 18:18

1 Answer


na.locf operates on the data, not the index. If you want to add a row of NA to the data, you would need to make a suitable xts object to rbind to s:

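# Build a one-row xts of NAs stamped 60 seconds before the first observation
miss <- xts(matrix(1*NA,1,NCOL(s)), first(index(s))-60)
# Prepend the NA row, then fill it backward from the next observation
s <- rbind(miss,s)
s <- na.locf(s, fromLast=TRUE)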
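
For anyone without the original test.rda, the same idea can be tried on a small made-up series. This is only a sketch: the values, the `Close` and `Volume` column names, the `filled` variable, and the UTC time zone are assumptions chosen for illustration, not the asker's data.

```r
library(xts)  # attaching xts also attaches zoo, which supplies the na.locf() generic

# Toy 1-minute bars starting at 09:32 (made-up values, UTC assumed)
idx <- as.POSIXct("2010-09-10 09:32:00", tz = "UTC") + 60 * 0:2
s <- xts(cbind(Close = c(6.10, 6.14, 6.13), Volume = c(298, 274, 472)),
         order.by = idx)

# One all-NA row, with matching column names, one minute before the first bar
miss <- xts(matrix(NA_real_, 1, NCOL(s), dimnames = list(NULL, colnames(s))),
            order.by = start(s) - 60)

# Prepend the NA row and carry the next observation backward into it
filled <- na.locf(rbind(miss, s), fromLast = TRUE)
print(filled)
```

The `fromLast = TRUE` argument is what makes the fill run backward, so the new 09:31 row takes its values from the 09:32 row rather than staying NA.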
Joshua Ulrich
  • This is exactly what I need. Thank you. I also just figured out that the following works too: `s <- merge(s, timeBasedSeq(paste(start(s), end(s), "M", sep="/")))` followed by `s <- na.locf(s)` – codingknob May 30 '11 at 17:06
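
The merge-based approach from the comment above can be sketched as follows. Note the substitutions: this version builds the grid with base `seq()` and a zero-width `xts(, grid)` object instead of `timeBasedSeq()`, and it starts the grid one minute before the first observation so the 09:31 slot is created. The data, the `grid` and `full` names, and the UTC time zone are made up for illustration.

```r
library(xts)

# Toy series with a gap at 09:34 and no 09:31 bar (made-up values, UTC assumed)
idx <- as.POSIXct("2010-09-10 09:32:00", tz = "UTC") + 60 * c(0, 1, 3)
s <- xts(cbind(Close = c(6.10, 6.14, 6.13)), order.by = idx)

# Regular 1-minute grid from one minute before the first bar to the last bar
grid <- seq(start(s) - 60, end(s), by = "1 min")

# Merging with a zero-width xts adds NA rows for every missing timestamp
full <- merge(s, xts(, grid))

full <- na.locf(full)                   # forward-fill interior gaps
full <- na.locf(full, fromLast = TRUE)  # back-fill the leading 09:31 row
print(full)
```

A forward fill alone leaves the leading row as NA because nothing precedes it, which is why the second, backward pass is needed for the 09:31 case.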