I have an xts object that covers 169 days of high-frequency, regular 5-minute observations, but some of the days have missing observations, i.e. fewer than 288 data points. How do I remove these so that only days with a full set of data points remain?

# find days in data

ddx <- endpoints(dxts, on = "days")
days <- format(index(dxts)[ddx], "%Y-%m-%d")

# report the number of observations (rows) for each day
for (day in days) {
  x <- dxts[day]
  cat(day, "has", NROW(x), "records...\n")
}

I tried

RTAQ::exchangeHoursOnly(dxts, daybegin = "00:00:00", dayend = "23:55:00") 

but this still returned the full set

Thanks

number8

1 Answer

Split by days. Count the number of rows in each day, and keep only the days that have at least 288 rows.

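library(xts)
# reproducible sample data: 1000 five-minute observations
dxts <- .xts(rnorm(1000), 1:1000*5*60)
# return a day's data only if it has at least 288 rows, otherwise NULL
daylist <- lapply(split(dxts, "days"), function(x) {
    if(NROW(x) >= 288) x
})
# bind the remaining days back into a single xts object
do.call(rbind, daylist)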

The above splits dxts by "days". Then, if a day has at least 288 rows, it returns all the data for that day; otherwise, it returns NULL. So daylist will be a list whose elements are either an xts object or NULL. The do.call part calls rbind on the list; it's like calling rbind(daylist[[1]], daylist[[2]], ..., daylist[[n]]). The NULLs won't be aggregated, so you'll be left with a single xts object that omits days with fewer than 288 rows.
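As a variation on the same idea, here is a minimal sketch that drops the incomplete days explicitly with Filter before binding, so the list passed to rbind contains no NULLs. The sample data, the `expected` constant, and the names `full` and `complete` are assumptions made for illustration; it also assumes a full day means exactly 288 rows and that days are split in the object's own time zone (UTC here).

```r
library(xts)

# Hypothetical sample data: three complete UTC days of 5-minute observations
idx <- seq(as.POSIXct("2012-01-01 00:00:00", tz = "UTC"),
           by = "5 min", length.out = 3 * 288)
dxts <- xts(seq_along(idx), order.by = idx, tzone = "UTC")

# Remove ten observations from the second day so that it is incomplete
dxts <- dxts[-(300:309)]

expected <- 288  # assumed number of 5-minute observations in a full day

# Split into one xts object per day, then keep only the complete days
days <- split(dxts, "days")
full <- Filter(function(x) NROW(x) == expected, days)

# Bind the complete days back into a single xts object
complete <- do.call(rbind, full)
```

Note that the comparison uses `==`; a single `=` inside `if()` is not a comparison and is a syntax error in R. If no day is complete, `full` is an empty list and `do.call(rbind, full)` returns NULL, so that case may need its own check.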

GSee
  • Hi, thanks for that. Unfortunately the code won't run for me; I get an error. @GSee please see below, thanks – number8 Jun 15 '12 at 14:37
  • Yeah a good few times now. I have commented further below as I don't know how to write code in this part #new – number8 Jun 15 '12 at 14:49
  • I left off a closing parenthesis. Also, I think it needs `>= 288` instead of `> 288` since there are only 288 5 minute periods in 24 hours. Edited. Does it work now? – GSee Jun 15 '12 at 14:50
  • When I run the daylist part I get: unexpected symbol in: "} source" > source('~/.active-rstudio-document', echo=TRUE) Error in source("~/.active-rstudio-document", echo = TRUE) : ~/.active-rstudio-document:62:1: unexpected symbol 61: } 62: do.call ^ Does it make any difference that 'x' is used in two places, since the 'x' in my first code only refers to one particular day in my data set? Also, does it matter that each full day has exactly 288 data points, so should the condition be if(NROW(x) = 288)? – number8 Jun 15 '12 at 14:52
  • I added data so that the code is reproducible. I do not get an error. If you get an error with your data, then please provide more info about your data (dput, or str) – GSee Jun 15 '12 at 14:55
  • getdat("euru") -> dollar ##csv file;; zz1 <- read.zoo(dollar, sep = "",format="%d/%m/%Y %H:%M", tz="", FUN=NULL, regular=TRUE, header=TRUE, index.column=1, colClasses=c("character", "numeric")) ;; dxts <- as.xts(zz1) – number8 Jun 15 '12 at 14:58
  • Sorry: str(dxts) An ‘xts’ object from 2011-11-13 18:00:00 to 2012-05-27 23:55:00 containing: Data: num [1:41502, 1] 1.38 1.38 1.38 1.38 1.38 ... Indexed by objects of class: [POSIXct,POSIXt] TZ: xts Attributes: List of 2 $ tclass: chr [1:2] "POSIXct" "POSIXt" $ tzone : chr "" – number8 Jun 15 '12 at 15:05
  • Great! Please check the check mark to accept this answer if it answered your question. Thank you. – GSee Jun 15 '12 at 15:12
  • Hey, really sorry to bother you again, but would you mind taking a quick look at the unanswered question on my profile? You seem to be the person with the knowledge on here and I'd be ever so grateful. I really think only a minor adjustment is needed, but I'm not sure what, because I have nothing comparable to go on. I think I have explained my problem fully, but no one seems able to answer it. Thanks – number8 Jun 18 '12 at 02:09