I have an xts time series object made up of minute-by-minute intraday trading data for 2015. I would like to add a dummy variable that is 1 on an event day and 0 on a non-event day.

Since the dummy variable is not inherently a time series, is it possible to add it to my trading data?

How should I construct the dummy column?

How can it be added to the existing xts?

I'm new to R, so please be as specific as possible in your answer. Thank you!

Joshua Ulrich
shoestringfries
    If you want people to be as specific as possible in their answer, it would help if you would be as specific as possible in your question (i.e. provide a [minimal reproducible example](http://stackoverflow.com/q/5963269/271616) showing the input and expected output). – Joshua Ulrich Oct 17 '16 at 00:28

1 Answer

xts is based on zoo, and the zoo FAQ (question 4) has this line about differing data types:

A "zoo" object may be (1) a numeric vector, (2) a numeric matrix or (3) a factor but may not contain both a numeric vector and factor.

So as long as your 0s and 1s are numeric (not factor or character), you should be OK. Storing a 0/1 flag as a number on every minute row isn't the most memory-efficient representation, but storage efficiency probably isn't your bottleneck.
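The underlying reason is that an xts object keeps all of its columns in a single matrix, so they share one storage mode. A minimal sketch that checks this with a toy integer dummy (the object names idx, prices, flag and both are illustrative only):

```r
library(xts)

# Three one-minute bars of toy data (UTC, so the result doesn't depend on your locale)
idx <- as.POSIXct("2016-10-12 09:00", tz = "UTC") + 60 * 0:2
prices <- xts(c(1.5, 2.5, 3.5), order.by = idx)
flag <- xts(c(0L, 1L, 0L), order.by = idx)  # integer 0/1 dummy

# cbind() on xts objects calls merge.xts(), which returns a single matrix
both <- cbind(prices, flag)

# The whole object has one storage mode, and here it is numeric
storage.mode(coredata(both))
is.numeric(coredata(both))
```

Because both inputs are numeric, the dummy survives the merge unchanged.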

An example:

library(xts)

timestamp <- seq.POSIXt(from=as.POSIXct("2016-10-12 09:00"),
                        to=as.POSIXct("2016-10-13 09:00"),
                        by="min")
dat <- rnorm(length(timestamp))
foo <- xts(dat, order.by=timestamp)

Now for the indicator variable. First, here's what happens if it is stored as text:

# make the random draws below reproducible
set.seed(123)

dummy2 <- sample(c("event","non-event"), size=length(timestamp),
    replace=TRUE)
foo2 <- xts(dummy2, order.by=timestamp)
merged <- cbind(foo, foo2)

And that warns you:

In merge.xts(..., all = all, fill = fill, suffixes = suffixes) :
  NAs introduced by coercion

Indeed:

summary(merged)
     Index                          ..1                ..2      
 Min.   :2016-10-12 09:00:00   Min.   :-3.38110   Min.   : NA   
 1st Qu.:2016-10-12 15:00:00   1st Qu.:-0.64010   1st Qu.: NA   
 Median :2016-10-12 21:00:00   Median : 0.04047   Median : NA   
 Mean   :2016-10-12 21:00:00   Mean   : 0.03025   Mean   :NaN   
 3rd Qu.:2016-10-13 03:00:00   3rd Qu.: 0.67461   3rd Qu.: NA   
 Max.   :2016-10-13 09:00:00   Max.   : 3.25034   Max.   : NA   
                                                  NA's   :1441  

But if the dummy is numeric:

dummy3 <- sample(0:1, size=length(timestamp), replace=TRUE)
foo3 <- xts(dummy3, order.by=timestamp)
merged <- cbind(foo, foo3)

This time cbind() returns silently (no news is good news). Let's have a look:

summary(merged)
     Index                          ..1                ..2        
 Min.   :2016-10-12 09:00:00   Min.   :-3.38110   Min.   :0.0000  
 1st Qu.:2016-10-12 15:00:00   1st Qu.:-0.64010   1st Qu.:0.0000  
 Median :2016-10-12 21:00:00   Median : 0.04047   Median :0.0000  
 Mean   :2016-10-12 21:00:00   Mean   : 0.03025   Mean   :0.4983  
 3rd Qu.:2016-10-13 03:00:00   3rd Qu.: 0.67461   3rd Qu.:1.0000  
 Max.   :2016-10-13 09:00:00   Max.   : 3.25034   Max.   :1.0000  
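The random dummy above only illustrates the storage point. For the question's actual case (1 on every minute of an event day, 0 otherwise), one way to build the column is to match each bar's calendar date against a list of event dates. A minimal sketch, assuming a hypothetical event_days vector and UTC timestamps; substitute your own dates and the time zone your trading data is actually in:

```r
library(xts)

# Hypothetical minute-by-minute data spanning three calendar days (UTC)
timestamp <- seq(from = as.POSIXct("2015-01-05 09:30", tz = "UTC"),
                 to   = as.POSIXct("2015-01-07 16:00", tz = "UTC"),
                 by   = "min")
set.seed(1)
prices <- xts(rnorm(length(timestamp)), order.by = timestamp)
colnames(prices) <- "price"

# Hypothetical list of event days; replace with your own dates
event_days <- as.Date(c("2015-01-06"))

# Calendar date of each bar, using the same time zone as the index
bar_dates <- as.Date(index(prices), tz = "UTC")

# 1 if the bar falls on an event day, 0 otherwise (kept numeric on purpose)
event <- as.integer(bar_dates %in% event_days)

# Wrap the dummy in an xts with the same index, then merge on time
prices <- merge(prices, xts(event, order.by = index(prices)))
colnames(prices) <- c("price", "event")
```

Passing tz explicitly to as.Date() matters: if it doesn't match the time zone of your index, bars near midnight can be assigned to the wrong day.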

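To compare column 1 on event and non-event minutes, subset it by the value in column 2. Since column 2 is numeric, the comparisons below use a 0.5 threshold rather than testing for equality; if that isn't intuitive to you, check out Circle One of the R Inferno (caution: PDF). (Whole numbers like 0 and 1 are stored exactly, so == would also work here; the threshold is just a safe habit.)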

# column 1 on rows where the dummy is 1 (event)
summary(merged[merged[,2] > 0.5, 1])
# column 1 on rows where the dummy is 0 (non-event)
summary(merged[merged[,2] < 0.5, 1])

There's probably a more elegant way of doing that, but it'll get you started.
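One possibly tidier alternative (a sketch, not the approach above) is to let split() group column 1 by the dummy's value in one step. The small merged object built here is a hypothetical stand-in for the one created earlier:

```r
library(xts)

# A small stand-in for `merged`: a random column plus an alternating 0/1 dummy
timestamp <- seq(from = as.POSIXct("2016-10-12 09:00", tz = "UTC"),
                 by = "min", length.out = 10)
set.seed(123)
merged <- cbind(xts(rnorm(10), order.by = timestamp),
                xts(rep(c(0, 1), 5), order.by = timestamp))

# Pull out plain numeric vectors, then group column 1 by the dummy's value
vals <- coredata(merged)
groups <- split(vals[, 1], vals[, 2])

# One summary per group, named "0" (non-event) and "1" (event)
lapply(groups, summary)
```

Grouping on the exact values is safe here because 0 and 1 are represented exactly.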

If you plan to work with xts beyond trivial tasks, I recommend following this advice from the authors of xts:

At the core of an xts object is a zoo object from the package of the same name. ... Most of the details surrounding zoo objects apply equally to xts. As it would be redundant to simply retell the excellent introductory zoo vignette, the reader is advised to read, absorb, and re-read that documentation to best understand the power of this class.

Jason