
I have the following data frame:

read.csv(file = "CNY % returns.csv", header = TRUE, sep = ",")
    DATE LOG...RETURNS
1   03/09/13    -6.9106715
2   04/09/13    -6.9106715
3   05/09/13    -4.5839582
4   06/09/13     1.7554592
5   07/09/13    -0.8808549
6   08/09/13     4.1842420 


DATE: the observation date, in dd/mm/yy format.
LOG RETURNS: continuously compounded (log) returns from a Bitcoin CNY exchange.

I wish to use the auto.arima function as a starting point for selecting a suitable model.

I have already tried:

cnyX <- read.zoo(text="        DATE LOG...RETURNS
1   03/09/13    -6.9106715
2   04/09/13    -6.9106715
3   05/09/13    -4.5839582
4   06/09/13     1.7554592
5   07/09/13    -0.8808549
6   08/09/13     4.1842420")


index(cnyX) <- as.Date(as.character(index(cnyX)),format="%D%m%y") 

This produces:

<NA>        <NA>        <NA>        <NA>        <NA>        <NA>
0.2144527  -9.2553228  -0.8519708  -4.2074340  14.0817672   1.2212485 ....                

I realise the date format I passed is incorrect, but I am unsure how to correct it. I have read about creating xts and ts objects but have not been able to make these work either. I have also looked at "Convert data frame with date column to timeseries" but found it unsuitable.

How should I convert my data frame to a suitable format for auto.arima? I may have duplicate values present.

2 Answers


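The problem stems from the incorrect format argument you passed to as.Date: `%D` is shorthand for a complete `%m/%d/%y` date rather than the day of the month (that is `%d`), and your format string also omits the `/` separators. In fact, if you ever convert something from character to Date and get a vector of all NAs, you almost certainly did not specify the format correctly.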
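A quick check on two of the question's date strings makes the difference visible. This is a minimal base-R sketch; the variable names `x`, `bad` and `good` are just for illustration:

```r
# Two date strings in the question's dd/mm/yy layout
x <- c("03/09/13", "04/09/13")

# Format from the question: %D consumes all of "03/09/13" as mm/dd/yy,
# then %m and %y find no characters left, so parsing fails
bad <- as.Date(x, format = "%d%m%y" |> sub(pattern = "%d", replacement = "%D"))
print(bad)   # [1] NA NA

# Day, slash, month, slash, two-digit year
good <- as.Date(x, format = "%d/%m/%y")
print(good)  # [1] "2013-09-03" "2013-09-04"
```

Two-digit years from 00 to 68 are read as 20xx by `%y`, which is why "13" becomes 2013.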

Here's a comparable data set:

Df <- data.frame(
  Date = format(Sys.Date() - (729:0), "%d/%m/%y"),
  LogReturns = log(rgamma(730, .25)),
  stringsAsFactors = FALSE
)

Using the correct format,

ln_ret <- zoo::zoo(Df[,2], as.Date(Df[,1], format = "%d/%m/%y"))

ln_ret[1:4]
#2014-01-05 2014-01-06 2014-01-07 2014-01-08 
# -2.268443  -3.562711  -4.546391  -0.707788 

This will work with auto.arima:

forecast::auto.arima(ln_ret)
#Series: ln_ret 
#ARIMA(0,0,0) with non-zero mean 
#
#Coefficients:
#    intercept
#      -4.0742
#s.e.   0.1454
#
#sigma^2 estimated as 15.43:  log likelihood=-2034.46
#AIC=4072.93   AICc=4072.94   BIC=4082.11 
nrussell
  • Thank you. Forgive my lack of programming expertise, but how would I modify `Df <- data.frame(Date = format(Sys.Date() - (729:0), "%d/%m/%y"), LogReturns = log(rgamma(730, .25)), stringsAsFactors = FALSE)` so that the start date is 03/09/15 (UK format)? I presume `Sys.Date() - (729:0)` is the key part. – Jack Thompson Jan 04 '16 at 16:15
  • The `Df` object I provided was just arbitrary sample data, but the method I used to transform the character column into a `zoo` object should work equally well on your data. If, in your actual data set, you need the series to start on September 3rd, 2015, you can do something like `ln_ret[index(ln_ret) >= as.Date("2015-09-03")]`. – nrussell Jan 04 '16 at 16:31
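For the follow-up about choosing a start date: if the data stay in a plain data frame rather than a zoo object, the same filtering idea works in base R. A minimal sketch, assuming the column names `DATE` and `LOG...RETURNS` that read.csv produced in the question, using the question's six sample rows and 05/09/13 as an illustrative cut-off in place of 03/09/15:

```r
# The six sample rows from the question, as read.csv returned them
df <- data.frame(
  DATE = c("03/09/13", "04/09/13", "05/09/13", "06/09/13", "07/09/13", "08/09/13"),
  LOG...RETURNS = c(-6.9106715, -6.9106715, -4.5839582, 1.7554592, -0.8808549, 4.1842420),
  stringsAsFactors = FALSE
)

# Parse the dd/mm/yy strings into Date objects
df$DATE <- as.Date(df$DATE, format = "%d/%m/%y")

# Keep rows on or after the chosen start date (illustrative cut-off)
start <- as.Date("05/09/13", format = "%d/%m/%y")
df_sub <- df[df$DATE >= start, ]

# Plain numeric vector of returns that auto.arima can take directly
returns <- df_sub$LOG...RETURNS
print(returns)
```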

You don't need to worry about the date format if you just want to fit an ARIMA model to your log-return data: auto.arima models the values in order and does not use the dates themselves, so what matters is that the observations are equally spaced. You know when the series begins and ends, and it's trivial to keep track of the dates for any forecasts, if those are ultimately desired.

This would work, too.

tt <- read.csv(file = "CNY % returns.csv", header = TRUE, sep = ",")
# using auto.arima's default search settings for the orders p, d, q, etc.
forecast::auto.arima(x = tt[, 2])
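If you go the plain-vector route, the question's concern about duplicates is worth checking first, because an ARIMA model assumes equally spaced observations even though the dates are never passed in. A minimal base-R sketch on the question's six sample rows: it checks that the dates are unique and one day apart, then pairs forecasts with calendar dates. To keep it runnable without the forecast package it swaps in `stats::arima` with a fixed ARIMA(0,0,0)-with-mean order; with the real data you would use `forecast::auto.arima` instead.

```r
# The question's six sample rows, with DATE parsed as dd/mm/yy
dates <- as.Date(c("03/09/13", "04/09/13", "05/09/13",
                   "06/09/13", "07/09/13", "08/09/13"), format = "%d/%m/%y")
returns <- c(-6.9106715, -6.9106715, -4.5839582, 1.7554592, -0.8808549, 4.1842420)

# The model only sees the values in order, so confirm the dates are unique
# and exactly one day apart before dropping them
no_dup_dates <- anyDuplicated(dates) == 0
daily_spacing <- all(diff(dates) == 1)
stopifnot(no_dup_dates, daily_spacing)

# Stand-in for forecast::auto.arima so this runs with base R only:
# a fixed ARIMA(0,0,0) with mean, fitted by stats::arima
fit <- arima(returns, order = c(0, 0, 0))

# Forecast dates follow from the last observed date and the horizon h
h <- 3
pred <- predict(fit, n.ahead = h)$pred
fc <- data.frame(date = max(dates) + seq_len(h), forecast = as.numeric(pred))
print(fc)
```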
Mark S