5

I'm usin R language and working with time series daily stock index from differents countries. In order to make comparisons between of differents indexes,(like correletaion, causality etc..) I need that all the series have the same number of lines, but because diferents holidays in diferents countries, the number of lines in each series change.

I'm working with extracted files from yahoo finance, with format .csv, like...

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14

I need... for example, suppose that day 2010-01-07 is a holiday, in this case, the next line (line 1285) in the file is the day 2010-01-08:

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

In need fill the gap in 2010-01-07 with the previus day data, like :

> head(sp)
>           Date    Open    High     Low   Close     Volume Adj.Close
>1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
>1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
>1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1285 2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
>1284 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

How I can do this ???

My code is (look all the library that I tried using for solve my problem kkk)

>library(PerformanceAnalytics)
>library(tseries)
>library(urca)
>library(zoo)
>library(lmtest)
>library(timeDate)
>library(timeSeries)

>setwd("C:/Users/Fatima/Documents/R")

>sp = read.csv("SP500.csv", header = TRUE, stringsAsFactors = FALSE)
>sp$Date = as.Date(sp$Date)
>sp = sp[order(sp$Date), ]

Sorry about my bad english

2 Answers2

3

Package xts is useful here:

DF <- read.table(text = "           Date    Open    High     Low   Close     Volume Adj.Close
1288 2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
1287 2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
1286 2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
1285 2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98", header = TRUE)

DF$Date <- as.Date(DF$Date)

library(xts)
X <- as.xts(DF[,-1], order.by = DF$Date)
na.locf(merge(X, seq(min(DF$Date), max(DF$Date), by = 1)))
#              Open    High     Low   Close     Volume Adj.Close
#2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
#2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
#2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
#2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
#2010-01-08 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

Edit:

In response to your comment: You can exclude weekends like this:

dates <- seq(min(DF$Date), max(DF$Date), by = 1)
#you might have to adjust the following to the translations in your locale
dates <- dates[!(weekdays(dates) %in% c("Saturday", "Sunday"))]
na.locf(merge(X, dates))
Roland
  • 127,288
  • 10
  • 191
  • 288
  • 1
    Will this solution work also in the case of back-to-back missing days? E.g. if 2 days be missing consecutively will this also work? – Tim Biegeleisen Mar 19 '15 at 13:36
  • @TimBiegeleisen Try and see. (Yes, it would.) – Roland Mar 19 '15 at 13:37
  • hanks a lot, =) but like G. Grothendieck do... this code fill gaps in the weekends, and I need fill the days whithout data between Monday to Friday. Days which holidays in stock exchange. – FlávioCorinthians Mar 19 '15 at 15:20
2

Read it in using read.zoo, add the missing days by merging a zero width zoo series with all the dates. Finally use na.locf to fill in the NA values generated by the merge.

Lines <- "Date,Open,High,Low,Close,Volume,Adj.Close
2010-01-04,1116.56,1133.87,1116.56,1132.99,3991400000,1132.99
2010-01-05,1132.66,1136.63,1129.66,1136.52,2491020000,1136.52
2010-01-06,1135.71,1139.19,1133.95,1137.14,4972660000,1137.14
2010-01-11,1140.52,1145.39,1136.22,1144.98,4389590000,1144.98"

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
zout <- na.locf( merge(z, zoo(, seq(start(z), end(z), by = "day"))) )

giving:

> zout
              Open    High     Low   Close     Volume Adj.Close
2010-01-04 1116.56 1133.87 1116.56 1132.99 3991400000   1132.99
2010-01-05 1132.66 1136.63 1129.66 1136.52 2491020000   1136.52
2010-01-06 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-07 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-08 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-09 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-10 1135.71 1139.19 1133.95 1137.14 4972660000   1137.14
2010-01-11 1140.52 1145.39 1136.22 1144.98 4389590000   1144.98

An alternative to the na.locf line is to use na.approx with method = "constant" instead:

na.approx(z, xout = seq(start(z), end(z), by = "day"), method = "constant")

giving the same answer.

Added To NA out weekends:

library(chron)
zout[is.weekend(time(zout)), ] <- NA

or to return only weekdays:

library(chron)
zout[!is.weekend(time(zout))]
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thanks a lot, =) but I still have a problema, because this method fill gaps because the Weekend and I need fill just the holidays.. in other words, days whithout data between Monday to Friday. – FlávioCorinthians Mar 19 '15 at 15:17