
I am having some trouble subsetting a time series object in R.

1. I imported a CSV file into R as follows (after removing the date column in Excel):

sz.bm.df <- read.csv('size_book_25.csv',header=T)

2. The CSV file has 1038 rows and 25 columns; missing values are coded as -99.99.

3. I then created a time series object with a custom date range as follows:

szbm.ts.data <- ts(data=sz.bm.df,start=c(1926,7),frequency=12)

4. Now I would like to deal with the missing values (this is where I am having problems). I would like to create a subset of the time series object that runs from the last row on which -99.99 appears to the end of the original object. I tried the following to extract the dates on which missing values occur:

time(szbm.ts.data[which(szbm.ts.data==-99.99)])

However, instead of giving me a set of dates, this gives me:

 [1]  1  2  3  4  5  6  7  8  9 10 11 12

attr(,"tsp")
[1]  1 12  1

What am I doing wrong here?

Thank you for any help.

HalfAFoot

1 Answer


Here are some alternatives:

1) Use the window function (see ?window):

tt <- ts(c(1:5, -99, 6:9), start = 2000, frequency = 12)

# time of the observation just after the last -99; keep everything from there on
t.start <- time(tt)[tail(which(tt == -99), 1) + 1]
window(tt, start = t.start)
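
As for why the attempt in the question prints 1 to 12 rather than dates: as far as I can tell, indexing a ts object with `[` and a vector of positions returns a plain vector without the time attributes, so time() falls back to counting 1, 2, 3, and so on. A minimal sketch on a toy series (the names x, hits, sub and bad.times are made up for illustration); the fix is to index time(x) itself rather than take time() of the subset:

```r
# Toy monthly series with two occurrences of the missing-value code -99
x <- ts(c(-99, 1:4, -99, 5:8), start = 2000, frequency = 12)

hits <- which(x == -99)   # positions of the missing-value code

sub <- x[hits]            # subsetting by position drops the ts attributes
is.ts(sub)                # FALSE: sub is a plain numeric vector
time(sub)                 # so this is just a counter, not calendar time

bad.times <- time(x)[hits]  # index the time axis of the ORIGINAL series
bad.times
```

Here bad.times holds the decimal-year times of the two flagged observations (January and June 2000 in this toy series).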

2) Represent your time series as a zoo or xts object using the respective packages:

library(zoo)

z <- as.zoo(tt)

ix <- tail(which(z == -99), 1) + 1
zz <- z[ix:length(z)]

We can either continue to use zz or convert it back to class ts with as.ts(zz).

3) The na.contiguous command (see ?na.contiguous) will find the longest stretch of non-NAs.

tt[tt == -99] <- NA
na.contiguous(tt)

This may or may not give you what you want, depending on where the NAs are. In the example here it does not appear to be what you want, but in your real data it may be fine if the NAs are confined to a few observations at the beginning.

For two-dimensional (multivariate) data, the same three approaches look like this:
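
To make that caveat concrete, here is a small sketch on two made-up series (late, early and the kept.* names are illustrative only). It assumes the missing-value code has already been replaced by NA:

```r
# NA late in the series: the longest complete run is the part BEFORE the NA
late <- ts(c(1:6, NA, 7:9), start = 2000, frequency = 12)
kept.late <- na.contiguous(late)

# NAs only at the start: the longest complete run is everything after them
early <- ts(c(NA, NA, 1:8), start = 2000, frequency = 12)
kept.early <- na.contiguous(early)

as.numeric(kept.late)    # values 1 to 6, i.e. the stretch before the NA
as.numeric(kept.early)   # values 1 to 8, i.e. the stretch after the NAs
start(kept.early)        # the kept stretch starts in March 2000
```

So na.contiguous matches "everything after the last missing value" only when the stretch after the last NA is also the longest complete one.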

m <- matrix(1:24, 6)
m[2,2] <- m[1,4] <- -99
t2 <- ts(m, start = 2000, freq = 12)

# 1
has.na <- apply(t2 == -99, 1, any)
t.start <- time(t2)[tail(which(has.na), 1) + 1]
window(t2, t.start)

# 2
library(zoo)
z <- as.zoo(t2)
has.na <- apply(z == -99, 1, any)
ix <- tail(which(has.na), 1) + 1
z[ix:nrow(z)]

# 3
t2[] <- apply(t2, 2, function(x) replace(x, x == -99, NA))
na.contiguous(t2)
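
Regarding the very large number reported in the comment below: on a multi-column series, which() returns linear positions counted down the columns, so the result can exceed the number of rows, and indexing time() with it then gives NA. A sketch of recovering row numbers with arr.ind = TRUE, on made-up data using the question's -99.99 code (x2, bad, last.bad.row and clean are illustrative names, not objects from the question):

```r
# Toy stand-in for the real data: 6 months x 4 series, -99.99 marks missing
m <- matrix(as.numeric(1:24), nrow = 6)
m[2, 2] <- m[1, 4] <- -99.99
x2 <- ts(m, start = c(1926, 7), frequency = 12)

# Linear positions run down the columns, so they can exceed nrow(x2)
which(x2 == -99.99)

# arr.ind = TRUE returns one (row, col) pair per hit instead
bad <- which(x2 == -99.99, arr.ind = TRUE)
last.bad.row <- max(bad[, "row"])

# Keep everything after the last row that contains the missing-value code
clean <- window(x2, start = time(x2)[last.bad.row + 1])
clean
```

This is equivalent in effect to the apply(..., 1, any) version above; it just makes the row/column positions of the flagged cells explicit.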

Note: In future, please state questions in reproducible form as discussed here: How to make a great R reproducible example?

UPDATE: Have also added examples of performing these operations on multivariate time series.

UPDATE 2: corrected spelling of na.contiguous

G. Grothendieck
  • I tried your first solution but I get **NA** as a result. I tried this: `idx <- time(szbm.ts.data)[tail(which(szbm.ts.data==-99.99),1)+1]`. When I tried only `tail(which(szbm.ts.data==-99.99),1)+1`, I got a very large number, **24973**. Help! – HalfAFoot May 30 '13 at 10:49
  • @Half, I have added examples of this with multivariate series. Please follow the instructions in the **note** before asking further questions. – G. Grothendieck May 30 '13 at 11:39