3

REWRITTEN QUESTION HERE:

I've made some progress but am getting odd behaviour from R...

Here's the xts I'm starting with

<no title>  Value   Value2  Value3
2002-08-21  21      2       27
2003-09-10  22      42      87
2004-02-12  23      62      67
2005-04-13  24      13      73
2006-05-13  25      4       28
2007-08-14  20      68      25
2008-03-06  19      82      22

What I want to produce:

<no title>  Value   Value2  Value3  ThisDate    NextDate
2002-08-21  21      2       27      2002-08-21  2003-09-10
2003-09-10  22      42      87      2003-09-10  2004-02-12
2004-02-12  23      62      67      2004-02-12  2005-04-13
2005-04-13  24      13      73      2005-04-13  2006-05-13
2006-05-13  25      4       28      2006-05-13  2007-08-14
2007-08-14  20      68      25      2007-08-14  2008-03-06
2008-03-06  19      82      22      2008-03-06  NA

I've written a function like this:

StackUpAdjacentDates <- function(sourceTimeSeries)
{
    returnValue <- sourceTimeSeries

    thisDate <- as.character(index(sourceTimeSeries))
    nextDate <- c(as.character(thisDate[2:length(thisDate)]),NA)

    thisDate <- as.Date(strptime(thisDate, "%Y-%m-%d"))
    nextDate <- as.Date(strptime(nextDate, "%Y-%m-%d"))

    # set up thisDate in a new column
    if ("thisDate" %in% colnames(returnValue) )
    {
        returnValue<-returnValue[,-which(colnames(returnValue)=="thisDate")]
    }
    returnValue <- cbind(returnValue, thisDate)
    colnames(returnValue)[ncol(returnValue)] <- "thisDate"
    returnValue$thisDate <- thisDate

    # add nextDate in a new column
    if ("nextDate" %in% colnames(returnValue) )
    {
        returnValue<-returnValue[,-which(colnames(returnValue)=="nextDate")]
    }
    returnValue <- cbind(returnValue,nextDate)
    colnames(returnValue)[ncol(returnValue)] <- "nextDate"
    #returnValue$nextDate <- nextDate

}

This successfully adds thisDate (running the code step-wise at the command-line). But the bit that adds nextDate seems to over-write it! I also seem to get an unexpected row of NAs. Still working on this...

<no title>  Value   Value2  Value3  nextDate
2002-08-21  21      78      76      12305
2003-09-10  22      70      23      12460
2004-02-12  23      84      22      12886
2005-04-13  24      97      28      13281
2006-05-13  25      26      97      13739
2007-08-14  20      59      22      13944
2008-03-06  19      64      98      NA
<NA>        NA      NA      NA      NA

I've put "no title" in the first column to indicate that it's the xts date-index rather than actually a part of the vector/matrix.

The bit about removing the extra row is because I've not yet solved the over-write problem and was experimenting. It doesn't need to be there in the final answer but is where I am up to at present.

And lastly, when I interrogate this result and try to convert nextDate to a date I get....

> as.Date(returnValue$nextDate)
Error in as.Date.default(returnValue$nextDate) : 
  do not know how to convert 'returnValue$nextDate' to class "Date"

So I'm in a bit of a muddle...

ORIGINAL QUESTION BELOW:

I have a time-series in R (which I am learning fast, but clearly not fast enough!) like this

             Value
2002-08-21    21
2003-09-10    22
2004-02-12    23
2005-04-13    24
2006-05-13    25
2007-08-14    20
2008-03-06    19

I want to create a derivative of it with the date-index in the NEXT row in a new column in each row:

              Value    NextDate
2002-08-21    21       2003-09-10
2003-09-10    22       2004-02-12
2004-02-12    23       2005-04-13
2005-04-13    24       2006-05-13
2006-05-13    25       2007-08-14
2007-08-14    20       2008-03-06
2008-03-06    19       [...]

It's pretty easy to do for Value (using Lag) but not for the date-index itself.

I can probably work out how to do it using various lookups and the like, but it is messy. You have to match on some other field, or fiddle around with row-numbers which doesn't feel very "true to R".

Is there a nice, neat, elegant way to do it?

I'm pretty sure I'll go "D'OH!" as soon as someone gives the answer! But so far I haven't found an answer on this site for lagging the date-index.

The reason I want to do this is I then want to use each pair of dates in a row to interrogate another series. So there might be a better way to do this.

A5C1D2H2I1M1N2O1R2T1
Bit Rocker
  • What class is your actual object in R? – A5C1D2H2I1M1N2O1R2T1 Sep 15 '12 at 10:35
  • using xts - sorry should have said! – Bit Rocker Sep 15 '12 at 10:55
  • What exactly do you mean by "interrogate another series"? – Roland Sep 15 '12 at 11:31
  • If it is `xts` then you **cannot do this** as `xts` is essentially limited to a numeric matrix plus index. – Dirk Eddelbuettel Sep 15 '12 at 15:01
  • @Roland I want to use thisDate and nextDate as parameters for computing an average of all the original data. The table in this example is already just an extract of it. So thisDate and nextDate define the start and end of a sample window. – Bit Rocker Sep 15 '12 at 15:28
  • @DirkEddelbuettel OK is there something else I can use? Can I put dates in Julian format (or number of days from 1/Jan/1970) and convert them back to dates later maybe? – Bit Rocker Sep 15 '12 at 15:29
  • @BitRocker: I would rethink what you are trying to do. My preference would be to use `merge(X, lag(X))` which is cheap and fast with `xts`. If you really need the extra date column (why?), switch to using data.frame and drop xts. Your call. – Dirk Eddelbuettel Sep 15 '12 at 15:32
  • @BitRocker: As for your sliding average, `zoo` and `xts` *already do that for you*. Read the zoo vignettes for inspiration. – Dirk Eddelbuettel Sep 15 '12 at 15:33
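
On the idea raised in the comments of keeping the dates as day counts: a minimal sketch, assuming a Date-indexed xts and a three-row extract of the question's data. The next index is stored as numeric days since 1970-01-01, which fits in the numeric matrix, and is converted back only when it is read. The names `x`, `nd`, `x2` and `back` are made up for illustration.

```r
library(xts)

# Small Date-indexed xts (first three rows of the question's Value column)
dates <- as.Date(c("2002-08-21", "2003-09-10", "2004-02-12"))
x <- xts(cbind(Value = c(21, 22, 23)), order.by = dates)

# Next index as days since 1970-01-01; the last row has no successor
next_num <- c(as.numeric(index(x))[-1], NA)
nd <- xts(next_num, order.by = index(x))
colnames(nd) <- "nextDate"

# Both objects are numeric, so merge() keeps everything in one xts
x2 <- merge(x, nd)

# Convert back to Date on the way out: strip the xts wrapper first,
# then supply the origin explicitly
back <- as.Date(as.numeric(x2$nextDate), origin = "1970-01-01")
```

Calling `as.Date()` directly on the xts column is what triggers the "do not know how to convert" error; going through `as.numeric()` first avoids it.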

3 Answers

2

I'm not sure xts is the best thing for what you're trying to do, but for what it's worth, here is how to take your xts object, make a data frame, create the extra time column you want, and then convert it to a time format.

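 library(xts)
 data(sample_matrix)
 x <- as.xts(sample_matrix)
 head(x)
 df <- as.data.frame(x)
 head(df)
 # the xts index ends up in the row names of the data frame
 newdates <- rownames(df)

 # shift the dates up one row; the last row has no next date, so pad with a real NA
 df$nextdates <- c(newdates[-1], NA)
 df$nextdates <- as.POSIXct(strptime(df$nextdates, "%Y-%m-%d"))
 head(df)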
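
Applied to the data in the question, the same idea might look like the sketch below. It assumes a Date-indexed xts; `stack_up_adjacent_dates` is a hypothetical helper name, and it returns a data frame rather than an xts, since a data frame can hold numeric and Date columns side by side.

```r
library(xts)

# The question's example series (values copied from its first table)
dates <- as.Date(c("2002-08-21", "2003-09-10", "2004-02-12", "2005-04-13",
                   "2006-05-13", "2007-08-14", "2008-03-06"))
src <- xts(cbind(Value  = c(21, 22, 23, 24, 25, 20, 19),
                 Value2 = c(2, 42, 62, 13, 4, 68, 82),
                 Value3 = c(27, 87, 67, 73, 28, 25, 22)),
           order.by = dates)

# Hypothetical helper: copies the numeric columns into a data frame and
# adds the index plus the index shifted up by one row
stack_up_adjacent_dates <- function(x) {
  d <- as.Date(index(x))
  out <- as.data.frame(coredata(x))
  out$ThisDate <- d
  out$NextDate <- c(d[-1], NA)  # c() on Dates keeps the Date class
  rownames(out) <- format(d)
  out
}

res <- stack_up_adjacent_dates(src)
```

Because `ThisDate` and `NextDate` stay as real Date columns here, no conversion back from day counts is needed afterwards.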
user1317221_G
  • Wow amazing. Now I need to figure out how it works ... thanks! – Bit Rocker Sep 15 '12 at 12:01
  • @user1317221_G sure am still thinking about it - I'm not working with a text table like in your example, so am recrafting to use it with the actual date index of an xts. I access it using index(). This throws an error on the df$dates[df$dates[2]: bit which I replace with just df$dates([2] – Bit Rocker Sep 15 '12 at 14:06
  • what I'm having to do is thisDate <- as.character(index(sourceTimeSeries)) nextDate <- c(as.character(thisDate[2:length(thisDate)]),NA) then stick thisDate and nextDate on the end of the original vector (which has multiple columns - I just filleted out the bare bones for the example above). The trick is that R seems to choke on mixing data of different types. If I add the dates as a string, all pre-existing numbers have quotes added to them. If I add the date as a date, they come out as (I think) days from 1 Jan 1970 or whatever the UNIX start-date is. I'm sure I'll get there... – Bit Rocker Sep 15 '12 at 14:50
  • If you present the output of `dput(yourTS)` I am sure you will get an answer that fits your case. However, I really suggest to present your whole problem and not only the step, where you think you got stuck. There might be a much better way to achieve your goal. – Roland Sep 15 '12 at 14:55
  • OK everyone I'm learning the etiquette here too ... have re-written the original question – Bit Rocker Sep 15 '12 at 15:40
  • @BitRocker Re: "R seems to choke on mixing data of different types". This is because XTS is a matrix underneath, meaning all values have to be the same type. So you can't have some numeric columns and some date columns. I normally use data frames for that, as shown in this answer. Other solutions I've used are a `list` of two xts objects, one for the numerics, one for the dates. Another is to attach the date column in an attribute of your main date column. These alternative solutions are less useful if you want to process your data in rows, however. – Darren Cook Sep 16 '12 at 01:24
  • OK having done some thinking (and some learning) I think the first answer actually nails it. I was still too new to understand the difference between xts and a good old-fashioned dataframe. The DF approach does win. There have been nonetheless some great answers elsewhere. So I feel guilty awarding user1317221_G the points but I think that is fair. Any objections, let me know. I'll award points tomorrow... – Bit Rocker Sep 16 '12 at 11:25
1

I think this is similar to what you actually want to do:

library(xts)
#create example xts
times <- seq(as.Date('2002-08-21'),as.Date('2002-09-06'),by="day")
myts <- xts(x=seq_along(times),order.by=times)

#second xts, with start and end times
times2 <- c("2002-08-21","2002-08-31","2002-09-06")    
myts2 <- myts[times2] 

#get start and end times
ix <- index(myts2)

#get positions in myts
ep <- which(index(myts) %in% ix)-1

#calculate means
period.apply(myts,ep,mean) 

Note: This includes the starting time and excludes the end time, when calculating the period mean.
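
For comparison, the same window means can be sketched directly from pairs of start and end dates, without computing endpoints. This is only an illustration under the same convention (start included, end excluded); `breaks`, `starts`, `ends` and `window_means` are made-up names.

```r
library(xts)

# Same daily example series as above
times <- seq(as.Date("2002-08-21"), as.Date("2002-09-06"), by = "day")
myts <- xts(seq_along(times), order.by = times)

# Window boundaries: each consecutive pair is (thisDate, nextDate)
breaks <- as.Date(c("2002-08-21", "2002-08-31", "2002-09-06"))
starts <- head(breaks, -1)
ends   <- tail(breaks, -1)

idx  <- index(myts)
vals <- as.numeric(coredata(myts))

# Mean of every observation with start <= date < end, one value per window
window_means <- vapply(seq_along(starts), function(i) {
  in_window <- idx >= starts[i] & idx < ends[i]
  mean(vals[in_window])
}, numeric(1))
```

With this data the first window covers values 1 to 10 and the second covers 11 to 16, so `window_means` comes out as 5.5 and 13.5.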

Roland
  • Hmm this is interesting. Let me think about it a bit more. – Bit Rocker Sep 16 '12 at 11:28
  • OK this answer has inspired me to create the right solution. A lot of people have helped but this is definitely the one that got me there. I will give it the marks unless anyone objects. – Bit Rocker Sep 16 '12 at 18:06
0

I believe what you are looking for is:

dayDifff <- function(X)
{
    as.numeric(as.Date(index(X))) - c(NA, as.numeric(as.Date(index(X[-nrow(X)]))))
}

Where X is an xts object. I've converted the native POSIXct times into dates, and added an NA to the head and taken off the final date with X[-nrow(X)].

If you have times in seconds etc, you'll need to keep the second precision of POSIXct, but you should be able to get from the date/integer case above to that with a moment's effort.
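
A shorter variant of the same calculation, offered as a sketch rather than a drop-in replacement: base R's `diff()` on a Date vector gives the gaps in days, and an `NA` is prepended for the first row. The objects `X`, `d` and `gaps` are example names.

```r
library(xts)

# Small example xts with a Date index (dates taken from the question)
d <- as.Date(c("2002-08-21", "2003-09-10", "2004-02-12"))
X <- xts(c(21, 22, 23), order.by = d)

# Days since the previous observation; the first row has no predecessor
gaps <- c(NA, as.numeric(diff(as.Date(index(X)))))
```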

ricardo