
I have a data frame containing multiple irregular time series, which looks like this:

station   Time     WaterTemp
1       01-01-1974  5.0000000
1       01-02-1974  5.0000000
1       01-03-1974  8.6000004
1       01-05-1974  8.1333332
1       01-07-1974  12.7999999
2       01-01-1974  5.0000000
2       01-02-1974  5.0000000
2       01-04-1974  8.6000004
2       01-06-1974  8.1333332
2       01-08-1974  12.7999999

I want to convert this into a regular time series (ts) object, which should look like this:

Time       Station1    Station2
Jan1974    5.0000000   5.0000000
Feb1974    5.0000000   5.0000000
Mar1974    8.6000004   NA
Apr1974    NA          8.6000004
May1974    8.1333332   NA
Jun1974    NA          8.1333332
Jul1974    12.7999999  NA
Aug1974    NA          12.7999999
Sep1974    NA          NA
Oct1974    7.9         NA
Nov1974    NA          NA
Dec1974    NA          7.4

How do I do that? There are lots of solutions for a single time series, but I haven't come across one that deals with multiple time series.

Thanks,

Arora

1 Answer


If DF is your data frame, then try this. Converting to ts in the last line makes the series regular, and then we convert it back to zoo:

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

An alternative to the last line would be these two lines:

g <- zoo(, seq(start(z.ym), end(z.ym), deltat(z.ym))) # grid
zm <- merge(z.ym, g)

In either case, coredata(zm) is now the data part and time(zm) is the index. However, you might want to keep it as a zoo object so that you can use its other time series facilities, as well as the many other packages that accept time series in that form.
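If you would rather not depend on zoo, here is a minimal base R sketch of the same conversion; it is an alternative technique, not the answer's method. It assumes, as in the question, that every Time value is the first day of a month in dd-mm-yyyy form; month_index, wide and tt are illustrative names. Each observation is written into an all-NA month-by-station matrix, which is then wrapped in ts(). For the resulting ts, the matrix itself plays the role of coredata() and time(tt) gives the index.

```r
# Example data in the same long format as the question (dd-mm-yyyy, first of month)
DF <- data.frame(
  station   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Time      = c("01-01-1974", "01-02-1974", "01-03-1974", "01-05-1974", "01-07-1974",
                "01-01-1974", "01-02-1974", "01-04-1974", "01-06-1974", "01-08-1974"),
  WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                5, 5, 8.6000004, 8.1333332, 12.7999999)
)

# Parse the dates and express each one as a count of months since year 0
lt <- as.POSIXlt(as.Date(DF$Time, format = "%d-%m-%Y"))
month_index <- (lt$year + 1900) * 12 + lt$mon

# Size of the regular monthly grid, from the first to the last observed month
first    <- min(month_index)
n_months <- max(month_index) - first + 1
stations <- sort(unique(DF$station))

# All-NA matrix: one row per month, one column per station; fill observed cells
wide <- matrix(NA_real_, nrow = n_months, ncol = length(stations),
               dimnames = list(NULL, paste0("Station", stations)))
wide[cbind(month_index - first + 1, match(DF$station, stations))] <- DF$WaterTemp

# Wrap as a monthly ts starting at the first observed month
tt <- ts(wide, start = c(first %/% 12, first %% 12 + 1), frequency = 12)
tt
time(tt)
```

Months with no observation for a station stay NA, which matches the layout asked for in the question.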

Note: Here is a complete self-contained reproducible example:

DF <- structure(list(station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L), Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L
), .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974", 
"01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"), class = "factor"), 
    WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999, 5, 
    5, 8.6000004, 8.1333332, 12.7999999)), .Names = c("station", 
"Time", "WaterTemp"), class = "data.frame", row.names = c(NA, 
-10L))

library(zoo)
z <- read.zoo(DF, split = 1, index = 2, format = "%d-%m-%Y")
z.ym <- aggregate(z, as.yearmon, identity) # convert to yearmon
zm <- aggregate(as.zoo(as.ts(z.ym)), as.yearmon, identity)

giving:

> zm
                 1         2
Jan 1974  5.000000  5.000000
Feb 1974  5.000000  5.000000
Mar 1974  8.600000        NA
Apr 1974        NA  8.600000
May 1974  8.133333        NA
Jun 1974        NA  8.133333
Jul 1974 12.800000        NA
Aug 1974        NA 12.800000
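A similar monthly layout can also be produced in base R by mirroring the grid-merge alternative: reshape the data to one column per station, build a data frame holding every month, and left-join onto it with merge(). This is a sketch, not part of the answer, and assumes that all dates fall on the first of a month; grid, wide, full and tt are illustrative names.

```r
DF <- data.frame(
  station   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Time      = c("01-01-1974", "01-02-1974", "01-03-1974", "01-05-1974", "01-07-1974",
                "01-01-1974", "01-02-1974", "01-04-1974", "01-06-1974", "01-08-1974"),
  WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                5, 5, 8.6000004, 8.1333332, 12.7999999)
)
DF$Date <- as.Date(DF$Time, format = "%d-%m-%Y")

# Long -> wide: one WaterTemp.<station> column per station, observed dates only
wide <- reshape(DF[, c("station", "Date", "WaterTemp")],
                idvar = "Date", timevar = "station", direction = "wide")

# Complete monthly grid, playing the role of the empty zoo object g
grid <- data.frame(Date = seq(min(DF$Date), max(DF$Date), by = "month"))

# Left join onto the grid (sorted by Date); months with no observation become NA
full <- merge(grid, wide, by = "Date", all.x = TRUE)

# Optionally wrap the value columns as a monthly ts
first <- full$Date[1]
tt <- ts(full[, -1], frequency = 12,
         start = c(as.integer(format(first, "%Y")), as.integer(format(first, "%m"))))
full
```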

Update: made some corrections and improvements.

G. Grothendieck
  • Oh, great! But what if I want to convert the date format to yearmon and also fill in the missing months of the year? – Arora Jul 11 '14 at 14:37
  • I just tried using your code. Running the last step gives the following error: Error in zoo(coredata(rval), indexes) : “x” : attempt to define invalid zoo object – Arora Jul 11 '14 at 14:43
  • I have added a self contained reproducible example to show it works. There may be some aspect of your problem that is not as described in the question. – G. Grothendieck Jul 11 '14 at 14:47
  • I guess so, because when I run the code on your reproducible example it works great, but when I use my data it gives the same error. For some reason, z.ym with my data is not a zoo series but a list. – Arora Jul 11 '14 at 14:54
  • Can you find a small subset of rows of your data frame that causes the same problem? If the rows are 10:20, say, then post the result of running: `dput(DF[10:20, ])` – G. Grothendieck Jul 11 '14 at 14:56
  • The series for stations 1 and 2 are of unequal lengths (station 1 runs from 1974 to 2010 and station 2 from 1991 to 2010), so there were a lot of NAs for station 2 at the beginning. When I truncated the station 1 series to 1991-2010, the code worked. Why is that? – Arora Jul 11 '14 at 15:05
  • Will need a reproducible example. For me it works with `DF[-(6:8), ]` which seems to be the described situation. – G. Grothendieck Jul 11 '14 at 15:11
  • Works now :). One more thing, what if there are more stations in the series, how does this work then? Thanks a ton! – Arora Jul 11 '14 at 15:31
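On the last question: read.zoo(..., split = 1) creates one column per distinct value of the station column, so the zoo code above is not limited to two stations. As a base R illustration only, the hypothetical helper monthly_ts_by_station() below (not part of zoo or base R) builds one column per station against a shared monthly grid, so stations with different start and end dates simply get leading or trailing NAs. It assumes the column names station, Time and WaterTemp from the question and first-of-month dd-mm-yyyy dates; DF3 is made-up demo data.

```r
# Hypothetical helper, not part of zoo or base R.
# Assumes columns station, Time (dd-mm-yyyy, first of month) and WaterTemp.
monthly_ts_by_station <- function(df, format = "%d-%m-%Y") {
  d <- as.Date(as.character(df$Time), format = format)
  # Shared monthly grid spanning the earliest to the latest date of any station
  grid <- seq(min(d), max(d), by = "month")
  # For each station, look up its value for every grid month (NA if absent)
  cols <- lapply(split(seq_len(nrow(df)), df$station), function(i) {
    df$WaterTemp[i][match(grid, d[i])]
  })
  mat <- do.call(cbind, cols)
  colnames(mat) <- paste0("Station", names(cols))
  ts(mat, frequency = 12,
     start = c(as.integer(format(grid[1], "%Y")), as.integer(format(grid[1], "%m"))))
}

# Made-up demo data: three stations with different start and end months
DF3 <- data.frame(
  station   = c(1, 1, 2, 2, 3),
  Time      = c("01-11-1973", "01-02-1974", "01-01-1974", "01-03-1974", "01-12-1973"),
  WaterTemp = c(6.1, 5.0, 5.0, 8.6, 4.2)
)
res <- monthly_ts_by_station(DF3)
res
```

Because the grid is computed once from all stations, adding more stations only adds columns; no station's series needs to be truncated to match the others.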