How can I add missing dates and remove repeated dates in an hourly time series? Each missing date should be added with NA as its rainfall value.

The example time series looks like this:

               date  Rainfall(mm)
1970-01-05 00:00:00           1.0 
1970-01-05 01:00:00           1.0
1970-01-05 05:00:00           3.6
1970-01-05 06:00:00           3.6
1970-01-05 07:00:00           2.2
1970-01-05 08:00:00           2.2
1970-01-05 09:00:00           2.2
1970-01-05 10:00:00           2.2
1970-01-05 11:00:00           2.2
1970-01-05 13:00:00           2.2
1970-01-05 13:00:00           2.2
1970-01-05 13:00:00           2.2
user1537175

2 Answers

You can use seq.POSIXt to create a data.frame with no missing time steps (the object grid. below), and then use merge to combine it with the observed data (df in my example).

This should solve your problem:

# Create a sample data.frame that is missing every second hourly observation.
df <- data.frame(date=seq.POSIXt(from=as.POSIXct("1970-01-01 00:00:00"), to=as.POSIXct("1970-01-01 10:00:00"), by="2 hours"), rainfall=rnorm(6))
# Create a sequence of times with nothing missing.
grid. <- data.frame(date=seq.POSIXt(from=as.POSIXct("1970-01-01 00:00:00"), to=as.POSIXct("1970-01-01 10:00:00"), by="1 hour"))
# Merge them, keeping every row of grid.; hours absent from df get NA for rainfall.
dat. <- merge(grid., df, by="date", all.x=TRUE)

To remove duplicated values, you can look for them with the duplicated function and drop them.

# duplicated() flags every repeat of an earlier value; ! negates it, so TRUE marks the rows to keep.
dup_index <- !duplicated(dat.[,1])
# Now re-create the dat. object with only the non-duplicated rows.
dat. <- dat.[dup_index,]
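
Applied to the sample data from the question, the two steps can be sketched roughly as follows. This is only an illustration under some assumptions: the names t0, obs, grid and filled are made up here, the column is called rainfall instead of Rainfall(mm), the time zone is set to UTC, and duplicates are dropped before the merge so that the merged result has one row per hour.

```r
# Rebuild the question's sample data: hours 0, 1, 5..11 and three copies of hour 13.
t0 <- as.POSIXct("1970-01-05 00:00:00", tz = "UTC")   # assumed time zone
obs <- data.frame(
  date     = t0 + 3600 * c(0, 1, 5:11, 13, 13, 13),
  rainfall = c(1.0, 1.0, 3.6, 3.6, rep(2.2, 8))
)

# Step 1: keep only the first row for each time stamp.
obs <- obs[!duplicated(obs$date), ]

# Step 2: build a complete hourly grid spanning the observed range.
grid <- data.frame(date = seq(min(obs$date), max(obs$date), by = "1 hour"))

# Step 3: left-join the observations onto the grid; missing hours get NA.
filled <- merge(grid, obs, by = "date", all.x = TRUE)
```

With this input, filled should contain one row for every hour from 00:00 to 13:00, with NA rainfall at 02:00, 03:00, 04:00 and 12:00.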

The other way is to use the aggregate function. This is useful if the duplicates are really two different observations and you therefore want their mean:

dat. <- aggregate(dat.[,2], by=list(dat.[,1]), FUN=mean)
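
As a small sketch of the averaging approach, here is the formula interface of aggregate on made-up data in which one hour was recorded three times with different values. The names t0, obs and avg and the rainfall values are hypothetical. One point to keep in mind is that the formula interface drops rows containing NA by default (na.action = na.omit), so it may be simpler to aggregate before merging with the full grid of times.

```r
# Hypothetical data: 12:00 observed once, 13:00 observed three times.
t0 <- as.POSIXct("1970-01-05 12:00:00", tz = "UTC")   # assumed time zone
obs <- data.frame(
  date     = t0 + 3600 * c(0, 1, 1, 1),
  rainfall = c(2.2, 2.0, 3.0, 4.0)
)

# One row per distinct time stamp; repeated time stamps are averaged.
# The formula interface keeps the original column names (date, rainfall).
avg <- aggregate(rainfall ~ date, data = obs, FUN = mean)
```

Here avg should have two rows, with the three 13:00 readings collapsed into their mean of 3.0.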

HTH

Jase_
FAQ #13 in the zoo FAQ vignette addresses the part about filling time series. The aggregate argument of read.zoo handles the duplicates. In this case we average them, but we could have taken other action, such as using aggregate = function(x) tail(x, 1) to keep the last value. We use chron date/times here to avoid time zone problems (see R News 4/1), but we could have used POSIXct if time zones were relevant. They seem not to be, since they are not in the input.

Lines <- "date  Rainfall(mm)
1970-01-05 00:00:00           1.0 
1970-01-05 01:00:00           1.0
1970-01-05 05:00:00           3.6
1970-01-05 06:00:00           3.6
1970-01-05 07:00:00           2.2
1970-01-05 08:00:00           2.2
1970-01-05 09:00:00           2.2
1970-01-05 10:00:00           2.2
1970-01-05 11:00:00           2.2
1970-01-05 13:00:00           2.2
1970-01-05 13:00:00           2.2
1970-01-05 13:00:00           2.2"

library(zoo)
library(chron)

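# Combine the date and time columns into a single chron index.
asChron <- function(d, t) as.chron(paste(d, t))
# Columns 1 and 2 form the index; values at duplicated times are averaged.
z <- read.zoo(text = Lines, skip = 1, index.column = 1:2, FUN = asChron, aggregate = mean)
# Merge with an empty hourly series (1/24 of a day) so missing hours appear as NA, as in the FAQ.
merge(z, zoo(, seq(start(z), end(z), 1/24)))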

If the data comes from a file, replace text = Lines with something like file = "myfile.dat".
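
For comparison, a POSIXct-based variant might look like the sketch below. It is not the code from the FAQ: it assumes the time stamps are in UTC, builds the series from vectors instead of reading text, and keeps the last value at each repeated time stamp instead of the mean. The names t0, times, rain, z2 and filled_z are made up for this example.

```r
library(zoo)

# The question's sample data, with an assumed UTC time zone.
t0    <- as.POSIXct("1970-01-05 00:00:00", tz = "UTC")
times <- t0 + 3600 * c(0, 1, 5:11, 13, 13, 13)
rain  <- c(1.0, 1.0, 3.6, 3.6, rep(2.2, 8))

# zoo() warns about the non-unique index, which is expected at this point.
z_raw <- suppressWarnings(zoo(rain, times))

# Collapse repeated time stamps, keeping the last value at each one.
z2 <- aggregate(z_raw, identity, function(x) tail(x, 1))

# Merge with an empty hourly series so that missing hours show up as NA.
filled_z <- merge(z2, zoo(, seq(start(z2), end(z2), by = "hour")))
```

With POSIXct the grid step is written as by = "hour" instead of 1/24, because POSIXct counts seconds while chron counts days.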

G. Grothendieck