0

Pretty new to R, I appreciate any help you can give me on this issue.

I have two 15-min time series datasets that I would like to merge. I would like to then plot a combination of variables from this merged dataset on a common x-axis. Here’s the catch, one of the datasets is missing values/dates. Let’s call the datasets A and B.

A has a date-time column formatted as posixlt and 14 numeric variables (related to water quality). Readings are at 15 minute intervals. However, the time series is incomplete, some readings, due to equipment error are missing. B also has a 15-minute interval date-time column formatted as posixlt and 1 numeric variable. However this series is continuous i.e. no missing data.

Here is a shortened example but the data covers about 6 months, with this problem repeated randomly throughout (notice data A is missing the 2:45 reading):

Temp Turb            DATETIME                 Q            DATETIME       
1   21    5 2017-03-19 02:00:00             1 3 2017-03-19 02:00:00 
2   15    5 2017-03-19 02:15:00             2 3 2017-03-19 02:15:00
3   18    5 2017-03-19 02:30:00             3 3 2017-03-19 02:30:00
4   17   17 2017-03-19 03:00:00             4 5 2017-03-19 02:45:00
5   18    5 2017-03-19 03:15:00             5 7 2017-03-19 03:00:00
6   17    5 2017-03-19 03:30:00             6 6 2017-03-19 03:15:00
7   17    6 2017-03-19 03:45:00             7 4 2017-03-19 03:30:00    
                                            8 3 2017-03-19 03:45:00

Here is the code for the shortened example.

A<- data.frame("DATE" =c("3/19/17","3/19/17","3/19/17","3/19/17",
                        "3/19/17","3/19/17","3/19/17"), "TIME"=c("02:00:00","02:15:00",
                         "02:30:00","03:00:00","03:15:00","03:30:00","03:45:00"),
                        "Temp" = c(21,15,18,17,18,17,17), "Turb" = c(5,5,5,17,5,5,6)) 
                          A$DATETIME <-paste(A$DATE,A$TIME)
                         A$DATETIME <- strptime(A$DATETIME, "%m/%d/%y %H:%M:%S")
                         A<-subset( A, select = -c(DATE,TIME) )
B <- data.frame("DATE" =c("3/19/17","3/19/17","3/19/17","3/19/17",
                        "3/19/17","3/19/17","3/19/17","3/19/17"), "TIME"=c("02:00:00","02:15:00",
                         "02:30:00","02:45:00","03:00:00","03:15:00","03:30:00","03:45:00"),
                         "Q" = c(3,3,3,5,7,6,4,3)) 
                          B$DATETIME <-paste(B$DATE,B$TIME)
                          B$DATETIME <- strptime(B$DATETIME, "%m/%d/%y %H:%M:%S")
                         B<-subset( B, select = -c(DATE,TIME) )

I was hoping to use padr to fill in missing data in A with blanks so that the numeric variables from A and B could be interchangeably plotted with the time of A and B. Unfortunately 15 minutes is a nonstandard time interval for using padr. I have looked at stack overflow for examples and the closest I could find was here but couldn’t make it work. It fleshed my dataset out to 1 minute data intervals but NA’s replaced all my numeric variables even at the 15-minute mark. Errors included statements such as DateTime variable was unsorted.

Any other recommendations or more specific guidance on the linked example? Thanks!!!

Community
  • 1
  • 1
SMATT
  • 1
  • 1

2 Answers2

1

You can solve this with xts, but keep in mind that xts is like a matrix: it can only store one type of data. Also, you should use POSIXct instead of POSIXlt, unless you need to access the specific elements contained in POSIXlt objects.

require(xts)
a <- xts(A[,c("Temp","Turb")], as.POSIXct(A$DATETIME))
b <- xts(B["Q"], as.POSIXct(B$DATETIME))
(x <- merge(a, b))
#                     Temp Turb Q
# 2017-03-19 02:00:00   21    5 3
# 2017-03-19 02:15:00   15    5 3
# 2017-03-19 02:30:00   18    5 3
# 2017-03-19 02:45:00   NA   NA 5
# 2017-03-19 03:00:00   17   17 7
# 2017-03-19 03:15:00   18    5 6
# 2017-03-19 03:30:00   17    5 4
# 2017-03-19 03:45:00   17    6 3

If you want, you can convert the result back to a data.frame:

d <- data.frame(DATETIME=index(x), x)
# remove the rownames, if you want
rownames(d) <- NULL
print(d)
#              DATETIME Temp Turb Q
# 1 2017-03-19 02:00:00   21    5 3
# 2 2017-03-19 02:15:00   15    5 3
# 3 2017-03-19 02:30:00   18    5 3
# 4 2017-03-19 02:45:00   NA   NA 5
# 5 2017-03-19 03:00:00   17   17 7
# 6 2017-03-19 03:15:00   18    5 6
# 7 2017-03-19 03:30:00   17    5 4
# 8 2017-03-19 03:45:00   17    6 3
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
0

Does a full_join give you the desired output? In the code below, we need to convert DATETIME to POSIXct format, but you can convert back after, if needed.

library(dplyr)

A$DATETIME = as.POSIXct(A$DATETIME)
B$DATETIME = as.POSIXct(B$DATETIME)

AB = full_join(A, B, by="DATETIME")  # a dplyr function
  Temp Turb            DATETIME Q
1   21    5 2017-03-19 02:00:00 3
2   15    5 2017-03-19 02:15:00 3
3   18    5 2017-03-19 02:30:00 3
4   17   17 2017-03-19 03:00:00 7
5   18    5 2017-03-19 03:15:00 6
6   17    5 2017-03-19 03:30:00 4
7   17    6 2017-03-19 03:45:00 3
8   NA   NA 2017-03-19 02:45:00 5

With base R, the equivalent would be:

AB = merge(A,B, by="DATETIME", all=TRUE)
eipi10
  • 91,525
  • 24
  • 209
  • 285