I wish to merge two datasets using ‘date time’ columns that are present in both (POSIXct format: dd/mm/yyyy hh:mm). Below is example data from the two datasets:
# Dataset 1 (dts1)
datetime count period
1 30/03/2011 02:32 27 561
2 30/03/2011 02:42 3 600
3 30/03/2011 02:52 0 574
4 30/03/2011 03:02 1 550
5 30/03/2011 03:12 15 600
6 30/03/2011 03:22 0 597
# Dateset 2 (dts2)
datetime dist car satd alt
1 30/03/2011 01:59 23.9 1 3 1.76
2 30/03/2011 02:58 14.7 1 7 6.36
3 30/03/2011 03:55 10.4 2 9 -0.34
4 30/03/2011 04:53 35.4 1 3 3.55
5 30/03/2011 05:52 56.1 1 7 -0.91
6 30/03/2011 06:48 12.3 1 4 6.58
7 30/03/2011 07:48 10.7 1 5 4.18
If it was a simple case of merging matching rows from the two frames then basic merge(data1, data2, by="datetime")
or rbind()
function could be used.
However, my problem is more complicated as the time intervals in the two datasets are not equal. Dataset 1
contains data at precise 10-minute intervals (each row incorporates information on the 10-minute block that ends at the specified date/time), while dataset 2
contains data at approximate 1-hour intervals (each row incorporates information from the 1-hour block that ends at the specified date/time).
To make things further harder, there is a time mismatch between the start times of the rows in the two datasets (i.e. dts1
: 01/03/2013 10:00:00, dts2
: 01/03/2012 09:58:12). dts2
intervals also vary throughout the dataset (± few minutes either side of 1 hour). I want to link every 10-minute data row in dataset 1 with the hour block that it fits within in dataset 2 (along with all the associated column values from dts2). There will be some rows that could fit into 2 different hour blocks (i.e. 30/03/2011 03:02), but I only need these rows to be linked to one of the hour blocks.
I would like to end up with something like this:
datetime_dts1 count period datetime2_dts2 dist car satd alt
1 30/03/2011 02:32 27 561 30/03/2011 02:58 14.7 1 7 6.36
2 30/03/2011 02:42 3 600 30/03/2011 02:58 14.7 1 7 6.36
3 30/03/2011 02:52 0 574 30/03/2011 02:58 14.7 1 7 6.36
4 30/03/2011 03:02 1 550 30/03/2011 02:58 14.7 1 7 6.36
5 30/03/2011 03:12 15 600 30/03/2011 03:55 10.4 2 9 -0.34
6 30/03/2011 03:22 0 597 30/03/2011 03:55 10.4 2 9 -0.34
I have searched for an answer to this problem but have not been able to solve it and my R abilities are not advanced. If anyone could give me a direction or provide a solution, I would be extremely grateful.