I have two sets of data.
Sample of set_A
(total number of rows: 45467):
ID_a a1 a2 a3 time_a
2 35694 5245.2 301.6053 00.00944
3 85694 9278.9 301.6051 23.00972
4 65694 9375.2 301.6049 22.00972
5 85653 4375.5 301.6047 19.00972
6 12694 5236.3 301.6045 22.00972
7 85697 5345.2 301.6043 21.00972
8 85640 5274.1 301.6041 20.01000
9 30694 5279.0 301.6039 20.01000
Sample of set_B
(total number of rows: 4798):
ID_b b1 b2 source time_b
2 34.20 15.114 set1.csv.1 20.35750
7 67.20 16.114 set1.csv.2 21.35778
12 12.20 33.114 set1.csv.3 22.35806
17 73.20 67.114 set2.csv.1 23.35833
23 88.20 42.114 set2.csv.2 19.35861
28 90.20 52.114 set3.csv.1 00.35889
I am interested in a result where to the set_B
, rows from set_A
are matched by the nearest values of time_a
and time_b
(total number of output rows: 4798). In set_A
values of time_a
could be repeated several times (eg. ID_a[8,]
and [ID_a[9,]
) - it doesn't really matter which row will be merged with a row from set_B
(in this case ID_b[1,]
). Example of the expected result:
ID_b b1 b2 source time_b ID_a a1 a2 a3 time_a
2 34.20 15.114 set1.csv.1 20.35750 8 85640 5274.1 301.6041 20.01000
7 67.20 16.114 set1.csv.2 21.35778 7 85697 5345.2 301.6043 21.00972
12 12.20 33.114 set1.csv.3 22.35806 4 65694 9375.2 301.6049 22.00972
17 73.20 67.114 set2.csv.1 23.35833 3 85694 9278.9 301.6051 23.00972
23 88.20 42.114 set2.csv.2 19.35861 5 85653 4375.5 301.6047 19.00972
28 90.20 52.114 set3.csv.1 00.35889 2 35694 5245.2 301.6053 00.00944
I went through many similar questions on stackoverflow and I really like data.table
library codes as they look really elegant. However, I made several failed attempts where I received either a table constructed based on two sets (total no. of rows 45467) or merged only one column time_a
to the set_B
... Nevertheless, I won't be picky and if somebody has another idea I would be very grateful for help.
Example of a code on which I'm working on:
setDT(set_B)
setDT(set_A)
setkey(set_B, time_b) [, time_a:=time_b]
test_ab <- set_B[set_A, roll='nearest']
As a result I am receiving not only a table with data which should be neglaced but also "a mess" in column names (e.g. a column which contains ID_a
values is called time_a
).
I would really appricieate your help!