In TableA I have observations (num) and DateTimes (POSIXct) for the observations. In another table (B) I have groupId's (char), StartDateTimes (POSIXct) and EndDateTimes (POSIXct).
I want to join the two tables so that each observation in TableA is assigned a groupId from TableB based on which intervals from TableB have which TableA observation falling within them.
I asked a similar question for SQL and got a working answer here.
But now I am trying to do this with R and RStudio.
I've tried using the sqldf package to just reuse the SQL answer from the other question, and I also tried the fuzzyjoin package using the interval_inner_join function, but in both cases I only got 12 out of the 178 observations matched, and they were matched to intervals that the observation datetimes didn't actually fall in (the intervals are from the following day).
My attempt using sqldf-
Answer <- sqldf::sqldf('SELECT a.*, b.id
FROM a, b
WHERE a.dt between b.start and b.end')
The Result -
WEIGHT, dt , id
1. 1098 , 2019-03-27 06:58:16 , 1928
2. 1074 , 2019-03-27 06:59:13 , 1928
3. 1037 , 2019-03-27 07:02:52 , 1928
4. 1032 , 2019-03-27 07:03:53 , 1928
5. 1073 , 2019-03-27 07:08:21 , 1928
6. 1054 , 2019-03-27 07:09:25 , 1928
7. 996 , 2019-03-27 07:11:54 , 1928
8. 1081 , 2019-03-27 07:12:49 , 1928
9. 1010, 2019-03-27 07:30:40 , 1928
10. 980 , 2019-03-27 07:38:21 , 1928
11. 1044 , 2019-03-27 07:54:18 , 1928
12. 1009 , 2019-03-27 07:58:06 , 1928
And my attempt using fuzzyjoin-
a <- a %>%
mutate(start = dt,
end = dt) %>%
select(-dt)
Answer <- interval_inner_join(a, b, by = c('start','end'))
The Result - (not enough room, but end.y is all 2019-03-26 19:00:00)
WEIGHT, start.x , end.x , start.y, (end.y), id
1. 1098 ,2019-03-27 06:58:16 , 2019-03-27 06:58:16 , 2019-03-26 15:24:00,1928
2. 1074 ,2019-03-27 06:59:13 , 2019-03-27 06:59:13 , 2019-03-26 15:24:00,1928
3. 1037 ,2019-03-27 07:02:52 , 2019-03-27 07:02:52 , 2019-03-26 15:24:00,1928
4. 1032 ,2019-03-27 07:03:53 , 2019-03-27 07:03:53 , 2019-03-26 15:24:00,1928
5. 1073 ,2019-03-27 07:08:21 , 2019-03-27 07:08:21 , 2019-03-26 15:24:00,1928
6. 1054 ,2019-03-27 07:09:25 , 2019-03-27 07:09:25 , 2019-03-26 15:24:00,1928
7. 996 ,2019-03-27 07:11:54 , 2019-03-27 07:11:54 , 2019-03-26 15:24:00,1928
8. 1081 ,2019-03-27 07:12:49 , 2019-03-27 07:12:49 , 2019-03-26 15:24:00,1928
9. 1010,2019-03-27 07:30:40 , 2019-03-27 07:30:40 , 2019-03-26 15:24:00,1928
10. 980 ,2019-03-27 07:38:21 , 2019-03-27 07:38:21 , 2019-03-26 15:24:00,1928
11. 1044 ,2019-03-27 07:54:18 , 2019-03-27 07:54:18 , 2019-03-26 15:24:00,1928
12. 1009 ,2019-03-27 07:58:06 , 2019-03-27 07:58:06 ,2019-03-26 15:24:00,1928
As you can see above, the results are that I get only 12 matches and they are incorrect. It is matching 12 observations to a nearby but incorrect interval out of the 178 observations.
Ideally I would get at least the majority of the 178 observations matched, obviously correctly. Now, one complication may be that there are some overlapping intervals, and it is possible that some of the observations then fall within multiple intervals. Is this the issue? Can it be handled so that those cases raise an NA id or something?
Any help would be very appreciated.
Here is the raw data in tsv format:
TableA
WEIGHT dt
894 18/12/2018 11:51
946 18/12/2018 11:52
907 14/02/2019 11:53
921 25/02/2019 10:52
866 25/02/2019 10:53
1021 25/02/2019 18:07
1022 25/02/2019 18:09
1032 25/02/2019 18:12
979 25/02/2019 18:16
1097 25/02/2019 18:19
1094 25/02/2019 18:23
880 26/02/2019 10:58
866 26/02/2019 11:00
876 26/02/2019 11:01
876 26/02/2019 11:01
861 26/02/2019 11:04
869 26/02/2019 11:06
890 26/02/2019 11:07
759 26/02/2019 11:10
813 26/02/2019 11:11
845 26/02/2019 11:30
989 26/02/2019 12:15
894 26/02/2019 15:00
963 26/02/2019 15:01
928 26/02/2019 15:03
822 26/02/2019 15:11
906 26/02/2019 15:14
994 26/02/2019 15:32
1011 26/02/2019 15:34
1004 26/02/2019 16:59
996 26/02/2019 17:01
883 26/02/2019 17:03
942 26/02/2019 17:05
950 26/02/2019 17:07
1018 26/02/2019 17:27
1003 28/02/2019 9:34
1021 28/02/2019 9:38
984 28/02/2019 9:42
942 28/02/2019 9:45
1067 28/02/2019 9:50
1034 28/02/2019 9:53
1087 28/02/2019 9:56
1040 28/02/2019 10:00
1090 28/02/2019 10:10
1029 28/02/2019 10:31
1023 28/02/2019 10:36
1010 28/02/2019 10:38
1051 28/02/2019 10:41
988 28/02/2019 10:42
1100 28/02/2019 10:43
985 1/03/2019 11:21
924 1/03/2019 11:22
988 1/03/2019 11:27
1001 1/03/2019 11:30
994 1/03/2019 11:35
958 1/03/2019 11:46
960 1/03/2019 11:48
990 1/03/2019 11:51
995 1/03/2019 11:52
928 1/03/2019 13:14
925 1/03/2019 13:16
1052 1/03/2019 13:18
1034 1/03/2019 13:27
1031 1/03/2019 13:28
944 1/03/2019 13:56
1038 1/03/2019 13:58
1056 1/03/2019 14:32
1050 1/03/2019 16:45
958 1/03/2019 17:33
1007 14/03/2019 14:19
985 14/03/2019 14:31
978 14/03/2019 14:35
1031 14/03/2019 14:56
976 14/03/2019 15:01
1063 14/03/2019 15:03
965 14/03/2019 15:06
1050 14/03/2019 15:07
894 14/03/2019 15:11
830 25/03/2019 11:14
920 25/03/2019 11:16
966 25/03/2019 11:18
992 25/03/2019 11:20
881 25/03/2019 11:21
823 25/03/2019 11:22
1046 25/03/2019 11:23
913 25/03/2019 11:24
885 25/03/2019 11:26
944 25/03/2019 11:28
938 25/03/2019 11:31
937 25/03/2019 11:36
937 25/03/2019 11:36
946 25/03/2019 11:40
985 25/03/2019 11:41
942 25/03/2019 11:45
1098 27/03/2019 6:58
1074 27/03/2019 6:59
1037 27/03/2019 7:02
1032 27/03/2019 7:03
1073 27/03/2019 7:08
1054 27/03/2019 7:09
996 27/03/2019 7:11
1081 27/03/2019 7:12
1010 27/03/2019 7:30
980 27/03/2019 7:38
1044 27/03/2019 7:54
1009 27/03/2019 7:58
963 27/03/2019 8:07
1047 27/03/2019 8:10
1114 27/03/2019 8:20
1110 27/03/2019 8:24
1009 27/03/2019 8:32
998 27/03/2019 11:49
1076 27/03/2019 17:48
1042 27/03/2019 17:51
1011 27/03/2019 17:53
1068 27/03/2019 17:54
1061 27/03/2019 17:58
1018 27/03/2019 17:59
1063 27/03/2019 18:00
1093 27/03/2019 18:00
1108 27/03/2019 18:01
1083 27/03/2019 18:03
1109 27/03/2019 18:05
1057 27/03/2019 18:06
1066 27/03/2019 18:07
1043 27/03/2019 18:08
1054 27/03/2019 18:09
1018 27/03/2019 18:11
1048 27/03/2019 18:12
1045 27/03/2019 18:13
1065 27/03/2019 18:18
1066 27/03/2019 18:19
1052 5/04/2019 13:50
1056 6/04/2019 16:43
1042 6/04/2019 16:43
1039 6/04/2019 16:49
954 6/04/2019 16:49
1015 6/04/2019 16:52
1009 6/04/2019 16:52
1026 6/04/2019 16:55
991 6/04/2019 16:56
1050 6/04/2019 17:13
1050 6/04/2019 17:13
934 6/04/2019 17:14
892 6/04/2019 17:23
1029 6/04/2019 17:26
1020 6/04/2019 17:29
1060 6/04/2019 17:32
1007 6/04/2019 17:35
989 6/04/2019 17:38
1006 6/04/2019 17:41
993 6/04/2019 17:45
1048 6/04/2019 17:52
1025 6/04/2019 17:57
1046 6/04/2019 18:21
988 6/04/2019 18:22
1024 8/04/2019 7:17
1076 8/04/2019 7:18
963 8/04/2019 7:19
1122 8/04/2019 7:20
1019 8/04/2019 7:21
1014 8/04/2019 7:22
907 8/04/2019 7:22
1033 8/04/2019 7:23
989 8/04/2019 7:25
1010 8/04/2019 7:26
1015 8/04/2019 7:27
979 8/04/2019 7:28
1018 8/04/2019 7:28
1054 8/04/2019 7:29
937 8/04/2019 7:31
918 8/04/2019 7:32
1054 8/04/2019 7:33
977 8/04/2019 7:34
1028 8/04/2019 7:35
942 8/04/2019 7:36
1009 8/04/2019 7:38
927 8/04/2019 7:39
TableB
start end id
13/11/2018 10:54 13/11/2018 16:30 1900
14/11/2018 8:00 14/11/2018 16:30 1900
22/11/2018 8:00 22/11/2018 16:30 1900
29/11/2018 13:04 29/11/2018 16:30 1903
30/11/2018 8:00 30/11/2018 16:30 1903
3/12/2018 8:00 3/12/2018 16:30 1903
4/12/2018 8:00 4/12/2018 16:30 1903
5/12/2018 8:00 5/12/2018 16:30 1903
6/12/2018 8:00 6/12/2018 9:47 1900
6/12/2018 9:47 6/12/2018 14:30 1903
10/12/2018 8:00 10/12/2018 16:30 1905
12/12/2018 8:00 12/12/2018 16:30 1905
13/12/2018 8:00 13/12/2018 16:33 1901
14/12/2018 8:00 14/12/2018 11:30 1905
14/12/2018 11:19 14/12/2018 16:30 1904
17/12/2018 8:00 17/12/2018 13:45 1904
17/12/2018 13:45 17/12/2018 14:45 1904
17/12/2018 14:41 17/12/2018 16:30 1902
18/12/2018 8:00 18/12/2018 16:30 1902
19/12/2018 8:00 19/12/2018 14:00 1902
19/12/2018 14:07 19/12/2018 16:30 1902
21/12/2018 8:00 21/12/2018 16:30 1905
7/01/2019 8:00 7/01/2019 16:30 1905
8/01/2019 8:00 8/01/2019 16:30 1905
9/01/2019 8:00 9/01/2019 16:30 1905
10/01/2019 8:00 10/01/2019 15:30 1905
10/01/2019 15:23 10/01/2019 16:30 1906
11/01/2019 8:00 11/01/2019 9:50 1902
11/01/2019 9:37 11/01/2019 16:31 1902
14/01/2019 8:00 14/01/2019 11:45 1902
14/01/2019 11:44 14/01/2019 13:00 1909
14/01/2019 12:10 14/01/2019 13:10 1910
14/01/2019 13:00 14/01/2019 16:30 1910
15/01/2019 8:00 15/01/2019 16:30 1910
16/01/2019 8:00 16/01/2019 16:30 1907
17/01/2019 8:00 17/01/2019 16:30 1907
18/01/2019 8:00 18/01/2019 8:17 1905
18/01/2019 8:00 18/01/2019 16:30 1907
21/01/2019 8:00 21/01/2019 16:30 1912
22/01/2019 8:00 22/01/2019 16:30 1912
23/01/2019 8:00 23/01/2019 16:30 1912
24/01/2019 8:00 24/01/2019 13:30 1912
24/01/2019 13:31 24/01/2019 16:30 1921
25/01/2019 8:00 25/01/2019 10:15 1921
25/01/2019 10:15 25/01/2019 16:30 1907
29/01/2019 8:00 29/01/2019 10:15 1902
29/01/2019 13:00 29/01/2019 16:30 1950
30/01/2019 8:00 30/01/2019 16:30 1950
31/01/2019 8:00 31/01/2019 16:30 1920
1/02/2019 8:00 1/02/2019 16:30 1920
2/02/2019 8:00 2/02/2019 16:30 1915
4/02/2019 7:00 4/02/2019 19:00 1950
5/02/2019 7:00 5/02/2019 19:04 1950
7/02/2019 7:00 7/02/2019 11:15 1920
7/02/2019 11:09 7/02/2019 15:25 1924
7/02/2019 15:15 7/02/2019 19:00 1924
8/02/2019 7:00 8/02/2019 8:30 1950
8/02/2019 7:59 8/02/2019 19:00 1950
9/02/2019 7:00 9/02/2019 19:02 1950
9/02/2019 10:16 9/02/2019 11:45 1950
9/02/2019 11:45 9/02/2019 19:00 1912
11/02/2019 7:00 11/02/2019 14:01 1950
11/02/2019 7:00 11/02/2019 10:20 1912
11/02/2019 10:17 11/02/2019 19:00 1950
11/02/2019 11:23 11/02/2019 11:40 1915
11/02/2019 13:45 11/02/2019 19:00 1950
12/02/2019 7:00 12/02/2019 12:00 1950
12/02/2019 7:01 12/02/2019 16:30 1950
12/02/2019 11:38 12/02/2019 19:05 1958
12/02/2019 16:30 12/02/2019 19:06 1905
13/02/2019 7:00 13/02/2019 19:01 1924
13/02/2019 7:00 13/02/2019 9:35 1905
14/02/2019 7:00 14/02/2019 15:00 1924
14/02/2019 14:54 14/02/2019 19:00 1972
15/02/2019 7:00 15/02/2019 15:00 1970
15/02/2019 15:00 15/02/2019 19:00 1958
16/02/2019 7:00 16/02/2019 17:30 1958
16/02/2019 17:22 16/02/2019 19:00 1958
18/02/2019 7:00 18/02/2019 10:30 1958
18/02/2019 10:12 18/02/2019 15:30 1955
18/02/2019 14:49 18/02/2019 19:00 1913
19/02/2019 7:00 19/02/2019 19:00 1924
19/02/2019 8:54 19/02/2019 9:35 1958
19/02/2019 9:37 19/02/2019 9:40 1913
20/02/2019 7:00 20/02/2019 19:00 1924
21/02/2019 7:00 21/02/2019 19:07 1913
21/02/2019 9:53 21/02/2019 10:07 1915
21/02/2019 10:06 21/02/2019 17:40 1913
21/02/2019 17:42 21/02/2019 19:00 1915
22/02/2019 7:00 22/02/2019 19:05 1935
22/02/2019 7:00 22/02/2019 8:15 1915
22/02/2019 8:21 22/02/2019 13:40 1913
22/02/2019 14:00 22/02/2019 19:00 1920
23/02/2019 7:00 23/02/2019 10:00 1953
23/02/2019 7:12 23/02/2019 8:20 1920
23/02/2019 8:21 23/02/2019 10:00 1972
23/02/2019 10:02 23/02/2019 19:04 1943
23/02/2019 10:20 23/02/2019 11:00 1958
25/02/2019 7:00 25/02/2019 13:00 1930
25/02/2019 13:02 25/02/2019 19:00 1972
26/02/2019 7:00 26/02/2019 19:00 1930
27/02/2019 7:00 27/02/2019 19:00 1930
28/02/2019 7:00 28/02/2019 14:45 1943
28/02/2019 8:27 28/02/2019 19:03 1915
28/02/2019 14:26 28/02/2019 19:01 1913
1/03/2019 7:00 1/03/2019 13:00 1913
1/03/2019 7:00 1/03/2019 9:00 3316
1/03/2019 9:32 1/03/2019 13:40 1913
1/03/2019 11:49 1/03/2019 11:49 3316
1/03/2019 13:02 1/03/2019 19:00 1960
1/03/2019 13:48 1/03/2019 19:00 1924
2/03/2019 7:00 2/03/2019 19:09 1960
4/03/2019 7:00 4/03/2019 19:00 1972
5/03/2019 7:00 5/03/2019 14:30 1935
5/03/2019 8:45 5/03/2019 19:00 1924
5/03/2019 14:17 5/03/2019 19:00 1924
6/03/2019 7:00 6/03/2019 18:00 1924
6/03/2019 7:52 6/03/2019 18:00 1950
6/03/2019 17:31 6/03/2019 19:00 1973
6/03/2019 18:00 6/03/2019 19:00 1930
7/03/2019 6:27 7/03/2019 17:10 1973
7/03/2019 17:10 7/03/2019 19:00 1972
8/03/2019 7:00 8/03/2019 19:00 1960
8/03/2019 7:00 8/03/2019 19:00 1915
9/03/2019 7:00 9/03/2019 7:00 1960
9/03/2019 7:00 9/03/2019 15:45 1915
11/03/2019 7:00 11/03/2019 19:00 1978
12/03/2019 7:00 12/03/2019 13:53 1978
12/03/2019 13:44 12/03/2019 19:00 1978
13/03/2019 7:00 13/03/2019 19:00 1928
14/03/2019 7:00 14/03/2019 17:10 1960
14/03/2019 17:10 14/03/2019 19:00 1915
15/03/2019 7:00 15/03/2019 10:20 1960
15/03/2019 10:02 15/03/2019 19:00 1940
16/03/2019 7:00 16/03/2019 19:00 1940
18/03/2019 7:00 18/03/2019 9:00 1978
18/03/2019 8:36 18/03/2019 11:57 1978
18/03/2019 11:44 18/03/2019 19:00 1978
19/03/2019 7:00 19/03/2019 9:40 1978
19/03/2019 9:34 19/03/2019 19:00 1928
20/03/2019 7:00 20/03/2019 19:00 1928
20/03/2019 9:29 20/03/2019 9:50 1960
20/03/2019 9:50 20/03/2019 12:00 1940
20/03/2019 11:53 20/03/2019 14:30 1958
20/03/2019 14:31 20/03/2019 19:00 1960
21/03/2019 7:00 21/03/2019 19:00 1928
22/03/2019 7:00 22/03/2019 19:00 1962
22/03/2019 8:37 22/03/2019 19:00 1960
22/03/2019 14:33 22/03/2019 14:40 1935
22/03/2019 14:41 22/03/2019 16:50 1930
23/03/2019 7:00 23/03/2019 19:00 1962
23/03/2019 7:00 23/03/2019 9:30 1960
23/03/2019 9:37 23/03/2019 13:00 1960
23/03/2019 13:00 23/03/2019 13:59 1958
23/03/2019 14:01 23/03/2019 19:00 1943
25/03/2019 7:00 25/03/2019 19:00 1928
26/03/2019 7:00 26/03/2019 15:30 1928
26/03/2019 15:24 26/03/2019 19:00 1928
27/03/2019 7:00 27/03/2019 19:00 1926
27/03/2019 9:24 27/03/2019 10:20 1960
27/03/2019 10:25 27/03/2019 19:00 1950
28/03/2019 7:00 28/03/2019 19:00 1980
28/03/2019 11:59 28/03/2019 19:00 1950
29/03/2019 7:00 29/03/2019 17:30 1962
29/03/2019 7:04 29/03/2019 9:30 1980
29/03/2019 7:00 29/03/2019 16:25 1910
29/03/2019 16:27 29/03/2019 19:00 1910
29/03/2019 16:53 29/03/2019 19:00 1980
30/03/2019 7:00 30/03/2019 15:30 1980
30/03/2019 15:21 30/03/2019 19:00 1915
1/04/2019 7:00 1/04/2019 19:00 1926
2/04/2019 7:00 2/04/2019 11:15 1926
2/04/2019 10:55 2/04/2019 19:00 1978
3/04/2019 7:00 3/04/2019 19:00 1978
4/04/2019 7:00 4/04/2019 10:15 1954
4/04/2019 9:56 4/04/2019 13:46 1987
4/04/2019 13:46 4/04/2019 19:00 1987
5/04/2019 7:00 5/04/2019 19:00 1987
5/04/2019 10:26 5/04/2019 19:00 1926
6/04/2019 7:00 6/04/2019 19:00 1987
6/04/2019 7:00 6/04/2019 9:15 1935
6/04/2019 9:22 6/04/2019 9:30 1973
6/04/2019 9:35 6/04/2019 14:30 1972
6/04/2019 14:28 6/04/2019 19:00 1924
8/04/2019 7:00 8/04/2019 17:15 1972
8/04/2019 16:53 8/04/2019 18:30 1907
9/04/2019 7:00 9/04/2019 16:30 1932
9/04/2019 16:19 9/04/2019 19:00 1932
10/04/2019 7:00 10/04/2019 10:00 1932
10/04/2019 9:41 10/04/2019 18:15 1978
10/04/2019 17:41 10/04/2019 19:00 1970