I'm doing some trajectory analysis using R and PostgreSQL. In order to form groups of trajectory segments where successive positions are spatio-temporally near, I've created the following table. What I'm still missing is the column group_id
, which is what my question is about.
bike_id1 datetime bike_id2 near group_id
1 2016-05-28 11:00:00 2 TRUE 1
1 2016-05-28 11:00:05 2 TRUE 1
1 2016-05-28 11:00:10 2 FALSE NA
[...]
2 2016-05-28 11:00:05 3 TRUE 1
2 2016-05-28 11:00:10 3 TRUE 1
This is the result of multiple comparisons between each trajectory with every other (all combinations without repetitions) and an inner join on datetime
(sampled always on a multiple of 5 seconds). It shows that for certain positions, bike 1 and 2 were sampled at the same time and are spatially near (some arbitrary threshold).
Now I'd like to give away unique ids for the segments where two bikes are spatio-temporally near (group_id
). This is where I'm stuck: I'd want the group_id
to respect groups with multiple trajectories. The method for assigning the group_id
should realize that if bike 1 and 2 are in a group at 2016-05-28 11:00:05
, then 3 belongs to the same group if it is near to 2 at that same timestamp (2016-05-28 11:00:05
).
Are there tools within R or PostgreSQL that would help me with this task? Running a loop through the table seems like the wrong way to go about this.
EDIT: As @wildplasser pointed out, this seems to be a gaps-and-islands problem which is traditionally solved using SQL. He has kindly produced some sample data that I have slightly extended and will include in the question.
CREATE TABLE nearness
-- ( seq SERIAL NOT NULL UNIQUE -- surrogate for conveniance
( bike1 INTEGER NOT NULL
, bike2 INTEGER NOT NULL
, stamp timestamp NOT NULL
, near boolean
, PRIMARY KEY(bike1,bike2,stamp)
);
INSERT INTO nearness( bike1,bike2,stamp,near) VALUES
(1,2, '2016-05-28 11:00:00', TRUE)
,(1,2, '2016-05-28 11:00:05', TRUE)
,(1,2, '2016-05-28 11:00:10', TRUE)
,(1,2, '2016-05-28 11:00:20', TRUE) -- <<-- gap here
,(1,2, '2016-05-28 11:00:25', TRUE)
,(1,2, '2016-05-28 11:00:30', FALSE)
,(4,5, '2016-05-28 11:00:00', FALSE)
,(4,5, '2016-05-28 11:00:05', FALSE)
,(4,5, '2016-05-28 11:00:10', TRUE)
,(4,5, '2016-05-28 11:00:15', TRUE)
,(4,5, '2016-05-28 11:00:20', TRUE)
,(2,3, '2016-05-28 11:00:05', TRUE) -- <<-- bike 1, 2, 3 are in one grp @ 11:00:05
,(2,3, '2016-05-28 11:00:10', TRUE) -- <<-- no group here
,(6,7, '2016-05-28 11:00:00', FALSE)
,(6,7, '2016-05-28 11:00:05', FALSE)
;