Probably I will need to change the title, but I lack the language to express concisely the problem.
I have a big data frame with the following structure:
class(df)
[1] "SpatialPointsDataFrame"
df
coordinates ID group_id value1 time zone
1 (9.09077, 45.47737) 24739 1 70 1505480403 62
2 (9.24016, 45.50221) 62965 1 35 1505480403 19
3 (9.22375, 45.50891) 93104 1 86 1505480403 16
4 (9.24392, 45.51171) 35646 1 80 1505480403 17
5 (9.24326, 45.51349) 24747 1 80 1505480403 17
6 (9.12738, 45.46933) 44274 1 53 1505480403 60
7 (9.25434, 45.45346) 53579 1 99 1505480403 30
8 (9.20306, 45.42118) 74565 1 37 1505480403 38
9 (9.1183, 45.45458) 74521 1 79 1505480403 53
10 (9.14386, 45.44674) 53584 1 62 1505480403 49
11 (9.14961, 45.51675) 44283 1 80 1505480403 81
12 (9.20478, 45.50888) 62936 1 50 1505480403 13
13 (9.20352, 45.421) 87311 1 80 1505480403 38
14 (9.11081, 45.49871) 43467 1 80 1505480403 65
15 (9.25292, 45.51276) 93101 1 41 1505480403 17
16 (9.08882, 45.45794) 53507 1 26 1505480403 55
17 (9.2071, 45.50587) 53522 1 80 1505480403 13
18 (9.22379, 45.50888) 87372 1 80 1505480403 16
19 (9.24388, 45.50456) 87332 1 21 1505480403 19
20 (9.20917, 45.51816) 47710 1 69 1505480403 15
21 (9.12292, 45.50486) 06213 1 83 1505480403 72
22 (9.21751, 45.51149) 35633 1 80 1505480403 16
23 (8.85098, 45.60636) 35632 1 70 1505480403 N/A
24 (9.08967, 45.46126) 79065 1 29 1505480403 55
25 (9.11043, 45.4958) 72813 1 80 1505480403 65
26 (9.14425, 45.44787) 06221 1 53 1505480403 49
27 (9.21566, 45.51105) 06286 1 81 1505480403 16
28 (9.15909, 45.52463) 53547 1 22 1505480403 82
29 (9.1357, 45.45472) 74513 1 85 1505480403 52
30 (8.85068, 45.60646) 62949 1 80 1505480403 N/A
31 (9.20874, 45.42597) 06296 1 20 1505480403 38
32 (9.1992, 45.51189) 62972 1 84 1505480403 14
33 (9.15462, 45.51798) 43496 1 52 1505480403 81
34 (9.08816, 45.47548) 87351 1 21 1505480403 62
35 (9.13154, 45.43608) 93106 1 50 1505480403 48
36 (9.14139, 45.44461) 14879 1 29 1505480403 48
37 (9.19553, 45.44263) 43464 1 78 1505480403 37
38 (9.14361, 45.48266) 87371 1 80 1505480403 58
39 (9.13611, 45.46989) 36099 1 91 1505480403 57
40 (9.22385, 45.50939) 53590 1 94 1505480403 16
41 (9.08562, 45.45992) 75494 1 40 1505480403 55
42 (9.20702, 45.52272) 24709 1 44 1505480403 14
43 (9.23615, 45.50845) 53577 1 31 1505480403 16
44 (9.08801, 45.47528) 24766 1 64 1505480403 62
45 (9.24426, 45.51104) 62975 1 80 1505480403 17
46 (9.2403, 45.50226) 62965 1 35 1505476803 19
47 (9.22376, 45.50896) 93104 1 86 1505476803 16
48 (9.09076, 45.47738) 24739 1 70 1505476803 62
49 (9.24405, 45.51173) 35646 1 80 1505476803 17
50 (9.24327, 45.51344) 24747 1 80 1505476803 17
106 (9.22914, 45.45137) 30907 1 89 1505376003 29
112 (9.22917, 45.45136) 30907 1 89 1505372403 29
118 (9.2291, 45.45126) 30907 1 89 1505368803 29
124 (9.22917, 45.45137) 30907 1 89 1505365203 29
130 (9.22918, 45.4514) 30907 1 89 1505361603 29
As you can see in the last lines it happens frequently that an observation happens in the same point(or its vicinities), with the same sensor in consecutive probes. Is there a way to transform the dataset to have a start_time (taking the time of the first observation in the point) and a end_time (taking the time of the last observation in the point +60min)?
The dataset has over 675000 observations, but lots of them are "groupable"
These are gps sensors on moving veichles, I gather info every hour from them, it happens that the veichles keep the same position through multiple observations. What I want to do is, for every groups of observations (same id, same position in a few meters range, contiguous times) generate a new dataframe where, if the observation is "single" it has start_time = time, end_time = time +60min
if, on the other side, there are several observations that can be grouped (as in the lines 106-130 in the example above) it should have start_time = time of the first observation (1505376003 taking the aformentioned example), end_time = time of the last observation +60min (1505361603 +60min)
EDIT2: I was able to identify (using a subset with only one "ID") if time intervals are consecutive (still can't grasp why the difference is in seconds and not in hours) c(NA,diff(round(as_datetime(df$time), units="hours"))==-3600)
I tried looking through various sources to no avail.
Thank you