1

I have merged two data frames by time:

Mergeboth <- merge(test, deeptest, by.x="date.time", by.y="Bottom.Start", all=TRUE)

The test dataframe is very large (over 1 million datapoints) while the deeptest is only around 100. Checking with complete.cases() it correctly merged the two dataframes. Now that the dataframes are merged, I would like to select a certain time interval before and after the complete.cases (let's say 2 minutes for ease of output- but my actual goal is 12 hours), assign a new number to all of those newly selected rows and output it into a new dataframe: df_new.

So my current dataframe looks like this:

MergeBoth()
 Bottom.Start           Dive Max.Depth Depth
 1: 2015-12-22 01:07:00   NA        NA 311.0
 2: 2015-12-22 01:07:10   NA        NA 308.5
 3: 2015-12-22 01:07:20   NA        NA 307.0
 4: 2015-12-22 01:07:30   NA        NA 306.5
 5: 2015-12-22 01:07:40   NA        NA 305.5
 6: 2015-12-22 01:07:50   NA        NA 308.5
 7: 2015-12-22 01:08:00   NA        NA 318.5
 8: 2015-12-22 01:08:10   NA        NA 331.0
 9: 2015-12-22 01:08:20   NA        NA 345.5
10: 2015-12-22 01:08:30   NA        NA 361.0
11: 2015-12-22 01:08:40   NA        NA 376.5
12: 2015-12-22 01:08:50   NA        NA 392.0
13: 2015-12-22 01:09:00   NA        NA 408.5
14: 2015-12-22 01:09:10   NA        NA 425.5
15: 2015-12-22 01:09:20   NA        NA 442.5
16: 2015-12-22 01:09:30   NA        NA 459.0
17: 2015-12-22 01:09:40   NA        NA 475.0
18: 2015-12-22 01:09:50   NA        NA 491.0
19: 2015-12-22 01:10:00 1238       727 508.0
20: 2015-12-22 01:10:10   NA        NA 523.5
21: 2015-12-22 01:10:20   NA        NA 540.5
22: 2015-12-22 01:10:30   NA        NA 556.5
23: 2015-12-22 01:10:40   NA        NA 572.5
24: 2015-12-22 01:10:50   NA        NA 581.0
25: 2015-12-22 01:11:00   NA        NA 583.5
26: 2015-12-22 01:11:10   NA        NA 582.0
27: 2015-12-22 01:11:20   NA        NA 581.0
28: 2015-12-22 01:11:30   NA        NA 581.0
29: 2015-12-22 01:11:40   NA        NA 581.0
30: 2015-12-22 01:11:50   NA        NA 581.0
31: 2015-12-22 01:12:00   NA        NA 582.0
32: 2015-12-22 01:12:10   NA        NA 581.5
33: 2015-12-22 01:12:20   NA        NA 581.5
34: 2015-12-22 01:12:30   NA        NA 592.5
35: 2015-12-22 01:12:40   NA        NA 606.5
36: 2015-12-22 01:12:50   NA        NA 621.5
37: 2015-12-22 01:13:00   NA        NA 637.5
38: 2015-12-22 01:13:10   NA        NA 655.0
39: 2015-12-23 07:17:00   NA        NA 863.0
40: 2015-12-23 07:17:10   NA        NA 863.5
41: 2015-12-23 07:17:20   NA        NA 865.0
42: 2015-12-23 07:17:30   NA        NA 866.0
43: 2015-12-23 07:17:40   NA        NA 867.0
44: 2015-12-23 07:17:50   NA        NA 867.5
45: 2015-12-23 07:18:00   NA        NA 868.5
46: 2015-12-23 07:18:10   NA        NA 870.0
47: 2015-12-23 07:18:20   NA        NA 870.5
48: 2015-12-23 07:18:30   NA        NA 871.0
49: 2015-12-23 07:18:40   NA        NA 872.0
50: 2015-12-23 07:18:50   NA        NA 872.0
51: 2015-12-23 07:19:00 1267       970 874.0
52: 2015-12-23 07:19:10   NA        NA 875.0
53: 2015-12-23 07:19:20   NA        NA 875.0
54: 2015-12-23 07:19:30   NA        NA 876.0
55: 2015-12-23 07:19:40   NA        NA 876.5
56: 2015-12-23 07:19:50   NA        NA 876.0
57: 2015-12-23 07:20:00   NA        NA 878.0
58: 2015-12-23 07:20:10   NA        NA 876.0
59: 2015-12-23 07:20:20   NA        NA 875.5
60: 2015-12-23 07:20:30   NA        NA 875.5
61: 2015-12-23 07:20:40   NA        NA 874.0
62: 2015-12-23 07:20:50   NA        NA 872.5
63: 2015-12-23 07:21:00   NA        NA 870.5
64: 2015-12-23 07:21:10   NA        NA 867.0
65: 2015-12-23 07:21:20   NA        NA 863.5
66: 2015-12-23 07:21:30   NA        NA 860.5
67: 2015-12-23 07:21:40   NA        NA 859.0
68: 2015-12-23 07:21:50   NA        NA 861.0
69: 2015-12-23 07:22:00   NA        NA 864.5
70: 2015-12-23 07:22:10   NA        NA 868.5
71: 2015-12-23 07:22:20   NA        NA 874.5
72: 2015-12-23 07:22:30   NA        NA 882.0
73: 2015-12-23 07:22:40   NA        NA 894.0
74: 2015-12-23 07:22:50   NA        NA 907.0
75: 2015-12-23 07:23:00   NA        NA 922.5

And I am trying to get this:

df_new()
            Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2

Here is my dput() for the dataframe:

structure(list(Bottom.Start = structure(c(1450746420, 1450746430, 
1450746440, 1450746450, 1450746460, 1450746470, 1450746480, 1450746490, 
1450746500, 1450746510, 1450746520, 1450746530, 1450746540, 1450746550, 
1450746560, 1450746570, 1450746580, 1450746590, 1450746600, 1450746610, 
1450746620, 1450746630, 1450746640, 1450746650, 1450746660, 1450746670, 
1450746680, 1450746690, 1450746700, 1450746710, 1450746720, 1450746730, 
1450746740, 1450746750, 1450746760, 1450746770, 1450746780, 1450746790, 
1450855020, 1450855030, 1450855040, 1450855050, 1450855060, 1450855070, 
1450855080, 1450855090, 1450855100, 1450855110, 1450855120, 1450855130, 
1450855140, 1450855150, 1450855160, 1450855170, 1450855180, 1450855190, 
1450855200, 1450855210, 1450855220, 1450855230, 1450855240, 1450855250, 
1450855260, 1450855270, 1450855280, 1450855290, 1450855300, 1450855310, 
1450855320, 1450855330, 1450855340, 1450855350, 1450855360, 1450855370, 
1450855380), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    Dive = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 1238L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, 1267L, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Max.Depth = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 727, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Depth = c(311, 
    308.5, 307, 306.5, 305.5, 308.5, 318.5, 331, 345.5, 361, 
    376.5, 392, 408.5, 425.5, 442.5, 459, 475, 491, 508, 523.5, 
    540.5, 556.5, 572.5, 581, 583.5, 582, 581, 581, 581, 581, 
    582, 581.5, 581.5, 592.5, 606.5, 621.5, 637.5, 655, 863, 
    863.5, 865, 866, 867, 867.5, 868.5, 870, 870.5, 871, 872, 
    872, 874, 875, 875, 876, 876.5, 876, 878, 876, 875.5, 875.5, 
    874, 872.5, 870.5, 867, 863.5, 860.5, 859, 861, 864.5, 868.5, 
    874.5, 882, 894, 907, 922.5)), row.names = c(NA, -75L), class = c("data.table", 
"data.frame"))

I have tried lag() and tail() and a combination of dplyr arguments including case_when() and mutate() but cannot seem to get my desired output. I am relatively new to R and appreciate the help!

Meg.abytes
  • 169
  • 8

2 Answers2

1

I'm not sure what your original data.frames look like before the merge, but this is something to consider (using data.table which may be relatively faster).

First, you can create (or you may already have?) a small data.frame that includes complete cases. You can enumerate the dives and determine your time ranges of interest for these rows of data.

Then, you can perform a non-equi join and select rows of data that have Bottom.Start times within the determined time range.

library(data.table)
library(lubridate)

df_c <- df[complete.cases(df), ]
df_c[, `:=` (DiveNumber = seq_len(.N), Start = Bottom.Start - minutes(2), End = Bottom.Start + minutes(2))]

df[df_c, .(Bottom.Start = x.Bottom.Start, Dive, Max.Depth, Depth, DiveNumber), on = .(Bottom.Start >= Start, Bottom.Start <= End)]

Output

           Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2
           Bottom.Start Dive Max.Depth Depth DiveNumber
Ben
  • 28,684
  • 5
  • 23
  • 45
  • Ben, thank you so much! This worked PERFECTLY :D Would you mind just explaining what the last two lines of code mean in laymen's terms? Especially what the ':=' and '.'s do? And why the 'x.Bottom.Start' in the last line? Would love to understand for future use :) – Meg.abytes Oct 08 '21 at 01:16
  • The `:=` is assignment by reference in `data.table`, without copying data. See [this](https://www.rdocumentation.org/packages/data.table/versions/1.14.0/topics/%3A%3D) for details. In the above case, I'm just adding 3 new columns (one for `DiveNumber` count, and the other 2 for start and end times)...as for `x.` added to `Bottom.Start`, this allows you to refer to the specific columns of the first data.table in the join, otherwise they are masked by the other one...the period `.` in `data.table` syntax is just `list`... – Ben Oct 08 '21 at 01:32
  • More details on the prefix of `x.` used in nonequi joins can be found [here](https://stackoverflow.com/a/44343424/3460670). – Ben Oct 08 '21 at 01:41
0

You may try using difftime lag and cumsum -

library(dplyr)

df %>%
  mutate(DiveNumber = cumsum(c(TRUE, difftime(Bottom.Start, 
                             lag(Bottom.Start), units = 'mins')[-1] > 2)))

#           Bottom.Start Dive Max.Depth Depth DiveNumber
# 1: 2015-12-22 01:07:00   NA        NA 311.0          1
# 2: 2015-12-22 01:07:10   NA        NA 308.5          1
# 3: 2015-12-22 01:07:20   NA        NA 307.0          1
# 4: 2015-12-22 01:07:30   NA        NA 306.5          1
# 5: 2015-12-22 01:07:40   NA        NA 305.5          1
# 6: 2015-12-22 01:07:50   NA        NA 308.5          1
# 7: 2015-12-22 01:08:00   NA        NA 318.5          1
# 8: 2015-12-22 01:08:10   NA        NA 331.0          1
#...
#...
#37: 2015-12-22 01:13:00   NA        NA 637.5          1
#38: 2015-12-22 01:13:10   NA        NA 655.0          1
#39: 2015-12-23 07:17:00   NA        NA 863.0          2
#40: 2015-12-23 07:17:10   NA        NA 863.5          2
#41: 2015-12-23 07:17:20   NA        NA 865.0          2
#42: 2015-12-23 07:17:30   NA        NA 866.0          2
#43: 2015-12-23 07:17:40   NA        NA 867.0          2
#...
#...
#74: 2015-12-23 07:22:50   NA        NA 907.0          2
#75: 2015-12-23 07:23:00   NA        NA 922.5          2

For your actual goal of 12 hours you may change units = 'mins' to units = 'hours' and > 2 to > 12.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213