I have a large dataset (~300,000 rows) of fish detections. Each detection has a date, a station (location), and a tagID, among many other variables such as temperature and depth. I want to pull out the first and last detection at each station, every time the fish visits that station. The end goal is to compute residency time at each station before the fish moves on, and again each time it comes back.
Here is a small example of the data:
tagID <- c("8272", "8272", "8272", "8272", "8272", "8272", "8272", "8272", "8272", "8272")
date <- c("2020-07-12", "2020-07-12", "2020-07-13", "2020-07-13", "2020-07-16", "2020-07-17", "2020-07-20", "2020-07-29", "2020-07-30", "2020-08-04")
station <- c("4", "4", "4", "5", "5", "6", "6", "6", "4", "4")
temp <- c("10", "9", "11", "12", "10", "12", "11", "12", "12", "9")
depth <- c("6.14", "34.2", "21", "23.5", "15.4", "54", "32.4", "23", "33.3", "32.7")
df <- data.frame(tagID, date, station, temp, depth)
The resulting data frame looks like this:
   tagID       date station temp depth
1   8272 2020-07-12       4   10  6.14
2   8272 2020-07-12       4    9  34.2
3   8272 2020-07-13       4   11    21
4   8272 2020-07-13       5   12  23.5
5   8272 2020-07-16       5   10  15.4
6   8272 2020-07-17       6   12    54
7   8272 2020-07-20       6   11  32.4
8   8272 2020-07-29       6   12    23
9   8272 2020-07-30       4   12  33.3
10  8272 2020-08-04       4    9  32.7
I would like to find an efficient way to go through all 300K rows and extract something like:
   tagID       date station temp depth
1   8272 2020-07-12       4   10  6.14
3   8272 2020-07-13       4   11    21
4   8272 2020-07-13       5   12  23.5
5   8272 2020-07-16       5   10  15.4
6   8272 2020-07-17       6   12    54
8   8272 2020-07-29       6   12    23
9   8272 2020-07-30       4   12  33.3
10  8272 2020-08-04       4    9  32.7
This shows the first and last detection for each consecutive visit: the first stay at station 4, then stations 5 and 6, and then the first and last detection again when the fish comes back to station 4 later in the season.
I've looked at questions like "Select first and last row from grouped data" and "Select the first and last row by group in a data frame", and other similar questions, but none of those account for the 2nd (3rd, 4th, ... nth) time the group (in my case, station) appears in the data.
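To make the difference from plain grouping concrete, here is a minimal base R sketch of one possible direction: label each consecutive run of the same tagID and station with a run id, then keep the first and last row of each run. It assumes the rows are already sorted by tagID and then by detection time. The column name visit and the objects first_last and residency are made up for illustration, and this has not been checked against the full 300K-row data.

```r
# Example data from the question (columns kept as character, as above)
df <- data.frame(
  tagID   = rep("8272", 10),
  date    = c("2020-07-12", "2020-07-12", "2020-07-13", "2020-07-13", "2020-07-16",
              "2020-07-17", "2020-07-20", "2020-07-29", "2020-07-30", "2020-08-04"),
  station = c("4", "4", "4", "5", "5", "6", "6", "6", "4", "4"),
  temp    = c("10", "9", "11", "12", "10", "12", "11", "12", "12", "9"),
  depth   = c("6.14", "34.2", "21", "23.5", "15.4", "54", "32.4", "23", "33.3", "32.7")
)

# Assumption: rows are already ordered by tagID, then by detection time.
# A new "visit" starts whenever the tagID/station combination changes
# from one row to the next, so a return to station 4 gets its own id.
runs <- rle(paste(df$tagID, df$station, sep = "_"))
df$visit <- rep(seq_along(runs$lengths), times = runs$lengths)  # hypothetical column

# First and last row of every visit (a one-row visit is kept once)
is_first <- !duplicated(df$visit)
is_last  <- !duplicated(df$visit, fromLast = TRUE)
first_last <- df[is_first | is_last, ]

# Residency per visit, in days, from first to last detection
d <- as.Date(df$date)
residency <- data.frame(
  tagID   = df$tagID[is_first],
  station = df$station[is_first],
  arrive  = d[is_first],
  depart  = d[is_last],
  days    = as.numeric(difftime(d[is_last], d[is_first], units = "days"))
)

print(first_last)
print(residency)
```

On the example data this keeps rows 1, 3, 4, 5, 6, 8, 9, and 10, with the two stays at station 4 treated as separate visits. If data.table is an option, its rleid() function computes the same kind of run id and may be worth comparing for speed on the full dataset.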
Please let me know if you can help. Thank you. (This is my first question on Stack Overflow, so any tips for future questions are welcome.)