I am attempting to find a solution that will allow me to subset numeric data by finding the start of a continuously increasing vector, and stop at the max.
Some example data:
if(!require(data.table)) {install.packages("data.table"); library(data.table)}
if(!require(zoo)) {install.packages("zoo"); library(zoo)}
if(!require(dplyr)) {install.packages("dplyr"); library(dplyr)}
depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8, 5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
temp <- c(17.9, 17.9, 17.8, 17.9, 17.7, 17.9, 17.9, 17.8, 17.7, 17.6, 17.5, 17.3, 17.2, 17.1, 17.0, 16.9, 16.7, 16.9, 17.2, 17.5, 17.9)
testdf <- data.frame(depth = depth, temp = temp)
I have tried a few solutions, one does not work, the other works but I feel it may have limitations in certain situations.
Solution 1 only finds 1:max
. Similar solutions suggest removing any decreasing values, where diff
would be negative. These are not what I want.
setDT(testdf)[, .SD[1:which.max(depth)]]
depth temp
1: 1.1 17.9
2: 2.0 17.9
3: 1.6 17.8
4: 1.2 17.9
5: 1.6 17.7
6: 1.2 17.9
7: 1.5 17.9
8: 1.7 17.8
9: 2.1 17.7
10: 3.1 17.6
11: 3.8 17.5
12: 5.2 17.3
13: 6.1 17.2
14: 7.0 17.1
I am attempting to get this back:
depth temp
6: 1.2 17.9
7: 1.5 17.9
8: 1.7 17.8
9: 2.1 17.7
10: 3.1 17.6
11: 3.8 17.5
12: 5.2 17.3
13: 6.1 17.2
14: 7.0 17.1
Solution 2 uses diff
and a rollapply
to arbitrarily bin a number of rows (n = 10
here). In this specific use, I pad an extra row to the max index, and to get that, have to set diff
to 0
, otherwise the rollapply
stops well below the max.
testdf$diff <- c(diff(testdf$depth), NA) # add diff column and NA to empty cell
testdf <- testdf[1:(which(testdf$depth == max(testdf$depth)) + 1),] # subset to max depth row, plus one
testdf$diff[(which(testdf$depth == max(testdf$depth))) : (which(testdf$depth == max(testdf$depth)) + 1)] <- 0 # set any diff entry after max depth to 0, for rollapply to work
testdf <- testdf %>%
mutate(diff = rollapply(diff, width = 10, min, align = "left", fill = 0, na.rm = TRUE)) %>%
filter(diff >= 0)
Returns what I want:
depth temp diff
1 1.2 17.9 0
2 1.5 17.9 0
3 1.7 17.8 0
4 2.1 17.7 0
5 3.1 17.6 0
6 3.8 17.5 0
7 5.2 17.3 0
8 6.1 17.2 0
9 7.0 17.1 0
10 6.9 17.0 0 # an extra padded row
This solution may not work all the time, using an arbitrary window. It seems like the ideal solution would just find the max index, then go up to the last positive diff
value, and subset that range, but I'm trying to figure out a way that doesn't involve looping.
edit
A while
loop works, but I was trying to avoid a loop.
findmindepth <- function(x) {
maxdi <- NA
mindi <- NA
maxdi <- (which(x$depth == max(x$depth)) - 1)
while(x$diff[maxdi] > 0) {
maxdi = maxdi - 1
}
mindi = maxdi + 1
newx <- x[mindi:(which(x$depth == max(x$depth)) + 1),]
}