0

I am attempting to find a solution that will allow me to subset numeric data by finding the start of a continuously increasing vector, and stop at the max.

Some example data:

if(!require(data.table)) {install.packages("data.table"); library(data.table)}
if(!require(zoo)) {install.packages("zoo"); library(zoo)}
if(!require(dplyr)) {install.packages("dplyr"); library(dplyr)}

depth <- c(1.1, 2, 1.6, 1.2, 1.6, 1.2, 1.5, 1.7, 2.1, 3.1, 3.8, 5.2, 6.1, 7.0, 6.9, 6.9, 6.9, 6.0, 4.3, 2.1, 2.0)
temp <- c(17.9, 17.9, 17.8, 17.9, 17.7, 17.9, 17.9, 17.8, 17.7, 17.6, 17.5, 17.3, 17.2, 17.1, 17.0, 16.9, 16.7, 16.9, 17.2, 17.5, 17.9)
testdf <- data.frame(depth = depth, temp = temp)

I have tried a few solutions, one does not work, the other works but I feel it may have limitations in certain situations.

Solution 1 only finds 1:max. Similar solutions suggest removing any decreasing values, where diff would be negative. These are not what I want.

setDT(testdf)[, .SD[1:which.max(depth)]]
    depth temp
 1:   1.1 17.9
 2:   2.0 17.9
 3:   1.6 17.8
 4:   1.2 17.9
 5:   1.6 17.7
 6:   1.2 17.9
 7:   1.5 17.9
 8:   1.7 17.8
 9:   2.1 17.7
10:   3.1 17.6
11:   3.8 17.5
12:   5.2 17.3
13:   6.1 17.2
14:   7.0 17.1

I am attempting to get this back:

    depth temp
 6:   1.2 17.9
 7:   1.5 17.9
 8:   1.7 17.8
 9:   2.1 17.7
10:   3.1 17.6
11:   3.8 17.5
12:   5.2 17.3
13:   6.1 17.2
14:   7.0 17.1

Solution 2 uses diff and a rollapply to arbitrarily bin a number of rows (n = 10 here). In this specific use, I pad an extra row to the max index, and to get that, have to set diff to 0, otherwise the rollapply stops well below the max.

testdf$diff <- c(diff(testdf$depth), NA) # add diff column and NA to empty cell
testdf <- testdf[1:(which(testdf$depth == max(testdf$depth)) + 1),] # subset to max depth row, plus one
testdf$diff[(which(testdf$depth == max(testdf$depth))) : (which(testdf$depth == max(testdf$depth)) + 1)] <- 0 # set any diff entry after max depth to 0, for rollapply to work

testdf <- testdf %>% 
mutate(diff = rollapply(diff, width = 10, min, align = "left", fill = 0, na.rm = TRUE)) %>% 
filter(diff >= 0)

Returns what I want:

   depth temp diff
1    1.2 17.9    0
2    1.5 17.9    0
3    1.7 17.8    0
4    2.1 17.7    0
5    3.1 17.6    0
6    3.8 17.5    0
7    5.2 17.3    0
8    6.1 17.2    0
9    7.0 17.1    0
10   6.9 17.0    0 # an extra padded row

This solution may not work all the time, using an arbitrary window. It seems like the ideal solution would just find the max index, then go up to the last positive diff value, and subset that range, but I'm trying to figure out a way that doesn't involve looping.

edit

A while loop works, but I was trying to avoid a loop.

findmindepth <- function(x) {
  maxdi <- NA
  mindi <- NA
  maxdi <- (which(x$depth == max(x$depth)) - 1)
  while(x$diff[maxdi] > 0) {
    maxdi = maxdi - 1
  }
  mindi = maxdi + 1
  newx <- x[mindi:(which(x$depth == max(x$depth)) + 1),]
}
Anonymous coward
  • 2,061
  • 1
  • 16
  • 29

1 Answers1

1

You can use run-length encoding with diff to find all decreasing/increasing start/end points:

which_max <- which.max(testdf$depth)
encoding <- rle(diff(testdf$depth) > 0)

# these contain the start/end indices of all continuously increasing/decreasing subsets
ends <- cumsum(encoding$lengths) + 1L
starts <- ends - encoding$lengths

# filter out the decreasing subsets
starts <- starts[encoding$values]
ends <- ends[encoding$values]

# find the one that contains the maximum
interval <- which(starts <= which_max & ends >= which_max)
out <- testdf[starts[interval]:ends[interval],]
out
   depth temp
6    1.2 17.9
7    1.5 17.9
8    1.7 17.8
9    2.1 17.7
10   3.1 17.6
11   3.8 17.5
12   5.2 17.3
13   6.1 17.2
14   7.0 17.1

EDIT: actually, if you only care about the subset that contains the maximum, you can do something simpler:

which_max <- which.max(testdf$depth)
if (which_max == 1L) {
  out <- testdf[1L, , drop = FALSE]
}
else {
  subset1 <- testdf$depth[which_max:1L]
  len <- which.max(diff(subset1) > 0)
  out <- testdf[(which_max - len + 1L):which_max,]
}
Alexis
  • 4,950
  • 1
  • 18
  • 37
  • Thank you, those are novel ways to do it. Haven't seen `rle` used, and hadn't thought of using `which.max` that way. You have a typo, the `else` needs to move up a line. – Anonymous coward Jun 28 '18 at 21:22
  • If you copy the code into the console, yes, but if you put it in a function, then it shouldn't matter. – Alexis Jun 28 '18 at 21:24