4

All,

I'm looking for a reliable, unsupervised way to detect change points in a relatively short vector. Consider the following two examples:

v1 = c(0.299584,0.314446,0.357783,0.388896,0.410417,0.427182,0.450383,0.466671,0.474884,0.474749,0.493566,0.500374,0.522482,0.529851,0.538387,0.577901,0.610939,0.639383,0.662433,0.692656,0.720543,0.738255,0.748055,0.7591,0.770595,0.781811,0.794479,0.794588,0.789448,0.77667,0.765406,0.75152,0.740408,0.726898,0.720766,0.709445,0.69896,0.687508,0.673382,0.65795,0.639214,0.620445,0.590047,0.561773,0.526807,0.486848,0.439681,0.387545,0.313369,0.282872,0.279908,0.271836,0.269088,0.262727,0.259782)

v2 = c(0.081309,0.206263,0.429069,0.511859,0.565194,0.578792,0.56919,0.51985,0.432563,0.193907,0.0771,0.086603,0.18303,0.177608,0.169706,0.260917,0.292062,0.2979,0.263249,0.270576,0.250422,0.25219,0.182878,0.080623,0.079443,0.088944,0.087623,0.126403,0.155563,0.273942,0.312054,0.370195,0.357087,0.336452,0.300574,0.243105,0.243105,0.25593,0.227401,0.218047,0.15857,0.157727,0.139801,0.125742,0.129142,0.142166,0.142166,0.136748,0.107755,0.064377,0.072801,0.060093,0.103441,0.111704,0.124544)

If you look at

plot(v1,type='l') 

and

plot(v2,type='l')

you can see that for v1 I'd like to detect a change around index = 28, and for v2 I'd like to detect changes at the index values of 8, 11, 18, 25, 32, and 51. So far I've experimented with the Bayesian Change Point algorithm, which works OK in terms of identifying where inflection points are likely (low posterior probability regions), but still forces me to rely on visual inspection for the final determination:

install.packages('bcp')
library(bcp)

test = bcp(v1,w0=0.2,p0=0.01)
plot(v1,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)

test = bcp(v2,w0=0.2,p0=0.01)
plot(v2,type='l')
par(new=TRUE)
plot(test$posterior.prob,type='l',col=2)

Is there a way to automate an unsupervised selection of estimates of multiple change points in this kind of data? Maybe I'm just futilely searching for a replacement for human intuition :P I also looked at the changepoint package, but it doesn't seem to be designed for this kind of data.

Thanks, Aaron

aaron
  • 6,339
  • 12
  • 54
  • 80
  • Have you seen the questions about peak detection or local minima/maxima on s.o. - e.g. http://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima ? – thelatemail Sep 05 '13 at 23:55
  • I like using `pastecs::turnpoints` , with or without presmoothing, depending on the quality of the input data. – Carl Witthoft Sep 06 '13 at 11:51

1 Answers1

5

So, this is a simple solution. You can modify the parameters to give you back different (more/fewer, sensitive/insensitive) inflection points (or areas, in the case of your data).

plot(v2, type="l", col="darkblue", lwd=2)
# v2 <- smooth(v2, kind="3")  # optional
lines(v2, lwd=1, col="red")
d2 <- diff(v2)
d2 <- d2>0
d2 <- d2*2 -1 
k <- 5
cutoff <- 10
scores <- sapply(k:(length(d2)-k), FUN=function(i){
  score <- abs(mean(-d2[ i-1:k ], na.rm=T) + mean(d2[ i+0:k ], na.rm=T))
})


scores <- sapply(k:(length(v2)-k), FUN=function(i){
  left <- (v2[sapply(i-1:k, max, 1) ]<v2[i])*2-1
  right <- (v2[sapply(i+1:k, min, length(v2)) ]<v2[i])*2-1

  score <- abs(sum(left) + sum(right))
})

inflections <- (k:(length(v2)-k))[scores>=cutoff]

plot(v2, type="l")
abline(v=inflections, col="red", lwd=3)
print(inflections) #  6 11 18 25 32 (missed 51, if you make cutoff=8 it'll catch it...)
Hillary Sanders
  • 5,778
  • 10
  • 33
  • 50
  • this is a really simple, elegant solution to this problem. thanks! – aaron Sep 07 '13 at 01:12
  • PS: is this in a paper somewhere, or just something you came up with on the fly – aaron Sep 07 '13 at 01:21
  • I just came up with it. – Hillary Sanders Sep 07 '13 at 01:43
  • It looks like your 2nd `sapply` overwrites the 1st one. Presumably the 1st one was an alternative version that only looks at unidirectional changes (e.g. only in the past, which can be useful for when you don't have the luxury of trimming valuable new observations). However in that case the 1st version of the `sapply` shouldn't have been dropping the first *and* last `k` observations. The only reason to do that is the syntax used for iterating the function, which could be re-written as something like `i:(k+i-1)` and `(i+k):(i+2*k-1)` instead of `i-1:k` and `i+0:k`. – Hack-R May 02 '21 at 20:39