9

This question is a continuation of this one.

My goal is to find the turning points in stock price data.

So far I:

Tried differentiating the smoothed price set, with the help of Dr. Andrew Burnett-Thompson using the centered five-point method, as explained here.

I use the EMA20 of tick data for smoothing the data set.
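For readers unfamiliar with the smoothing step, here is a minimal sketch of an exponential moving average, assuming the common smoothing factor alpha = 2 / (period + 1) and seeding with the first price. The class and method names are hypothetical, and the tick values are made up for illustration.

```java
final class EmaSketch {
    /**
     * Exponential moving average with alpha = 2 / (period + 1).
     * Seeding with the first price is an assumption; other seeds are common.
     */
    static double[] ema(double[] prices, int period) {
        double alpha = 2.0 / (period + 1);
        double[] out = new double[prices.length];
        for (int i = 0; i < prices.length; i++) {
            out[i] = (i == 0)
                    ? prices[0]
                    : alpha * prices[i] + (1 - alpha) * out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ticks = {10.0, 10.2, 10.1, 10.4, 10.3}; // made-up prices
        System.out.println(java.util.Arrays.toString(ema(ticks, 20)));
    }
}
```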

For each point on the chart I get the 1st derivative (dy/dx). I create a second chart for the turning points. Each time dy/dx is between [-some_small_value] and [+some_small_value], I add a point to this chart.

The problems are: I don't get the real turning points, only something close, and I get too many or too few points, depending on [some_small_value].

I tried a second method: adding a point when dy/dx turns from negative to positive. This also creates too many points, maybe because I use the EMA of tick data (and not of the 1-minute closing price).
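The first two methods can be sketched roughly as follows. This assumes evenly spaced samples with spacing h and uses the standard centered five-point stencil. The dead band eps in the sign-change detector is an assumption added here to suppress flips caused by tiny slopes; it is not part of the original attempt, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DerivativeSketch {
    /** Five-point stencil: f'(x_i) ~ (y[i-2] - 8y[i-1] + 8y[i+1] - y[i+2]) / (12h). */
    static double[] fivePointDerivative(double[] y, double h) {
        double[] d = new double[y.length];
        for (int i = 2; i < y.length - 2; i++) {
            d[i] = (y[i - 2] - 8 * y[i - 1] + 8 * y[i + 1] - y[i + 2]) / (12 * h);
        }
        return d; // the first and last two entries stay 0 (no full stencil there)
    }

    /** Indices where the slope first takes the opposite sign, ignoring |d| <= eps. */
    static List<Integer> signChanges(double[] d, double eps) {
        List<Integer> idx = new ArrayList<>();
        int lastSign = 0;
        for (int i = 2; i < d.length - 2; i++) {
            int sign = d[i] > eps ? 1 : (d[i] < -eps ? -1 : 0);
            if (sign == 0) continue;              // inside the dead band
            if (lastSign != 0 && sign != lastSign) idx.add(i);
            lastSign = sign;
        }
        return idx;
    }

    public static void main(String[] args) {
        double[] y = new double[11];
        for (int i = 0; i < y.length; i++) y[i] = (i - 5) * (i - 5); // parabola
        System.out.println(signChanges(fivePointDerivative(y, 1.0), 0.5));
    }
}
```

Note that the reported index is the first sample after the crossing, so the detected point trails the true extremum by at least one sample.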

A third method is to divide the data set into slices of n points, and to find the minimum and maximum points. This works fine (not ideal), but it's lagging.
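A minimal sketch of the slicing method, with hypothetical names, might look like this. It also makes the lag visible: the extrema of a slice are only known once the whole slice has been seen.

```java
final class SliceExtremaSketch {
    /**
     * Returns {minIndex, maxIndex} for each consecutive slice of n points.
     * The last slice may be shorter than n.
     */
    static int[][] sliceExtrema(double[] y, int n) {
        int slices = (y.length + n - 1) / n;
        int[][] result = new int[slices][2];
        for (int s = 0; s < slices; s++) {
            int start = s * n;
            int end = Math.min(start + n, y.length);
            int minIdx = start;
            int maxIdx = start;
            for (int i = start + 1; i < end; i++) {
                if (y[i] < y[minIdx]) minIdx = i;
                if (y[i] > y[maxIdx]) maxIdx = i;
            }
            result[s][0] = minIdx;
            result[s][1] = maxIdx;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] y = {1, 3, 2, 5, 4, 6}; // made-up values
        System.out.println(java.util.Arrays.deepToString(sliceExtrema(y, 3)));
    }
}
```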

Does anyone have a better method?

I attached 2 pictures of the output (1st derivative and n points min/max)

[image: 1st derivative output] [image: n points min/max output]

Community
Yaron
  • Why is this tagged "graph-algorithm"? – harold Jan 11 '12 at 13:50
  • @harold My guess is that he wants an algorithm, and the input data can be graphed (see above). ;D On a more serious note, this is clearly not a graph algorithm. – Patrick87 Jan 11 '12 at 15:44
  • tag removed, now do you have an idea how to solve this? :D thanks – Yaron Jan 11 '12 at 18:22
  • Do you use the correct definition of "turning point"? In calculus this is more frequently used for points where a function turns from convex to concave or vice versa, that is, for sign changes in the second derivative. – Lutz Lehmann Sep 07 '19 at 09:01

4 Answers

4

You could take the second derivative into account: in addition to your first derivative, evaluate (y_{i-1} + y_{i+1} - 2y_i) / (dx)². If this is below a certain negative threshold you have a maximum, if it is above the corresponding positive threshold you have a minimum, and otherwise you can discard the point. This should throw out many of the points that you currently keep when looking for extrema with y' = 0 alone, because that condition also holds at saddle points (flat inflection points).
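As a rough sketch of this test, assuming evenly spaced samples and two hypothetical thresholds (slopeEps for "the slope is near zero", curvEps for "the curvature is clearly non-zero"): a negative second difference marks a maximum and a positive one a minimum, as in the usual second-derivative test.

```java
final class SecondDerivativeSketch {
    enum Kind { MAXIMUM, MINIMUM, NONE }

    /** Classifies interior point i (1 <= i <= y.length - 2) using central differences. */
    static Kind classify(double[] y, int i, double dx, double slopeEps, double curvEps) {
        double d1 = (y[i + 1] - y[i - 1]) / (2 * dx);
        double d2 = (y[i - 1] + y[i + 1] - 2 * y[i]) / (dx * dx);
        if (Math.abs(d1) > slopeEps) return Kind.NONE; // not a stationary point
        if (d2 < -curvEps) return Kind.MAXIMUM;        // curve bends downward
        if (d2 > curvEps) return Kind.MINIMUM;         // curve bends upward
        return Kind.NONE;                              // too flat to decide
    }

    public static void main(String[] args) {
        double[] y = {0, 1, 0}; // made-up values
        System.out.println(classify(y, 1, 1.0, 0.1, 0.1));
    }
}
```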

filmor
1

That works, Patrick87, thanks. Here are Java functions that implement the same idea:

Assume StockPrices holds a map from date to StockPrice (the price, plus a moving average computed with x = 5).

private double getCx(StockPrices stockPrices, LocalDate executionDate, int x, double m) {
    // c_m(x) = |f(x) - g(x)| - m * h(x)
    return Math.abs(getFx(stockPrices, executionDate) - getGx(stockPrices, executionDate))
            - m * getHx(stockPrices, executionDate, x);
}

private double getGx(StockPrices stockPrices, LocalDate executionDate) {
    return stockPrices.getAvg(executionDate, 5);
}

private double getFx(StockPrices stockPrices, LocalDate executionDate) {
    return stockPrices.getPrice(executionDate);
}

public double getHx(StockPrices stockPrice, LocalDate localDate, int x) {
    //standard deviation
    return Math.sqrt(getVariance(stockPrice, localDate, x));
}

private double getVariance(StockPrices stockPrice, LocalDate localDate, int x) {
    double sum = 0;
    int count = 0;
    for (int i = - (x / 2); i <= (x / 2) ; i++) {
        LocalDate date = localDate.with(BusinessDay.add(localDate, i, stockPrice.getPriceMap(), 2));
        double avg = stockPrice.getAvg(date, 5);
        double price = stockPrice.getPrice(date);
        if (price != 0.0) {
            sum += Math.pow((price - avg), 2);
            count++;
        }
    }
    // Guard against a window with no priced days, which would otherwise yield NaN.
    return count == 0 ? 0.0 : sum / count;
}
1

Here's just an idea, sort of an idea from a different angle, and possibly a very bad idea, but since differentiation isn't working, something like this might be a thought.

First, you need to determine a minimum meaningful X-axis interval. In your figure, if you take this to be too small, you will get false positives from the bumps. This is conceptually similar to the idea of smoothing your data. Call this interval dx.

Next, using a sliding window of size dx, generate a moving average curve corresponding to your curve. There are lots of different ways you could think about doing this (to remove statistical outliers, or to use more or fewer points in the window). Call this curve g(x), and your original curve f(x). Additionally, make a curve h(x) which gives some measure of the variability of data in the sliding window which you use to compute g(x) (standard deviation should work fine if you're using a few points from the interval).

Now, begin computing curves of the form c_m(x) = |f(x) - g(x)| - m * h(x). You can start with m = 1. Any points x for which c_m(x) is positive are candidates for a local min/max. Depending on how many hits you get, you can begin increasing or decreasing m. You can do this in a way similar to binary search: if you want more points, make m = (min + m) / 2, and if you want fewer points, make m = (max + m) / 2 (adjusting min and max accordingly).

So here's an example of what I'm suggesting. Let's say we have the following series:

f(x) = [  1,   2,   4,   3,   2,   3,   6,   7,   8,   7, 
          5,   4,   3,   2,   2,   3,   2,   3,   5,   8,   9]

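We choose dx = 5. We construct g(x) by taking a simple average of the points around x (a window centered on x, truncated at the ends of the series), and h(x) as the standard deviation over the same window: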

g(x) = [2.3, 2.5, 2.4, 2.8, 3.6, 4.2, 5.2, 6.2, 6.6, 6.2, 
        5.4, 4.2, 3.2, 2.8, 2.4, 2.4, 3.0, 4.2, 5.4, 6.3, 7.3]

h(x) = [1.2, 1.1, 1.0, 0.7, 1.4, 2.4, 2.3, 1.7, 1.0, 1.5,
        1.9, 1.7, 1.2, 0.7, 0.5, 0.6, 1.1, 2.1, 2.7, 2.4, 1.7]

With m = 1 we get:

c(x) = [0.1, xxx, 0.6, xxx, 0.2, xxx, xxx, xxx, 0.4, xxx,
        xxx, xxx, xxx, 0.1, xxx, 0.0, xxx, xxx, xxx, xxx, 0.0]

This seems to have worked fairly well, actually. Feel free to share thoughts. Note that this might be more or less the equivalent of differentiation, given the mean value theorem.
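The procedure above can be sketched as follows, assuming a window centered on x that is truncated at the ends of the series and a population standard deviation for h(x). Both choices are assumptions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DeviationBandSketch {
    /** Indices where c_m(x) = |f(x) - g(x)| - m * h(x) is positive. */
    static List<Integer> candidates(double[] f, int dx, double m) {
        int half = dx / 2;
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < f.length; i++) {
            int lo = Math.max(0, i - half);            // truncate at the left edge
            int hi = Math.min(f.length - 1, i + half); // truncate at the right edge
            int count = hi - lo + 1;
            double sum = 0;
            for (int j = lo; j <= hi; j++) sum += f[j];
            double g = sum / count;                    // moving average g(x)
            double sq = 0;
            for (int j = lo; j <= hi; j++) sq += (f[j] - g) * (f[j] - g);
            double h = Math.sqrt(sq / count);          // population std h(x)
            if (Math.abs(f[i] - g) - m * h > 0) hits.add(i);
        }
        return hits;
    }

    public static void main(String[] args) {
        double[] f = {1, 2, 4, 3, 2, 3, 6, 7, 8, 7, 5, 4, 3, 2, 2, 3, 2, 3, 5, 8, 9};
        System.out.println(candidates(f, 5, 1.0));
    }
}
```

On the series above with dx = 5 and m = 1 this flags indices 0, 2, 4, 8, 13 and 15. Because it works with unrounded values, entries that sit right at zero in a one-decimal table can land on either side of the threshold.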

Patrick87
  • Can you expand on how exactly you are going from f(x) to g(x)? f(x) and g(x) have the same number of data points, so I don't see how this could be a simple moving average? – Ivan Dec 12 '17 at 04:23
  • Ah, I see, it is a moving mean. – Ivan Dec 12 '17 at 04:28
  • @Patrick87 The last spike from 5 to 8, 9 has a c(x) = 0.0 meaning a flat from the last min? – Rock Aug 22 '19 at 12:38
  • @patrick87 Can I ask you few questions around the suggestion to help me understand here? Its tough to have conversation in these comments. – dchhetri May 28 '20 at 22:54
  • @Ivan How is moving average different from moving mean? I'm still trying to figure out why g(0) = 2.3. I was expecting it to be (1+2+4+3+2)/5 = 2.4 – dchhetri May 28 '20 at 23:41
  • @dchhetri I am not aware of any distinction between moving average and moving mean. I might have miscalculated one of the data points. – Patrick87 May 29 '20 at 11:08
0

Another approach based on some of the ideas here. For each point in the series, look at the n points before and after it (the window). If the value of the current point is the highest in the window, mark it as a peak turning point (if it is the lowest, mark it as a trough). Exclude the first and final n points in the series.

I experimented with monthly data and got the following with n=6. [image]
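A minimal sketch of this window rule, with hypothetical names, is shown below. Requiring the point to be strictly higher (or lower) than every other point in the window is an assumption, chosen so that flat stretches do not produce several marks in a row.

```java
import java.util.ArrayList;
import java.util.List;

final class WindowExtremaSketch {
    /**
     * Returns indices that are strict peaks (peaks == true) or strict troughs
     * (peaks == false) within n points on each side. The first and last n
     * points are skipped because they lack a full window.
     */
    static List<Integer> extrema(double[] y, int n, boolean peaks) {
        List<Integer> out = new ArrayList<>();
        for (int i = n; i < y.length - n; i++) {
            boolean ok = true;
            for (int j = i - n; j <= i + n && ok; j++) {
                if (j == i) continue;
                ok = peaks ? y[i] > y[j] : y[i] < y[j];
            }
            if (ok) out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = {1, 2, 3, 2, 1, 2, 3, 4, 3}; // made-up values
        System.out.println("peaks:   " + extrema(y, 2, true));
        System.out.println("troughs: " + extrema(y, 2, false));
    }
}
```

Like the slicing method, a point can only be confirmed once n later points have arrived, so the result lags by n samples.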

Paul Evans