20

I have data that always looks something like this:

alt text http://michaelfogleman.com/static/images/chart.png

I need an algorithm to locate the three peaks.

The x-axis is actually a camera position and the y-axis is a measure of image focus/contrast at that position. There are features at three different distances that can be in focus and I need to determine the x-values for these three points.

The middle hump is always a little harder to pick out, even for a human.

I have a homemade algorithm that mostly works, but I'm wondering if there's a standard way to grab local maxima from a function that can have a little noise in it. The peaks overcome the noise easily though.

Also, being camera data, an algorithm that doesn't require scanning the full range could be useful.

Edit: Posting the Python code that I ended up using. It uses my original code that finds maxima given a search threshold and does a binary search to find a threshold that results in the desired number of maxima.

Edit: Sample data included in code below. New code is O(n) instead of O(n^2).

def find_n_maxima(data, count):
    low = 0
    high = max(data) - min(data)
    for iteration in xrange(100): # max iterations
        mid = low + (high - low) / 2.0
        maxima = find_maxima(data, mid)
        if len(maxima) == count:
            return maxima
        elif len(maxima) < count: # threshold too high
            high = mid
        else: # threshold too low
            low = mid
    return None # failed

def find_maxima(data, threshold):
    def search(data, threshold, index, forward):
        max_index = index
        max_value = data[index]
        if forward:
            path = xrange(index + 1, len(data))
        else:
            path = xrange(index - 1, -1, -1)
        for i in path:
            if data[i] > max_value:
                max_index = i
                max_value = data[i]
            elif max_value - data[i] > threshold:
                break
        return max_index, i
    # forward pass
    forward = set()
    index = 0
    while index < len(data) - 1:
        maximum, index = search(data, threshold, index, True)
        forward.add(maximum)
        index += 1
    # reverse pass
    reverse = set()
    index = len(data) - 1
    while index > 0:
        maximum, index = search(data, threshold, index, False)
        reverse.add(maximum)
        index -= 1
    return sorted(forward & reverse)

data = [
    1263.900, 1271.968, 1276.151, 1282.254, 1287.156, 1296.513,
    1298.799, 1304.725, 1309.996, 1314.484, 1321.759, 1323.988,
    1331.923, 1336.100, 1340.007, 1340.548, 1343.124, 1353.717,
    1359.175, 1364.638, 1364.548, 1357.525, 1362.012, 1367.190,
    1367.852, 1376.275, 1374.726, 1374.260, 1392.284, 1382.035,
    1399.418, 1401.785, 1400.353, 1418.418, 1420.401, 1423.711,
    1425.214, 1436.231, 1431.356, 1435.665, 1445.239, 1438.701,
    1441.988, 1448.930, 1455.066, 1455.047, 1456.652, 1456.771,
    1459.191, 1473.207, 1465.788, 1488.785, 1491.422, 1492.827,
    1498.112, 1498.855, 1505.426, 1514.587, 1512.174, 1525.244,
    1532.235, 1543.360, 1543.985, 1548.323, 1552.478, 1576.477,
    1589.333, 1610.769, 1623.852, 1634.618, 1662.585, 1704.127,
    1758.718, 1807.490, 1852.097, 1969.540, 2243.820, 2354.224,
    2881.420, 2818.216, 2552.177, 2355.270, 2033.465, 1965.328,
    1824.853, 1831.997, 1779.384, 1764.789, 1704.507, 1683.615,
    1652.712, 1646.422, 1620.593, 1620.235, 1613.024, 1607.675,
    1604.015, 1574.567, 1587.718, 1584.822, 1588.432, 1593.377,
    1590.533, 1601.445, 1667.327, 1739.034, 1915.442, 2128.835,
    2147.193, 1970.836, 1755.509, 1653.258, 1613.284, 1558.576,
    1552.720, 1541.606, 1516.091, 1503.747, 1488.797, 1492.021,
    1466.720, 1457.120, 1462.485, 1451.347, 1453.224, 1440.477,
    1438.634, 1444.571, 1428.962, 1431.486, 1421.721, 1421.367,
    1403.461, 1415.482, 1405.318, 1399.041, 1399.306, 1390.486,
    1396.746, 1386.178, 1376.941, 1369.880, 1359.294, 1358.123,
    1353.398, 1345.121, 1338.808, 1330.982, 1324.264, 1322.147,
    1321.098, 1313.729, 1310.168, 1304.218, 1293.445, 1285.296,
    1281.882, 1280.444, 1274.795, 1271.765, 1266.857, 1260.161,
    1254.380, 1247.886, 1250.585, 1246.901, 1245.061, 1238.658,
    1235.497, 1231.393, 1226.241, 1223.136, 1218.232, 1219.658,
    1222.149, 1216.385, 1214.313, 1211.167, 1208.203, 1206.178,
    1206.139, 1202.020, 1205.854, 1206.720, 1204.005, 1205.308,
    1199.405, 1198.023, 1196.419, 1194.532, 1194.543, 1193.482,
    1197.279, 1196.998, 1194.489, 1189.537, 1188.338, 1184.860,
    1184.633, 1184.930, 1182.631, 1187.617, 1179.873, 1171.960,
    1170.831, 1167.442, 1177.138, 1166.485, 1164.465, 1161.374,
    1167.185, 1174.334, 1186.339, 1202.136, 1234.999, 1283.328,
    1347.111, 1679.050, 1927.083, 1860.902, 1602.791, 1350.454,
    1274.236, 1207.727, 1169.078, 1138.025, 1117.319, 1109.169,
    1080.018, 1073.837, 1059.876, 1050.209, 1050.859, 1035.003,
    1029.214, 1024.602, 1017.932, 1006.911, 1010.722, 1005.582,
    1000.332, 998.0721, 992.7311, 992.6507, 981.0430, 969.9936,
    972.8696, 967.9463, 970.1519, 957.1309, 959.6917, 958.0536,
    954.6357, 954.9951, 947.8299, 953.3991, 949.2725, 948.9012,
    939.8549, 940.1641, 942.9881, 938.4526, 937.9550, 929.6279,
    935.5402, 921.5773, 933.6365, 918.7065, 922.5849, 939.6088,
    911.3251, 923.7205, 924.8227, 911.3192, 936.7066, 915.2046,
    919.0274, 915.0533, 910.9783, 913.6773, 916.6287, 907.9267,
    908.0421, 908.7398, 911.8401, 914.5696, 912.0115, 919.4418,
    917.0436, 920.5495, 917.6138, 907.5037, 908.5145, 919.5846,
    917.6047, 926.8447, 910.6347, 912.8305, 907.7085, 911.6889,
]

for n in xrange(1, 6):
    print 'Looking for %d maxima:' % n
    indexes = find_n_maxima(data, n)
    print indexes
    print ', '.join(str(data[i]) for i in indexes)
    print

Output:

Looking for 1 maxima:
[78]
2881.42

Looking for 2 maxima:
[78, 218]
2881.42, 1927.083

Looking for 3 maxima:
[78, 108, 218]
2881.42, 2147.193, 1927.083

Looking for 4 maxima:
[78, 108, 218, 274]
2881.42, 2147.193, 1927.083, 936.7066

Looking for 5 maxima:
[78, 108, 218, 269, 274]
2881.42, 2147.193, 1927.083, 939.6088, 936.7066
FogleBird
  • 74,300
  • 25
  • 125
  • 131

10 Answers10

9

The local maxima would be any x point which has a higher y value than either of its left and right neighbors. To eliminate noise, you could put in some kind of tolerance threshold (ex. x point must have higher y value than n of its neighbors).

To avoid scanning every point, you could use the same approach but go 5 or 10 points at a time to get a rough sense of where the maximum lie. Then come back to those areas for a more detailed scan.

btreat
  • 1,554
  • 8
  • 10
  • This is sort of how my current algorithm works. The problem is that I have to know what threshold to give it, and that threshold could change depending on the actual data. – FogleBird Jul 14 '10 at 02:18
  • 2
    @FogleBird You could find that threshold by going through the data once and computing the mean/std dev of the differences between adjacent y-values; and let your threshold be some function of that. – quantumSoup Jul 14 '10 at 02:21
  • @Aircule: Good idea, I'll try that tomorrow. – FogleBird Jul 14 '10 at 02:30
  • 2
    'The local maxima would be any x point which has a higher y value than either of its left and right neighbors' Not necessarily this simple, right? The presence of equal values will break this. If your data points are 1, 2, 3, 50, 3, 3, 10, 10, 3, then the 10s represent a local maxima, though neither of them is higher than both left + right neighbours. – Cowan Jul 14 '10 at 07:17
  • To handle maxima that might occur in flat spots, re-word "any x point which has a higher y value than its neighbors" to "any x point which *does not* have a *lower* y value than its neighbors". – bta Jul 14 '10 at 20:26
  • Hi, I would like to know if there is a name in computer science or mathematics for this method. – Lijia Yu Oct 06 '22 at 13:33
9

Couldn't you move along the graph, regularly calculating the derivative and if it switches from positive to negative you've found a peak?

Matt Mitchell
  • 40,943
  • 35
  • 118
  • 185
  • The derivative will fluctuate +/- too much due to the random noise. How do I eliminate that problem? – FogleBird Jul 14 '10 at 02:24
  • Basically averaging with filters as suggested I would guess. – Matt Mitchell Jul 14 '10 at 02:55
  • 4
    If that happens, the noise is creating local maxima. Averaging or filtering away the noise is one way to decide which maxima are 'important' enough. However, there's no need to calculate a derivative here. Just compare the value of the point to its neighbors. – Larry Wang Jul 14 '10 at 04:00
  • are there really no already implemented algorithms to do this, perhaps using a smarter way? –  Jul 26 '21 at 12:48
3

You could try to fit a spline to the data and then find the extrema of the spline. Since the spline is piecewise polynomial, the exact locations of the extrema can be found with relatively simple formulas.

Victor Liu
  • 3,545
  • 2
  • 24
  • 37
  • I think fitting the data in a spline would be more computationally intensive than going through the data "manually." – quantumSoup Jul 14 '10 at 02:23
  • Fitting splines is very fast -- O(n). – John D. Cook Jul 14 '10 at 02:40
  • I also think fitting would be the most accurate solution; though I would not use spline. In this case I'd try a sum of 3 Gauss-bells. – Curd Jul 14 '10 at 20:13
  • I do ultimately use curve fitting to find the actual peaks. But I do that with a higher resolution scan around each maxima, so then I just do a simple parabolic fit. I use the algorithm being discussed here just to figure out roughly where the peaks are during a more coarse grained (faster) focus scan. – FogleBird Jul 15 '10 at 00:07
3

I practice, I've found what works well is to use a dilation morphology operation to produce a dilated version of your sampled function (data points) then to identify local max's compare the dilated version vs. the original and anywhere where the dilated version equals the original version should be a local maximum. This works really well I find with 2D+ data (i.e. images) but since you have 1D data it may be easier just to use the differences between successive points as an approximation to the derivative.

Note if you do use the dilation technique, the structuring element (both the size and shape) that you use in the dilation greatly determines the types of peaks you are looking for.

Also if you have noise in your data, smooth it with a low pass filter, like a 1D Gaussian before searching.

More info on dilation can be found here:

here is the idea implemented in matlab: http://www.mathworks.com/matlabcentral/fileexchange/14498-local-maxima-minima

if you don't know what dilation is: http://en.wikipedia.org/wiki/Dilation_%28morphology%29

(its dead simple once you understand it here is a really easy explanation) http://homepages.inf.ed.ac.uk/rbf/HIPR2/dilate.htm

ldog
  • 11,707
  • 10
  • 54
  • 70
3

direct approach, something like this:

#include <stdio.h>
#include <stdlib.h>

#define MAXN 100

double smooth(double arr[], int n, int i)
{
        double l,r,smoo;
        l = (i - 1 < 0)?arr[0]:arr[i-1];
        r = (i + 1 >= n)?arr[n-1]:arr[i+1];
        smoo = (l + 2*arr[i] + r)/4;
        return smoo;
}

void findmax(double arr[], int n)
{
        double prev = arr[0];
        int i;
        int goup = 1;

        for(i = 0; i < n; i++)
        {
                double cur = smooth(arr,n,i);
                if(goup) {
                        if(prev > cur && i > 0) {
                                printf("max at %d = %lf\n", i-1, arr[i-1]);
                                goup = 0;
                        }
                } else {
                        if(prev < cur)
                                goup = 1;
                }
                prev = cur;
        }
}

int main()
{
        double arr[MAXN] = {0,0,1,2,3,4,4,3,2,2,3,4,6,8,7,8,6,3,1,0};
        int n = 20, i;

        for(i = 0; i < n; i++)
                printf("%.1lf ",arr[i]);
        printf("\n");

        findmax(arr,n);
        return 0;
}

output:

0.0 0.0 1.0 2.0 3.0 4.0 4.0 3.0 2.0 2.0 3.0 4.0 6.0 8.0 7.0 8.0 6.0 3.0 1.0 0.0
max at 6 = 4.000000
max at 14 = 7.000000

1) set state = goup: going upward the curve;
2) if previuos value is greater than current then there was maximum:
print it and set state to godown
3) while in godown state wait until previous is less then current and switch to (1)

to reduce noise use some smoothing function smooth()

Oleg Razgulyaev
  • 5,757
  • 4
  • 28
  • 28
3

Another method is to create what i call a walking slope average. I dont know if there is a name for it but it goes like this, your data set is for example 1000 numbers, you take x[n] +x[n..1234567] numbers say 7 numbers ahaed, take average of first 3 and last 3 use them to find if a line put over them would go up or down.

When it goes down you passed a mountain peek number, after one such sample keep waiting till it raises again. So only after upwards you track the first moment of going down. And repeat that.

It will detect tops, and depending on slope line length (7) .. 15 ..33 etc, you also remove noise.

Peter
  • 2,043
  • 1
  • 21
  • 45
2

Do you know the derivative of the data? If yes and you can symbolically solve the system then you can find the points where the derivative is equal to zero.

If you don't have the formula (which seems to be the case from your OP) then you might want to try filter out some noise then do the following:

If you can't solve symbolically then you can use something like Newton–Raphson method to get to the local maxima and choose the starting points randomly from the range to try to capture all the maxima.

If you don't have the derivative data then you might want to try a hill climbing algorithm that doesn't require the derivative and start it at multiple different randomly chosen points. You could then keep track of the points that you find when the iterative hill climbing part of the algorithm terminates. This will only probabilistically find all the local maxima but it may be good enough for your purposes.

EDIT: given that the 3 peaks will be roughly in the same places you should try guarantee that the starting point for these algorithms is near those points for at least some of the times you run the iterative algorithm.

shuttle87
  • 15,466
  • 11
  • 77
  • 106
2

You could try a band-pass filter to reject the noise and make it easier to reliably select those maxima.

The point of a band-pass (rather than low-pass) is to pull nearly-constant down to zero. That way, the highest values you find in the filtered result will probably be the clearest peaks.

Certainly if you can define a narrow frequency-range for your signal and apply a very selective filter, it should make a fairly unsophisticated maxima-finding algorithm practical - e.g. a simple sample-thats-higher-than-either-neighbour scan.

A sophisticated filter might not be necessary - you could get away with a mexican hat wavelet at a single well-chosen scale. One scale probably means it's not really a wavelet any more - just a band-pass FIR filter.

EDIT

There's an asymmetric wavelet (I forget the name) which, if the mexican hat is analogous to a cosine, takes the role of the sine. I mention it as it combines band-pass filtering with a kind of derivative - the zero-crossings in the result are the stationary-points of a band-pass filtered version of the original signal.

A "debounced" scan could then identify reliable maxima by looking for crossing points in this "derivative" signal.

2

As others have mentioned derivatives or comparing to local neighbors usually works for me. If you're worried about noise I can recommend median filtration as a pretty fast and reliable filtration scheme. I use it and reverse median filtration all the time to squelch noise in acoustic sensors, works great.

ChrisC
  • 1,282
  • 9
  • 9
0

I think I had a similar problem to solve compound interest formulas. I think if you there is not an inflection point between to points, you can try to get the local maximun or minimun just iterating.

If you dont know how the function is, I think there if no solution but derivate it.

My solution divide the segment in 4 equal parts (5 points) and compare groups of 3 consecutive points to choose a new little segment. Maybe it is some beter implementation.

//I know always f(-1)<f(1) and no inflection point between -1 and 1.


  function maximo(capital_inicial,periodos, aportacion){
  var min=-1;var max=1;var c=[];
  c[0]=capitalfinal(capital_inicial,periodos,min,aportacion);
  c[2]=capitalfinal(capital_inicial,periodos,(min+max)/2,aportacion);
  c[4]=capitalfinal(capital_inicial,periodos,max,aportacion);
  while(max-min>0.0001){
    c[1]=capitalfinal(capital_inicial,periodos,(3*min+max)/4,aportacion);
    if(segmento(c[0],c[1],c[2])==-1){
      c[3]=capitalfinal(capital_inicial,periodos,(min+max*3)/4,aportacion);
      if(segmento(c[1],c[2],c[3])==-1){
        min=(min+max)/2;
        c[0]=c[2];
        c[2]=c[3];
      }else{
        min=(3*min+max)/4;
        max=(min+max*3)/4
        c[0]=c[1];
        c[4]=c[3];
      }
    }else{
      max=(min+max)/2;
      c[4]=c[2];
      c[2]=c[1];
    }
  }
  return (max+min)/2;
}
function segmento(a,b,c){
  return a>b && b>c ? -1 : a<b && b<c ? 1:0;
}

https://docs.google.com/spreadsheets/d/1crPdZfsOSbkfqya9C7Ezg-BZcVZzf46_avX-NVPsaqw/edit#gid=2021465223