1

I have a list with Y values. If I plot it I get this:

image of a plot

There are three dominant peaks visible.

Question: Is there a way, in Python, to find the n-th most dominant spikes in a data list and print their x-position (in the list)?

I need to take into account spikes with very small distance to each other. For example the first left big spike is actually a double spike (it is the sodium double line from a daylight spectra).

Paul G.
  • 461
  • 6
  • 21
  • 1
    Can you elaborate a little more? In which form is your data present? Usually what you said should be no problem. You can just sort the lists to find out where the n-th largest is. If you elaborate a bit further, the second task should also be easy – Banana Dec 28 '17 at 12:48
  • What exactly defines a "peak"? In other words - what is the baseline? This is highly dependent on your data set. Examples for algorithms [are for instance compared here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2631518/). – Mr. T Dec 28 '17 at 12:55
  • The data consists of simple values between 0 and 65536 (y axis). I read them via stdin into a python script (data = sys.stdin.readlines()). Then I plot it. – Paul G. Dec 28 '17 at 12:57
  • Can you elaborate more what the outcome should be? Do you want to distinguish the double spike of the sodium double line or do you want them to count as separate maxima? Also: Do you have a list for x-values or are your y-values just numbered by 1 to n? – Banana Dec 28 '17 at 12:58
  • I think what a peak defines should be somehow variable... I would like to imagine a horizontal line at y=3000 or so and everything above should could be a peak – Paul G. Dec 28 '17 at 12:59
  • the outcome should be just the exact x-position for the spikes like for the first (double spike) something like peak1 @ x=105,y=5010 and peak2 @ x=110,y=4900 – Paul G. Dec 28 '17 at 13:02
  • @Banana I have only y values. For example I have 700 y values I also count x from 1-700 – Paul G. Dec 28 '17 at 13:06
  • Why 3000? Will the threshold always be 3000 or is it a value, you guessed from looking at the picture? If so, how do you want to define this threshold? I would have expected 3 peaks - why doesn't the one at 320 count as a peak? – Mr. T Dec 28 '17 at 13:12
  • Some suggestions have been made [here](https://stackoverflow.com/q/4624970/8881141) – Mr. T Dec 28 '17 at 13:14
  • Because at 300 I would say I cannot clearly distinguish the spike from background noise. I really would like to get only the 3 to 6th or so mosz dominant spikes. – Paul G. Dec 28 '17 at 13:16
  • @Piinthesky I think the local maxima are not directly relatable to here as the background is full of such – Banana Dec 28 '17 at 13:20
  • 1
    @PaulG. You can try to filter out the background first. Maybe afterwards you can try Piintheskys suggestion. If you are d'accord with setting the threshold by hand, you can make a list of x-values as range(0,len(y_vals)), zip them and then apply list(filter(lambda v:v[1]>threshhold),ziplst). That way you'll be left with a list of tuples that has information about the maxima and their original position. As a dummy solution, you could walk through the resulting list and mark when theres a value decline, i.e. a local maximum. You can distinguish "general peaks" by looking at the x-values closely – Banana Dec 28 '17 at 13:24
  • 1
    @Banana The suggestions include for instance [scipy's find-peaks algorithm](https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.signal.find_peaks_cwt.html) – Mr. T Dec 28 '17 at 13:27
  • I don't know, how representative the example is. But from looking at it, I would define the baseline as "zero +/- minimum of curve" and then find the maximum of each peak exceeding the baseline. – Mr. T Dec 28 '17 at 13:30

2 Answers2

1

EDIT

I deleted everything. Use this two new methods:

def FindMax( listOfY , indexes, numOfMax):
    'This method finds the maximum values in the list of peaks'
    listMax = []
    xList = []
    reconstructedList = []
    for c in range(0,numOfMax):
        listMax.append(max(listOfY))
        index = listOfY.index(max(listOfY))
        xList.append(indexes[index])
        listOfY.pop(index)
    return listMax, xList

def FindPeaks(listY):
    'This method finds the peaks from the list with the Y values'
    peaks = []
    indexes = []
    count = 0
    m2 = 0 #Old slope: starts with 0
    for value in range(1,len(listY)):
        m1 = listY[value] - listY[value-1] #New slope
        if( m2 > 0 and m1 < 0 ):
            peaks.append(listY[value-1])
            indexes.append( value-1 )
        m2 = m1 #Old slope is assigned
    return peaks, indexes

#Test
list = [1,3,55,5,76,26,77,88,4,96,1,5,2,7,3,100,100,100,76,25]
peaksList = FindPeaks(list) #List: (['peaks'],['indexes'])
print ("List: " , list)
print ("Peaks list: ",peaksList[0])
print ("Peaks indexes: ",peaksList[1])
maxList = FindMax(peaksList[0],peaksList[1],3)  
print ("Max: ", maxList[0])
print ("Max indexes: ", maxList[1])
peaksList = FindPeaks(list)
print ("Peaks: ",peaksList)

FindPeaks() method will use your list with the Y values as argument and won't modify it, it also returns a 2D list, where the first index of that list is the peaks list, and second list their index in "list". After that, the peaksList[0], peaksList1, and the number of max peaks you want are passed as argument to the method FindMax(), returning a 2D list. This new list has in the index '0' a list containing the maximum peaks, in descending order, at the index '1' of peaksList are the indexes where you find them at List 'list', which is your inicial list.

Below the comment #Test you can find all the tests I ran. As you can see this method can't detect flat peaks, but you could pass as an argument to the method FindMax() the whole list, with all the arguments described above as well and you will get all the values in the flat peak, which will look as a sequence of the same value ( [...',100,100,100'...] ). But the method FindPeaks() will be usefull when finding all other peaks (not flat peaks as I described), so in the case you have a flat peak this method won't return it.

Output command-line window enter image description here

I hope you find this useful for you.

ccolin
  • 304
  • 2
  • 7
  • Your function is modifying `listY`, you should make a copy of it after entering the function or keep track of indices – Georgy Dec 28 '17 at 15:25
  • The output are 2 lists, the first list is an n amount of maximum numbers, the second list has all the indexes (at listY) where you find each max number respectively. Why would he want a copy of the listY when he has the original (listY)? – ccolin Dec 28 '17 at 15:37
  • Print your `listY` before running a function, run the function, and print `listY` one more time. You will see that some elements will disappear. I don't think OP would want that. – Georgy Dec 28 '17 at 15:47
  • 1
    Ok, I see. Let me see how to fix that because even using a temp list as argument the main list is modified. – ccolin Dec 28 '17 at 16:10
  • This method doesn't find peaks. It would give you the three highest values from the largest peak. Or if there is no peak three noise data points from the baseline. – Mr. T Dec 28 '17 at 17:26
  • @Piinthesky answer is updated, the only problem would be occasional flat peaks, which can be eliminated with another method – ccolin Dec 29 '17 at 04:54
  • @Georgy I couldn't find any way to make a copy like you said, probably if you store the data in a file and read the file whenever you need the data. Still, in my update the only list modified is the peaksList, if OP needs the data after processing that list he could use the method again, or simply store the data in a file first and then retrieve it as many times as he wants. If you come up with a better idea tell me so I can update answer. – ccolin Dec 29 '17 at 04:58
  • What counts as a peak, baseline, noise is highly context dependent. Unfortunately, the OP is not very clear about this. Please don't use `list` as a variable name, you overwrite a built-in symbol. – Mr. T Dec 29 '17 at 05:02
  • @Piinthesky he didn't say he has noise nor that he has to filter noise, so I worked from that. this method works if you use any amount of data, if noise is the problem he should design a filter. What I mean is that if his data is clear of noise this method should work. Still I dont understand how noise would affect the methods efficiency, do you have a reference so I can read about it? I will update the list name later, thanks. – ccolin Dec 29 '17 at 05:19
  • Try [0,2,0,0,-1,0,179,0,1,0,-1,0,278,0,-2,0,1,0,149,0,-1,0] to see the problem. Clearly we have only three peaks here, the rest is noise around zero. But your script will pick up any local maximum. Now look at the picture the OP has posted. – Mr. T Dec 29 '17 at 05:26
  • Sorry guys, I did not know that this would be a so delicate task to do it, I thought there would exist a simple function that returns my desired peaks. My intention was really just to get the exact x position of that spikes to identify and translate it to a calibrated wavelength and thus to an specific element in the spectrum. I will not stop you from developing any noise measuring algorithms and so on but thats far behind my capability to understand. But so far I can simply measure the peaks manualy by counting the pixels and deciding what spikes could be an absorption line or noise. – Paul G. Dec 29 '17 at 08:53
-1

I ran into the same problem when making a stats library which you can find on my github, but to answer your question; Make a dictionary with the X values as keys and the Y values as Values. Find the Key with the max value using:

max(dict, key=dict.get)

You can use this as an index of the dictionary to get the value of that key. What you want to do is initialize a new dictionary just in case other keys share the same max value:

new_dict = {max(dict, key=dict.get):dict[max(dict, key=dict.get)]}

Lastly, iterate through the dictionary by key, value pair, updating your new dictionary with any key, value pair where the value is equal to the value of your initial max key, value pair:

def mode(list):
    list = sorted(list)
    dict = {}

    '''iterates through list and gives key value pair
    key being the instance in the list, value being 
    number of times that instance appears in the list'''
    for i in list:
        #n will be the indicator of the frequency i shows up
        n = 0
        for e in list:
            if e == i:
                #n increases by one every time i shows up in the list again
                n+=1
        dict.update({i:n})

    '''initial list of max key, value pair. Will update if another key 
    shares same max value'''
    new_dict = {max(dict, key=dict.get):dict[max(dict, key=dict.get)]}

    for key, value in dict.iteritems():
    '''if value of instance == value of key, value pair in new_dict, it 
    shares the max value so add to new dictionary''''
        if value == new_dict.get(new_dict.keys()[0]):
            new_dict.update({key:value})

    return new_dict
  • Same problem the other suggestion has - it detects the highest values in the dataset, not peaks. – Mr. T Dec 28 '17 at 17:44