2

I have some data from a bioanalyzer which gives me time (x-axis) and absorbance values (y-axis). The time is every .05 seconds and its from 32s to 138 so you can imagine how many data points I have. I've created a graph using plotly and matplotlib, just so that I have more libraries to work with to find a solution, so a solution in either library is ok! What I'm trying to do is make my script find the area under each peak and return my value.

def create_plot(sheet_name):
    sample = book.sheet_by_name(sheet_name)
    data = [[sample.cell_value(r, c) for r in range(sample.nrows)] for c in range(sample.ncols)]
    y = data[2][18:len(data[2]) - 2]
    x = np.arange(32, 138.05, 0.05)
    indices = peakutils.indexes(y, thres=0.35, min_dist=0.1)
    peaks = [y[i] for i in indices]

This snippet gets my Y values, X values and indices of the peaks. Now is there a way to get the area under each curve? Let's say that there are 15 indices.

Here's what the graph looks like:enter image description here

Harbus
  • 311
  • 1
  • 2
  • 8
  • can you clarify if you're looking for the area beneath the blue curve, or the larger area beneath the red detected peaks? – aorr Jan 19 '19 at 00:54
  • @aorr So im trying to find the area underneath the blue curve. But for each individual peak which are identified by the red dot. Hope that clarifies! – Harbus Jan 19 '19 at 01:00
  • 1
    If we identify where on the x axis we want to 'begin' each identified peak (I think you agree that the peak is an atypical place to start) then we can find the area via trivial rieman trapezoidal sum. Is your question about how to detect the index value to identify the start of one peak, or how to compute the trapezoidal area between adjacent blue values? – aorr Jan 19 '19 at 01:05
  • This is really helpful! The biggest issue instead of finding the start point of each peak manually, is there a way to do it computationally? – Harbus Jan 19 '19 at 01:23
  • the min() value between each peak index might be a good place to start. I recommend updating the question to reflect exactly which part of this problem you're trying to solve and where exactly you're stuck. – aorr Jan 19 '19 at 01:31

1 Answers1

4

An automated answer

Given a set of x and y values as well as a set of peaks (the x-coordinates of the peaks), here's how you can automatically find the area under each of the peaks. I'm assuming that x, y, and peaks are all Numpy arrays:

import numpy as np

# find the minima between each peak
ixpeak = x.searchsorted(peaks)
ixmin = np.array([np.argmin(i) for i in np.split(y, ixpeak)])
ixmin[1:] += ixpeak
mins = x[ixmin]

# split up the x and y values based on those minima
xsplit = np.split(x, ixmin[1:-1])
ysplit = np.split(y, ixmin[1:-1])

# find the areas under each peak
areas = [np.trapz(ys, xs) for xs,ys in zip(xsplit, ysplit)]

Output:

enter image description here

The example data has been set up so that the area under each peak is (more-or-less) guaranteed to be 1.0, so the results in the bottom plot are correct. The green X marks are the locations of the minimum between each two peaks. The part of the curve "belonging" to each peak is determined as the part of the curve in-between the minima adjacent to each peak.

Complete code

Here's the complete code I used to generate the example data:

import scipy as sp
import scipy.stats

prec = 1e5
n = 10
N = 150
r = np.arange(0, N+1, N//n)

# generate some reasonable fake data
peaks = np.array([np.random.uniform(s, e) for s,e in zip(r[:-1], r[1:])])
x = np.linspace(0, N + n, num=int(prec))
y = np.max([sp.stats.norm.pdf(x, loc=p, scale=.4) for p in peaks], axis=0)

and the code I used to make the plots:

import matplotlib.pyplot as plt

# plotting stuff
plt.figure(figsize=(5,7))
plt.subplots_adjust(hspace=.33)
plt.subplot(211)
plt.plot(x, y, label='trace 0')
plt.plot(peaks, y[ixpeak], '+', c='red', ms=10, label='peaks')
plt.plot(mins, y[ixmin], 'x', c='green', ms=10, label='mins')
plt.xlabel('dep')
plt.ylabel('indep')
plt.title('Example data')
plt.ylim(-.1, 1.6)
plt.legend()

plt.subplot(212)
plt.bar(np.arange(len(areas)), areas)
plt.xlabel('Peak number')
plt.ylabel('Area under peak')
plt.title('Area under the peaks of trace 0')
plt.show()
tel
  • 13,005
  • 2
  • 44
  • 62