
I want to calculate the area under the peaks in my FT-IR spectrum. I usually work with Origin, but I would like to see whether I get a better result with Python. The data I'm using is linked here and the code is below. I don't know how to find the start and end of each peak in order to calculate its area, or how to set a baseline.

I found this answered question about calculating the area under multiple peaks, but I don't know how to apply it to my code: "How to get value of area under multiple peaks".

import numpy as np
from numpy import trapz
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv(r'CuCO3.csv', skiprows=5)
print(df)
Wavenumber = df.iloc[:,0]
Absorbance = df.iloc[:,1]
Wavenumber_Peak = Wavenumber.iloc[700:916]  # rows where the peaks I want to integrate start and end
Absorbance_Peak = Absorbance.iloc[700:916]  # same row range for the absorbance values

plt.figure()
plt.plot(Wavenumber_Peak, Absorbance_Peak)
plt.show()

[Figure: plot of the peaks whose area is to be calculated]

Tomerikoo
Wollmy
  • What is wrong with the code from the post you linked? You should always try to get as far as you can and ask more specific questions as soon as you face problems. – Marc Felix Oct 17 '21 at 20:28
  • Additionally, you wrote that you linked the dataset, but I cannot see a link to it. Am I missing something? – Marc Felix Oct 17 '21 at 20:33
  • Hey, sorry. The problem I'm having is that I don't know how to continue so that it finds the start and end of each peak and calculates their areas. Here is the file; as mentioned in my code, the peaks I want to integrate start at row 700 and end at row 916: https://docs.google.com/spreadsheets/d/1Y63j0IM0Ha8eERkYLen7hBlS2CtNX4EtWQDjuxSIlT8/edit?usp=sharing – Wollmy Oct 17 '21 at 21:03

1 Answer


Okay, I have quickly added the code from the other post to the beginning of yours and checked that it runs. Unfortunately, the file you linked did not work with your code, so I had to change a few things at the start to make it work (in a very inelegant way, because I do not really know how to work with dataframes). If your local file is different and processing it this way does not work, just replace my beginning with yours.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import peakutils

df = pd.read_csv(r'CuCO3.csv', skiprows=5)

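# Assumes each row was read as a single comma-separated string; split it and convert to floats
data = np.asarray([[float(y) for y in x[0].split(",")] for x in df.to_numpy()])
# Note: the x axis used here is the row index (700..915), not the measured wavenumber,
# so the areas computed below are in units of absorbance times row index
Wavenumber = np.arange(700, 916)
Absorbance = data[700:916,1]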

# thres is relative to the data range; min_dist is a distance in samples, so pass an integer
indices = peakutils.indexes(Absorbance, thres=0.35, min_dist=1)
peak_values = [Absorbance[i] for i in indices]
peak_Wavenumbers = [Wavenumber[i] for i in indices]

plt.figure()
plt.scatter(peak_Wavenumbers, peak_values)
plt.plot(Wavenumber, Absorbance)
plt.show()

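# positions of the peaks within the selected window
ixpeak = Wavenumber.searchsorted(peak_Wavenumbers)
# lowest point of each segment between consecutive peaks (local index within the segment)
ixmin = np.array([np.argmin(i) for i in np.split(Absorbance, ixpeak)])
# shift the local indices so they refer to positions in the whole window
ixmin[1:] += ixpeak
mins = Wavenumber[ixmin]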

# split up the x and y values based on those minima
xsplit = np.split(Wavenumber, ixmin[1:-1])
ysplit = np.split(Absorbance, ixmin[1:-1])

# find the areas under each peak
areas = [np.trapz(ys, xs) for xs, ys in zip(xsplit, ysplit)]

# plotting stuff
plt.figure(figsize=(5, 7))
plt.subplots_adjust(hspace=.33)
plt.subplot(211)
plt.plot(Wavenumber, Absorbance, label='trace 0')
plt.plot(peak_Wavenumbers, Absorbance[ixpeak], '+', c='red', ms=10, label='peaks')
plt.plot(mins, Absorbance[ixmin], 'x', c='green', ms=10, label='mins')
plt.xlabel('Row index')
plt.ylabel('Absorbance')
plt.title('Peaks and minima')
plt.ylim(-.1, 1.6)
plt.legend()

plt.subplot(212)
plt.bar(np.arange(len(areas)), areas)
plt.xlabel('Peak number')
plt.ylabel('Area under peak')
plt.title('Area under the peaks of trace 0')
plt.show()
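The code above integrates down to zero and does not set a baseline, which the question also asked about. One common option is a straight line drawn between the two end points of the integration window, subtracted before integrating. The following is only a minimal sketch of that idea: the helper names `trapezoid_area` and `peak_area_linear_baseline` are hypothetical, and the spectrum is synthetic (a Gaussian on a sloped background), not the CuCO3 data. Whether a linear baseline is appropriate for a given FT-IR band is a judgment call.

```python
import numpy as np

def trapezoid_area(y, x):
    """Trapezoidal rule written out explicitly so it runs on any NumPy version."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def peak_area_linear_baseline(x, y, start, stop):
    """Area of y[start:stop] above a straight line joining the window's end points.

    Hypothetical helper for illustration; start and stop are row indices.
    """
    xs = np.asarray(x[start:stop], dtype=float)
    ys = np.asarray(y[start:stop], dtype=float)
    # Straight baseline through the first and last point of the window
    # (written out by hand so it also works for a descending x axis)
    baseline = ys[0] + (ys[-1] - ys[0]) * (xs - xs[0]) / (xs[-1] - xs[0])
    # abs() because a descending x axis would otherwise give a negative area
    return abs(trapezoid_area(ys - baseline, xs))

# Synthetic stand-in for a spectrum: one Gaussian band on a sloped background
x = np.linspace(1300.0, 1600.0, 601)
background = 0.2 + 0.0005 * (x - 1300.0)
y = background + 1.0 * np.exp(-0.5 * ((x - 1450.0) / 15.0) ** 2)

area = peak_area_linear_baseline(x, y, 0, len(x))
print(f"baseline-corrected area: {area:.3f}")
```

With the real data, `x` and `y` would be the wavenumber and absorbance columns, and `start`/`stop` could be the row limits from the question (700 and 916) or a pair of neighbouring minima found as in the code above.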
Marc Felix
  • Just taking the time to try to solve my problem is already much appreciated. I think I can move on from here. Many thanks again :) – Wollmy Oct 18 '21 at 17:36