nls when the data is a convoluted spectra (variable number of variables)

Question

I have some spectrum data that looks like one of the multiplets when is plotted: http://journals.prous.com/journals/dof/19982303/html/df230301/images/keiferf3.gif

How it is seen in the image, all the peaks are realy close among each other, so I would like to do some deconvolution using nls function, like it was posted before (R: Fitting Gaussian peaks to density plot data using nls), but using a Lorentzian function instead:

y <- 1/(pi*a*(1+((x-x0)/a)^2))

In my case, x0 is the peak maximum (and length(x0) is the number of peaks), so I only need to optimize 'a'.

However, my problem is not related to perform that, but in writing a robust script that would deconvolute any spectrum, taking the number of peaks as input information.

My first idea was to write the lorentzian function and leave the 'a' as a vector (to apply thereafter a sum of all lorentzian curves), but R doesn't recognize this structure:

for (i in 1:length(x0)) {
    f[i]<-function(a) { y <- 1/(pi*a[i]*(1+((x-x0[i])/a[i])^2)) }
}

fit <- nls(sum(f[1:length(x0)]), start=list(a=rep(1, times=length(x0))))

Update:

This is my sample, in .csv format (https://drive.google.com/file/d/0B66EHLI5AufhbjlWcW9rYXl1UFk/edit?usp=sharing). Data is filled in 2 rows. The first one has the frequency (in ppm), and the second the intensity. For this data, I will pick 5 peaks, so I would do 'nls' on formula=f[1]+f[2]+f[3]+f[4]+f[5] and I would have 5 parametres (a[1:5]) to evaluate.

Side note: `f <- 1/pi*a*(1+(x-x0)/a^2)` is a vectorized replacement for your loop. — tonytonov, Jan 27 '14 at 12:40
I know that my code is not well written, but I thought posting the code will somehow help to understand my request. — user3032330, Jan 27 '14 at 13:44
Peeking at the referenced question: clearly this only works when the peaks are well-separated (e.g. Rayleigh criterion). Given that, I think you need to "explain" to `nls` that you've got a set of variables to fit. Maybe make `list` whose elements are the various `f[i]` ? — Carl Witthoft, Jan 27 '14 at 13:57
Can you provide sample data, say the data from the spectrum in your link? — jlhoward, Jan 27 '14 at 14:09
@tonytonov your vectorized version will produce a value, but I think the OP wants a collection of defined functions. — Carl Witthoft, Jan 27 '14 at 14:13
@CarlWitthoft I left the logic as it is stated in the question; this is not even a partial answer, of course. — tonytonov, Jan 27 '14 at 14:18
Why do you want to fit and deconvolve the spectra? Particulary, do you want to deconvolve so you can then get a better fit, or do you want to fit and use the band position as the ultimately deconvolved version of the spectrum? Also, what kind of spectra are you talking about? — cbeleites unhappy with SX, Jan 27 '14 at 14:49
I'm talking about NMR spectra. I'm currently working with biological samples that have lots of metabolits, so I have lots overlapping peaks in my spectra. Usually, people only compare this kind of spectra performing PCA analysis, but I would like to go further and to do that I need to extract the area of each pick alone. — user3032330, Jan 27 '14 at 15:01
OK, but do you need the area under a peak, or just its location? — Carl Witthoft, Jan 27 '14 at 15:31
I need the area, but I can calculate it easily if I have the function with 'a'. — user3032330, Jan 27 '14 at 15:40
With this function (http://stackoverflow.com/questions/6836409/finding-local-maxima-and-minima) I found all the local maxima and then I choose the 5 with highest intensities. These 5 values would be x0[1:5] in my function. — user3032330, Jan 27 '14 at 15:43

score 0 · Accepted Answer · answered Jan 29 '14 at 16:36

I've found the 'as.formula' command, which allows to convert any character string to formula. Therefore, I manage to create a for loop that creates the sum of lorentzian curves. In my example, parameters a[1:5] are now defined as a-e, but with sprintf I can use the vector nomenclature as well.

cac<-"abcdefghijklmnopqrstuvwxyz"
for (i in 1:length(x0)) {
    letter<-substr(cac, i,i)
    formulae[i]<-sprintf("1/(pi*%s*(1+((spectra[1,]-%f)/%s)^2))", letter,x0[i],letter)
    coeff[i]<-sprintf("%s=1", letter)
}
formula2<-paste(formulae, collapse="+")
formulo<-paste("spectra[2,] ~", formula2, sep="")
coeffs<-paste(coeff, collapse=",")

fit<-as.formula(paste("nls(",formulo,",start=list(",coeffs,"))", sep=""))

So, all this code brings me the following formula for the posted example:

"nls(spectra[2,] ~1/(pi*a*(1+((spectra[1,]-2.156460)/a)^2))+1/(pi*b*(1+((spectra[1,]-2.170150)/b)^2))+1/(pi*c*(1+((spectra[1,]-2.184820)/c)^2))+1/(pi*d*(1+((spectra[1,]-2.163550)/d)^2))+1/(pi*e*(1+((spectra[1,]-2.142040)/e)^2)), start=list(a=1,b=1,c=1,d=1,e=1))

However, it seems that the formula does not work, but this wasn't the aim of this thread, so I can say now it's closed (I opened a new one to try to solve it).

nls when the data is a convoluted spectra (variable number of variables)

1 Answers1

Linked