Batch definition of functions, or how to plot spectra

Question

I have a calculated spectrum stored in a file containing two columns: The first is the x-position (wavenumber), the second is the height (intensity). In order to simulate a real spectrum I'd like to place a Gaussian for each of these lines (around 60) and plot the sum of all. As explained here I read the coefficients position and height from the datafile:

g(x,x0,s,I) = I/sqrt(2*pi*s**2)*exp(-(x-x0)**2/(2*s**2))
stats "file.dat" nooutput
nlines=int(STATS_records-1)
paramstr(N) = sprintf('stats "file.dat" every ::%i::%i u (x%i=$1,I%i=$2) nooutput',N,N,N,N)
do for [i=0:nlines] {eval(paramstr(i))}
spectrum(x,s) = sum [i=0:nlines] g(x,value(sprintf('x%i',i)'),s,value(sprintf('I%i',i)))

This works well and produces the desired result. However, in my case, it's not a single file, but around 30, each containing a different spectrum. The files have systematic names and consist of a 5-symbol code build up from "t", "g+" and "g-", i.e. possible filenames are ttttt.dat, g+g+ttg-.dat etc.

I'd like to batch convert all of them, but then only plot specifically chosen ones in an interactive manner. Therefore my idea was to create a separately named function for each of these file, i.e. ttttt(x,s), g+g+ttg-(x,s) etc. Because + and - are not allowed in a function name I replace these with ¯ and _, respectively:

a = '"very/long/path/to/dir/with/files/'
filelist = system('ls '.@a")
conflist = system('ls '.@a"." | sed 's/.dat//' | sed 's/g+/g¯/g' | sed 's/g-/g_/g'")

Now I wrapped the code shown above inside a loop over the words in the filelist. The first attempt had the result that all functions looked the same and contained the parameters of the alphabetically last file, because each cycle overwrote the values of x%i and I%i. I therefore had to make sure each function refers to its own parameters by including the file name (e.g. x3 replaced by xttttt3). By doing so around 3600 variables are defined.

do for [c = 1:words(titlelist)] {
    paramstr(N) = sprintf('stats @a/%s" every ::%i::%i u (x%s%i=$1,I%s%i=$2) nooutput',word(titlelist,c),N,N,word(conflist,c),N,word(conflist,c),N)
    do for [i=0:nlines] {eval(paramstr(i))}
    eval(sprintf("%s(x,s) = sum [i=0:nlines] g(x,value(sprintf('x%s%i',word(conflist,c),i)),s,value(sprintf('I%s%i',word(conflist,c),i)))",word(conflist,c)))
}

However, when calling plot ttttt(x,s=5) I get the error message undefined variable: c. After setting c to some value between 1 and 30, each and every function shows the same spectrum, namely the one corresponding to the 30th entry of filelist. At this point I'm running out of ideas. Does anyone has a suggestion on how to get the proper result from a command like plot tttg¯t(x,s), tg_g¯ttt(x,s), ttttt(x,s)?

EDIT: The problem was caused by wrong nesting of sprintf. Reshuffle of nested sprintfs and careful concatenation of strings resulted in a correctly working call:

eval(sprintf("%s(x,s) = sum [i=0:nlines] g(x,value('x%s'.i),s,value('I%s'.i))", word(conflist,c),word(conflist,c),word(conflist,c)))

Obviously, this is very poorly written, cumbersome, barely understandable and very slow at plotting. I will happily appreciate any more elegant solution!

That's a typical convolution, not a job for gnuplot. Use octave with the `conv` function an plot the result with gnuplot. — Christoph, Jun 05 '18 at 17:06
I'm not sure if I understand correctly, but it seems that you'd expect the function `ttttt` to be independent of `c`. Can you execute `show functions` and paste the definition of that function that your code generates? — user8153, Jun 05 '18 at 17:26
It is possible that because of the complicated way you build these functions (with nested `sprintf` statements) that some occurrences of `c` are not substituted. — user8153, Jun 05 '18 at 20:19
@user8153 It was indeed the wrong nesting that caused the trouble,`show functions` was very helpful in sorting this out, thank you! @Christoph I'm not familiar with convolutions, could you be so kind and expatiate on your suggestion? If it makes things easier or faster I will gladly accept it as an answer. — Eldrad, Jun 06 '18 at 10:57

score 0 · Accepted Answer · answered Nov 18 '19 at 13:25

In the meantime I came up with a solution myself. Gnuplot is rather slow at iterative plotting, which is why it is beneficial to replace iterative summation of gaussians by an explicit sum. This is done by "outsourcing" the iteration to bash, which creates an input file Spectrum.gp for gnuplot:

#!/bin/bash
for FILE in $(ls *.dat); do
NAME=$(echo $FILE | sed 's/.dat//')
echo "${NAME}(x,s) = \\" >> Spectrum.gp
awk -v str1="g(x," -v str2=",s," -v str3=") + \\" '{print str1 $1 str2 $2 str3}' $FILE | sed '$ s/ + \\//g' >> Spectrum.gp
done

sed -i "1i s = 0.8" Spectrum.gp
sed -i "1i g(x,x0,s,I) = expo(x,x0,s)**2>50 ? 0 : I/sqrt(2*pi*s**2)*exp(-2*expo(x,x0,s)**2)" Spectrum.gp
sed -i "1i expo(x,x0,s) = (x-x0)/(2*s)" Spectrum.gp

The resulting file looks like this (let's assume I had two files molecule1.dat and molecule2.dat with 5 lines each:

expo(x,x0,s) = (x-x0)/(2*s)
g(x,x0,s,I) = expo(x,x0,s)**2>50 ? 0 : I/sqrt(2*pi*s**2)*exp(-2*expo(x,x0,s)**2)
s = 0.8
molecule1(x,s) = \
g(x,73.9,s,7.2) + \
g(x,137.37,s,6.2) + \
g(x,218.33,s,3.3) + \
g(x,292.3,s,2.0) + \
g(x,407.94,s,1.0)
molecule2(x,s) = \
g(x,85.3,s,1.2) + \
g(x,150.2,s,5.2) + \
g(x,151.3,s,3.7) + \
g(x,370.3,s,2.3) + \
g(x,401.0,s,9.0)

The file is then loaded in gnuplot by load Spectrum.gp, followed by plot molecule1(x,s). With this approach, generation of Spectrum.gp as well as plotting takes less than a second.

In case anyone wonders about the definition of the gaussian: Far away from the centre any gaussian is close to zero, therefore whenever the term inside the exponent is larger than a reasonably guessed threshold, the total value is set to exactly zero, thereby saving computational cost a bit.

I hope this will help anyone out there.

Batch definition of functions, or how to plot spectra

1 Answers1