
I have been using a script I wrote some time ago to monitor the convergence of some numerical calculations. It extracts some data with awk, writes it to a few files, and then uses gnuplot to plot the data in a dumb terminal. It works fine, but lately I have been wondering whether I am writing too much to disk for such a task, and I am curious whether gnuplot can plot the output of awk directly, without writing it to a file first.

Here is the script I wrote:

#!/bin/bash
#
input=$1
#
timing=~/tmp/time.dat
nriter=~/tmp/nriter.dat  
totenconv=~/tmp/totenconv.dat
#
test=false
while ! $test; do
    clear
    awk '/total cpu time/ {print $9-p;p=$9}' $input | tail -n 60 > $timing
    awk '/     total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a,"   ",$5}' $input | tail -n 60 > $nriter
    awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a,"   ",$5}' $input > $totenconv
    gnuplot <<__EOF
set term dumb feed 160, 40
set multiplot layout 2, 2
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iterations"
plot '${nriter}' using 0:1 with lines title 'TotEn' axes x1y1
#
set lmargin 15
set rmargin 2
set bmargin 1
set autoscale
#set format y "%-4.7f"
#set xlabel "nr. iteration"
plot '${nriter}' using 0:2 with lines title 'Accuracy' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
#set format y "%-4.7f"
set xlabel "nr. iteration"
plot '${totenconv}' using 1 with lines title 'TotEnConv' axes x1y1
#
set rmargin 1
set bmargin 1.5
set autoscale
set format y "%-4.0f"
set xlabel "nr. iteration"
plot '${timing}' with lines title 'Timing (s)' axes x1y1
#plot '${totenconv}' using 2 with lines title 'AccuracyConv' axes x1y1
__EOF
#    tail -n 5 $input
#    echo -e "\n"
    date
    iter=$(grep "    total energy" $input | wc -l)
    conviter=$(awk '/!/' $input | wc -l)
    echo "number of iterations = " $iter "    converged iterations = " $conviter
    sleep 10s
    if grep -q "JOB DONE" $input ; then
        grep '!' $input  
        echo -e "\n"
        echo "Job finished"
        rm $nriter
        rm $totenconv
        rm $timing
        date
        test=true
      else
        test=false
    fi
done

This produces a nice grid of four plots when the data is available, but it would be great if I could avoid writing to disk all the time. I don't need this data once the calculation is finished, only for monitoring.

Also, is there a better way to do this? Or is gnuplot the only option?

Edit: I am detailing what the awk bits are doing in the script as requested by @theozh:

  1. awk '/total cpu time/ {print $9-p;p=$9}' $input - this searches for the pattern total cpu time, which appears many times in the file $input, and reads column 9 on each matching line. That column holds a time in seconds. It prints the difference between the current value and the previously found one.
  2. awk '/     total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a,"   ",$5}' $input - this searches for the pattern total energy (there are 5 spaces before the word total) on lines that do not contain !, takes the number in column 4, then goes to the line directly below the matching line (NR+1) and takes the number in column 5.
  3. awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a,"   ",$5}' $input - this searches for the pattern !, takes the number in column 5 of that line, then goes 2 lines below and takes the number in column 5 there.
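To make items 1 and 3 concrete, here is a minimal sketch that runs the same awk programs on a few made-up lines. The sample text is invented for illustration and is not real Quantum Espresso output; only the column positions mimic the description above.

```shell
#!/bin/bash
# Made-up sample: column 9 holds a cumulative time on "total cpu time" lines,
# and column 5 holds a number on the "!" line and two lines below it.
sample=$(printf '%s\n' \
  'a b total cpu time f g h 1.5' \
  'unrelated line' \
  'a b total cpu time f g h 4.0' \
  '! x y z 10.5' \
  'one line below' \
  'two lines below w 0.25')

# Item 1: difference between successive values in column 9
# (p is empty at first, so the first printed value is the raw number).
timing=$(printf '%s\n' "$sample" | awk '/total cpu time/ {print $9-p; p=$9}')

# Item 3: on a "!" line remember column 5 and mark the line two below it;
# when the marked line is reached, print both numbers.
conv=$(printf '%s\n' "$sample" | awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a, $5}')

echo "$timing"
echo "$conv"
```

The `nr[NR+2]` reference creates an array element keyed by a future line number, and `NR in nr` then fires when awk reaches that line.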

awk works with lines, and each line is divided into columns. For example, the line below:

This is an example

has 4 columns separated by the space character.
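As a quick check of the column idea, the following sketch asks awk for the number of fields and for two individual fields of that example line. The variable names are only for illustration.

```shell
#!/bin/bash
line='This is an example'

# NF is the number of fields (columns) on the line;
# $2 is the second field and $NF is the last one.
fields=$(echo "$line" | awk '{print NF, $2, $NF}')

echo "$fields"
```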

lucian
  • So create a file using `mktemp`. Most probably it will create a file in `/tmp`, which most probably is a `tmpfs`. You can run your script from inside gnuplot. – KamilCuk Feb 22 '21 at 08:09
  • but how does this reduce the number of writes to the disk? from what I read the files are created in /tmp by default. – lucian Feb 22 '21 at 09:30
  • `how does this reduce the number of writes to the disk` Research `tmpfs`. `what I read the files are created in /tmp by default` and you don't use it - the files you create `timing=~/tmp/time.dat` are created in your home directory. – KamilCuk Feb 22 '21 at 09:40
  • I am not using `mktemp` because I didn't know about it. I read up on it after you mentioned it, and it seems that the files are created in `/tmp`. I am writing to the home dir at the moment. I will research `tmpfs`. Thank you – lucian Feb 22 '21 at 09:47
  • most likely, it could be done without temporary file and even without awk (maybe at the cost of lower speed). Unfortunately, I don't know awk, but if you want me to check for a _gnuplot-only_ solution, could you please post a few lines of example data and a clear description what exactly you need to have extracted. – theozh Feb 23 '21 at 09:29
  • @theozh the data comes from Quantum Espresso (an electronic-structure program) and the output is quite long, so it is impossible to send a short version. But what I extract is columns of data separated like: `X Y Z`. And I plot Y vs X, Z vs X etc. So the data itself is simple. I didn't know that gnuplot can extract data like awk does. – lucian Feb 23 '21 at 16:05
  • yes, gnuplot can in principle extract data as well, although certainly not as efficiently as awk. But for many cases it might be sufficient, and it has the advantage that it is really platform independent. If you could translate into words or pseudo-code what `awk '/total cpu time/ {print $9-p;p=$9}' $input` and `awk '/ total energy/ && !/!/{a=$4; nr[NR+1]}; NR in nr{print a," ",$5}' $input` and `awk '/!/{a=$5; nr[NR+2]}; NR in nr{print a," ",$5}' $input` are doing, then I could think about a gnuplot version. The other code `| tail -n 60 > $timing` etc. is clear to me. – theozh Feb 23 '21 at 16:16
  • @theozh I will edit my question since I need to write a bit more. – lucian Feb 23 '21 at 19:52
  • @theozh I have edited the question to include your request. Just to be clear, the file given to the script by `$input` variable is constantly updating during the calculation. And the patterns that the script searches for are not found at regular intervals. Only the numbers on the pattern lines are precisely placed there by the formatting of Quantum Espresso output. – lucian Feb 23 '21 at 20:10
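The `mktemp` suggestion from the comments could look roughly like the sketch below. It assumes a Linux system where `/dev/shm` is a RAM-backed `tmpfs`, which is common but not guaranteed, and it falls back to the default temporary directory otherwise. The `monitor.XXXXXX` prefix is a hypothetical name.

```shell
#!/bin/bash
# Prefer /dev/shm (RAM-backed tmpfs on most Linux systems); otherwise use
# the default temporary directory, which may or may not be a tmpfs.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    base=/dev/shm
else
    base=${TMPDIR:-/tmp}
fi

workdir=$(mktemp -d "$base/monitor.XXXXXX")   # hypothetical name prefix
trap 'rm -rf "$workdir"' EXIT                 # remove the files when the script exits

timing=$workdir/time.dat
nriter=$workdir/nriter.dat
totenconv=$workdir/totenconv.dat

# The awk commands of the script would then write to these paths, e.g.:
# awk '/total cpu time/ {print $9-p; p=$9}' "$input" | tail -n 60 > "$timing"
echo "scratch files live in $workdir"
```

With the `trap` in place the explicit `rm` calls at the end of the original loop would no longer be needed.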

1 Answer

Thank you for your awk explanations, I learned something useful again. I won't claim that a gnuplot-only solution is straightforward, efficient or easy to understand, but it can be done. The assumption is that the columns or items are separated by spaces. The ingredients are the following:

  • since gnuplot 5.0 you have datablocks (e.g. $Data), and since gnuplot 5.2.0 you can address their lines via an index, e.g. $Data[i]. Check help datablocks. Datablocks are not files on disk but data in memory.
  • write data to a datablock via with table, check help table.
  • to check whether a string is contained within another string you can use strstrt(), check help strstrt.
  • use the ternary operator (check help ternary) to create a filter.
  • to get the nth item in a string (separated by spaces), check help word.
  • ! is the negation (check help unary).
  • there is a line counter $0 in gnuplot (check help pseudocolumns), but it is reset to 0 after a double empty line. That's why I would use my own counter, e.g. via n=0 and n=n+1.

As far as I know, if you're using your gnuplot script in bash, you have to escape the gnuplot $ with \$, e.g. \$Data.
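A minimal sketch of that escaping rule, using `cat` as a stand-in for `gnuplot` so that the text the shell passes on can be inspected. The path in `datafile` is hypothetical and only there to show shell expansion.

```shell
#!/bin/bash
datafile=/tmp/example.dat   # hypothetical path, only used to show expansion

# Unquoted delimiter: the shell expands ${datafile}, so the $ of gnuplot's
# datablock name has to be escaped as \$ to survive.
unquoted=$(cat <<EOF
\$Data <<EOD
1 2
EOD
plot \$Data u 1:2, '${datafile}' u 1:2
EOF
)

# Quoted delimiter: the shell expands nothing, so no escaping is needed,
# but shell variables are then not available inside the script.
quoted=$(cat <<'EOF'
$Data <<EOD
1 2
EOD
plot $Data u 1:2
EOF
)

printf '%s\n\n%s\n' "$unquoted" "$quoted"
```

Both variants hand gnuplot a literal `$Data`. Which one fits depends on whether the script needs shell variables such as the file names.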

In order to mimic tail -n 60, i.e. only plot the last 60 datapoints of a datablock, you can use, e.g.

plot $myNrIter u ($0>|$myNrIter|-60 ? $0 : NaN):1 w lp pt 7 ti "TotEn"

Again, this may not be easy to follow, and the code below can probably still be optimized. It might serve as a starting point, and I hope you can adapt it to your needs.

Code:

### mimic an awk script using gnuplot
reset session

# if you have a file you would first need to load it 1:1 into a datablock
# see here:  https://stackoverflow.com/a/65316744/7295599

$Data <<EOD
# some header of some minimal example data
1     2  3     4   5           6   7    8    9
1     2  total cpu time        6   7    8    9.1
something else                               
1     2  total cpu time        6   7    8    9.2
1     total energy 4.1   5     6   7    8    9
1     2     3      4     5.1   6   7    8    9

 !    2     3      4     5.01  6   7    8    9
1     one line below exclamation mark
1     2nd line below     5.11 exclamation mark
1     2  total cpu time        6   7    8    9.4
1     total energy 4.2   5     6   7    8    9
1     2     3      4     5.2   6   7    8    9
1     2  total cpu time        6   7    8    9.5
# again something else
 !    2     3      4     5.02  6   7    8    9
1 one line below exclamation mark
1     2nd line below     5.22  exclamation mark
1     2  total cpu time        6   7    8    9.9
1     total energy 4.3   5     6   7    8    9
1     2     3      4     5.3   6   7    8    9
 !    2     3      4     5.03  6   7    8    9
1     one line below exclamation mark
1     2nd line below     5.33  exclamation mark
EOD

set datafile missing NaN       # missing data NaN
set datafile commentschar ''   # no comment lines

found(n,s) = strstrt($Data[n],s)>0   # returns true or 1 if string s is found in line n of datablock 
item(n,col) = word($Data[n],col)     # returns column col of line n of datablock

set table $myTiming
    myFilter(n,col) = found(n,'total cpu time') ? (p0=p1,p1=item(n,col),p1-p0) : NaN
    plot n=(p1=NaN,0) $Data u (n=n+1, myFilter(n,9)) w table
set table $myNrIter
    myFilter(n,col1,col2) = found(n,'    total energy') && !found(n,'!') ? \
                            sprintf("%s   %s",item(n,col1),item(n+1,col2)) : NaN
    plot n=0 $Data u (n=n+1, myFilter(n,4,5)) w table
set table $myTotenconv
    myFilter(n,col1,col2) = found(n,'!') ? sprintf("%s   %s",item(n,col1),item(n+2,col2)) : NaN
    plot n=0 $Data u (n=n+1, myFilter(n,5,5)) w table
unset table

print $myTiming
print $myNrIter
print $myTotenconv

set multiplot layout 2,2
    plot $myNrIter    u 0:1 w lp pt 7 ti "TotEn"
    plot $myNrIter    u 0:2 w lp pt 7 ti "Accuracy"
    plot $myTotenconv u 0:1 w lp pt 7 ti "TotEnConv"
    plot $myTiming    u 0:1 w lp pt 7 ti "Timing (s)"
unset multiplot
### end of code

Result: (printout and plot)

 0.1
 0.2
 0.1
 0.4

4.1   5.1
4.2   5.2
4.3   5.3

5.01   5.11
5.02   5.22
5.03   5.33

[Image: the resulting 2x2 multiplot of the four datasets]

theozh
  • thank you for this answer! looks really involved! I never knew that gnuplot is capable of data gathering. I will test the script today and report back. I used to run my script as a bash script that would run continuously. How would I run this to continuously monitor the data? Also, I assume that I have to load the input file into a data block each time I plot (every minute for example) – lucian Feb 24 '21 at 08:33
  • Thanks for your feedback. Well, gnuplot wants to be a plotting tool, not a data preparation tool. On Linux you have such preparation tools (awk, sed, etc.) at hand by default. On Windows you would have to install them first, so code that relies on them may not be platform independent anymore. About continuous running: either you start your gnuplot script again and again within a bash loop, or you introduce a loop in gnuplot. In gnuplot check `help while`. – theozh Feb 24 '21 at 09:12
  • concerning loop and dumb terminal, you might be interested in the minimal example of this question: https://stackoverflow.com/q/66349304/7295599 and the upcoming answer. – theozh Feb 24 '21 at 11:10
  • @lucian Problem solved or still open? What solution did you go for? Don't hesitate to answer your own question, it might help others as well. – theozh Jun 22 '22 at 08:04
  • sorry for not getting back! I have been swamped with other problems and had to put this on hold! I will give this a serious try soon because I am now using SSDs and I would like to spare them the unnecessary writes! – lucian Jun 23 '22 at 09:28
  • I have just tried your solution with a static file and it works beautifully! I will try this with a file that is updating during a calculation next. Thank you very much for your effort and great explanations! Also I find your solution very easy to follow! – lucian Jul 01 '22 at 15:56
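The bash-loop variant mentioned in the comments could be sketched as below. The function name `monitor` and the file names in the usage comment are hypothetical. The `JOB DONE` marker is the one the original script greps for.

```shell
#!/bin/bash
# monitor FILE INTERVAL COMMAND...
# Run COMMAND every INTERVAL seconds until FILE contains "JOB DONE",
# then run it one last time so the final state is shown.
monitor() {
    local file=$1 interval=$2
    shift 2
    until grep -q "JOB DONE" "$file"; do
        "$@"
        sleep "$interval"
    done
    "$@"   # final refresh after the job has finished
}

# Possible usage (hypothetical file names):
# monitor espresso.out 10 gnuplot plot_convergence.gp
```

Each pass starts a fresh gnuplot process, so the script has to reload the growing output file every time. A `while` loop inside gnuplot would avoid the restart but needs the same reload.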