I have a file, my_file.xvg, containing 240 lines of numbers in the following format:

    5.4
    5.1
    5.2
    5.4
    5.4
    4.9
    5.0
    5.2
....
    4.9

Using awk, I have already calculated the mean of these data and stored it in a bash variable named "mean":

mean=$(awk '{sum+=$1}END{printf "%.1f", sum/NR}' my_file.xvg)

How can I calculate the RMSD of these numbers (for instance, to estimate the error of the mean) and store it in another variable?

  • Have you checked https://stackoverflow.com/questions/18786073/compute-average-and-standard-deviation-with-awk – F. Knorr Feb 04 '21 at 15:00

2 Answers

1

There is no need to run awk twice; you can calculate both statistics in a single pass:

$ read -r mean std < <(awk '{s+=$1;ss+=$1^2} END{printf "%.2f %.2f",m=s/NR,sqrt(ss/NR-m^2)}' file)

$ echo $mean $std
5.20 0.18
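
To see how the two values end up in two shell variables, here is a minimal sketch on a made-up four-value sample (the data and the `stats` variable are assumptions for illustration, not part of the original file). awk prints both numbers on one line and `read` splits that line on whitespace. The sketch feeds `read` from a here-document rather than `< <(...)`, which should behave the same way but also works in plain POSIX sh:

```shell
# Hypothetical four-value sample, used only for illustration.
data=$(printf '%s\n' 2 4 4 6)

# awk prints "mean std" on a single line (population std: divide by NR).
stats=$(printf '%s\n' "$data" |
    awk '{s+=$1; ss+=$1^2} END{m=s/NR; printf "%.2f %.2f", m, sqrt(ss/NR-m^2)}')

# read splits the line on whitespace: first field -> mean, second -> std.
read -r mean std <<EOF
$stats
EOF

echo "mean=$mean std=$std"
# prints: mean=4.00 std=1.41
```

Both variables are ordinary shell variables afterwards, so they can be used later in the script like any other.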
karakfa
  • Great method! Does this mean that read -r mean std creates two variables on the fly from the awk output, which can then be used further in the bash script? –  Feb 05 '21 at 10:08
  • Also, could you clarify why you used a different formula to calculate the mean (which nevertheless gives very close results)? Thanks! –  Feb 05 '21 at 10:24
  • It's the same formula. Do you mean 2 decimal points instead of 1? – karakfa Feb 05 '21 at 12:23
  • This code actually uses the full precision of the numbers, so accuracy will be higher. You can truncate while printing. Truncation of the mean for std computation is not advised. – karakfa Feb 05 '21 at 15:42
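
The point about not truncating the mean can be checked with a small experiment. The sketch below uses a made-up three-value sample with a narrow spread (the data and the variable names `rounded`, `rmsd_rounded` and `rmsd_full` are assumptions for illustration). It compares the spread around the mean rounded to one decimal with the spread around the full-precision mean:

```shell
# Hypothetical three-value sample with a small spread.
data=$(printf '%s\n' 5.13 5.14 5.15)

# Mean rounded to one decimal, as in the question.
rounded=$(printf '%s\n' "$data" | awk '{s+=$1} END{printf "%.1f", s/NR}')

# Spread around the rounded mean (mean passed into awk with -v).
rmsd_rounded=$(printf '%s\n' "$data" |
    awk -v mean="$rounded" '{d=$1-mean; q+=d*d} END{printf "%.4f", sqrt(q/NR)}')

# Spread around the full-precision mean, kept inside one awk process.
rmsd_full=$(printf '%s\n' "$data" |
    awk '{x[NR]=$1; s+=$1}
         END{m=s/NR; for(i=1;i<=NR;i++){d=x[i]-m; q+=d*d}
             printf "%.4f", sqrt(q/NR)}')

echo "rounded mean=$rounded rmsd(rounded)=$rmsd_rounded rmsd(full)=$rmsd_full"
# prints: rounded mean=5.1 rmsd(rounded)=0.0408 rmsd(full)=0.0082
```

Here the rounding error in the mean (0.04) is larger than the true spread, so the value computed around the rounded mean is about five times too large. Rounding only at print time avoids this.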
0

Once the mean is saved in a variable, a very similar approach can be used for the RMSD. Since you seem to prefer awk, see the following:

rmsd=$(awk -v mean="$mean" '{++n;sum+=($NF-mean)^2} END{if(n) print sqrt(sum/n)}' my_file.xvg)
pascal
  • Thank you, it works fine! How can I round the rmsd value to two decimal places inside the awk expression? For instance, the rmsd may currently come out as about 0.55715, calculated from mean=3.7. I need rmsd = 0.56 in that case ... –  Feb 04 '21 at 15:10
  • As in your example, use `printf`: `awk -v mean="$mean" '{++n;sum+=($NF-mean)^2} END{if(n) printf "%.2f", sqrt(sum/n)}' test.txt` – pascal Feb 04 '21 at 15:17
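
Since the question mentions the error of the mean, here is a rough sketch of one common estimate. It assumes the values are uncorrelated: the sample standard deviation (dividing by N-1) divided by sqrt(N). The sample data and the `sem` variable are made up for illustration; for the real file, replace the `printf` pipeline with the file name as awk's input.

```shell
# Hypothetical sample; the real data would come from my_file.xvg instead.
data=$(printf '%s\n' 2 4 4 6)

sem=$(printf '%s\n' "$data" | awk '
    { x[NR] = $1; s += $1 }                 # store values and running sum
    END {
        if (NR < 2) exit 1                  # N-1 is zero for a single value
        m = s / NR                          # full-precision mean
        for (i = 1; i <= NR; i++) q += (x[i] - m)^2
        sd = sqrt(q / (NR - 1))             # sample standard deviation
        printf "%.2f", sd / sqrt(NR)        # standard error of the mean
    }')

echo "sem=$sem"
# prints: sem=0.82
```

If consecutive values in the file are correlated (as is often the case for time series), this simple estimate will tend to understate the real uncertainty.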