5

I want to plot data using fit function : function f(x) = a+b*x**2. After ploting i have this result:

correlation matrix of the fit parameters:

               m      n      
m               1.000 
n              -0.935  1.000 

My question is : how can i found a correlation coefficient on gnuplot ?

Mehdi
  • 2,160
  • 6
  • 36
  • 53

4 Answers4

16

You can use the stats command in gnuplot, which has syntax similar to the plot command:

stats "file.dat" using 2:(f($2)) name "A"

The correlation coefficient will be stored in the A_correlation variable. (With no name specification, it would be STATS_correlation.) You can use it subsequently to plot your data or just print on the screen using the set label command:

set label 1 sprintf("r = %4.2f",A_correlation) at graph 0.1, graph 0.85

You can find more about the stats command in gnuplot documentation.

jluckyiv
  • 3,691
  • 3
  • 22
  • 15
Nikita Rokotyan
  • 311
  • 3
  • 5
4

Although there is no direct solution to this problem, a workaround is possible. I'll illustrate it using python/numpy. First, the part of the gnuplot script that generates the fit and connects with a python script:

    file = "my_data.tsv"
    f(x)=a+b*(x)
    fit f(x) file using 2:3 via a,b
    r = system(sprintf("python correlation.py %s",file)) 
    ti = sprintf("y = %.2f + %.2fx (r = %s)", a, b, r)
    plot \
      file using 2:3 notitle,\
      f(x) title ti

This runs correlation.py to retrieve the correlation 'r' in string format. It uses 'r' to generate a title for the fit line. Then, correlation.py:

    from numpy import genfromtxt
    from numpy import corrcoef
    import sys
    data = genfromtxt(sys.argv[1], delimiter='\t')
    r = corrcoef(data[1:,1],data[1:,2])[0,1]
    print("%.3f" % r).lstrip('0')

Here, the first row is assumed to be a header row. Furthermore, the columns to calculate the correlation for are now hardcoded to nr. 1 and 2. Of course, both settings can be changed and turned into arguments as well.

The resulting title of the fit line is (for a personal example):

y = 2.15 + 1.58x (r = .592)
Frans
  • 41
  • 3
3

Since you are probably using fit function you can first refer to this link to arrive at R2 values. The link uses certain existing variables like FIT_WSSR, FIT_NDF to calculate R2 value. The code for R2 is stated as:

SST = FIT_WSSR/(FIT_NDF+1)
SSE=FIT_WSSR/(FIT_NDF)
SSR=SST-SSE
R2=SSR/SST

The next step would be to show the R^2 values on the graph. Which can be achieved using the code :

set label 1 sprintf("r = %f",R2) at graph 0.7, graph 0.7

  • I think you’ve abbreviated the blog post too much. The SST calculation needs to happen after a special fit (`fit m 'your data file' using 1:2 via m`, to get the mean value), not after the regular fit as the other assignments. (Also, I think the SSE calculation needs to divide by `FIT_NDF + 2` instead of `FIT_NDF`, but that bug is present in the blog post as well.) – Lucas Werkmeister Sep 14 '18 at 14:34
  • Primarily, R2 is generally used as qualitative more than a quantitative tool. An R2 of 0.88 is practically similar to 0.82 for many applications. But I would love to learn more about this FIT_NDF + 2 logic as it better to be accurate. Also can you please explain this special fit vs general fit terminology. Further, Just to be clear, this is the R*2 value for the fit right. Because this google group (https://groups.google.com/forum/?fromgroups=#!topic/comp.graphics.apps.gnuplot/bX-sKHJaTEg) claims we cannot calculate R*2 using gnuplot. – Sai Avinash Sattiraju Sep 17 '18 at 17:36
1

If you're looking for a way to calculate the correlation coefficient as defined on this page, you are out of luck using gnuplot as explained in this Google Groups thread.

There are lots of other tools for calculating correlation coefficients, e.g. numpy.

Community
  • 1
  • 1
andyras
  • 15,542
  • 6
  • 55
  • 77