
I ran a computational code on a cluster, but its results differ from those I get on my own computer, and on the cluster the computation tends to break down. I suspected a software mismatch, so I installed Miniconda on the cluster with the same Python version as on my computer. However, the same issue showed up.

The results of the more complicated computation were verified on several different computers: they were numerically consistent and matched the analytic calculation. The cluster, using the same Python packages, gave different results and tended to break down.

Here is a minimal example:

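from scipy import integrate
import numpy as np


def integrand(theta, t, theta_1x, theta_2x):
    # a and b are module-level globals defined below. The two 1/sqrt factors
    # depend only on t and diverge at both ends of the outer range [b-a, b+a].
    return (np.sin(t/2) * np.sin(theta) * np.cos(t/2)**2
            / np.sqrt(t - b + a) / np.sqrt(b + a - t)
            * np.sin(theta_1x) / theta_1x / theta_2x)


def theta_integral(t, ax, bx):
    # Inner integral over theta from bx-ax up to t; with args=(a, b) below
    # this passes theta_1x=a and theta_2x=b to integrand
    return integrate.quad(integrand, bx - ax, t, args=(t, ax, bx))[0]


b = 1.5
a = 0.1
# Outer integral over t in [b-a, b+a]; quad returns (value, abserr)
integral_result_temp = integrate.quad(theta_integral, b - a, b + a, args=(a, b))
print(integral_result_temp)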
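Since the example's integrand depends on theta only through sin(theta), the inner integral can be done by hand: the integral of sin(theta) from b-a to t is cos(b-a) - cos(t). The sketch below compares the original nested quad with that reduced one-dimensional form, which gives an independent check that can be run on each machine. The function names nested and reduced are illustrative, not from the question.

```python
import numpy as np
from scipy import integrate

a, b = 0.1, 1.5


def integrand(theta, t, a, b):
    # Same expression as in the question, with a and b passed explicitly
    return (np.sin(t/2) * np.sin(theta) * np.cos(t/2)**2
            / np.sqrt(t - b + a) / np.sqrt(b + a - t)
            * np.sin(a) / a / b)


def nested(a, b):
    # Original structure: inner quad over theta, outer quad over t
    def inner(t):
        return integrate.quad(integrand, b - a, t, args=(t, a, b))[0]
    return integrate.quad(inner, b - a, b + a)[0]


def reduced(a, b):
    # Inner theta integral replaced by its closed form cos(b-a) - cos(t)
    def outer(t):
        return (np.sin(t/2) * np.cos(t/2)**2 * (np.cos(b - a) - np.cos(t))
                / np.sqrt(t - b + a) / np.sqrt(b + a - t)
                * np.sin(a) / a / b)
    return integrate.quad(outer, b - a, b + a)[0]


print(nested(a, b), reduced(a, b))
```

If the two forms already disagree on the cluster but agree on the computer, that narrows the problem to the numerical inner/outer integration rather than the formula.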

On the computer:

Python 3.9.12 (main, Apr  4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32

On the cluster:

Python 3.9.12 (main, Jun  1 2022, 11:38:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux

Both environments use the same package versions:

import numpy as np
print(np.__version__)
import scipy
print(scipy.__version__)

1.21.5
1.7.3
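Matching version strings do not guarantee identical builds: the two interpreters above were compiled with different toolchains (MSC on Windows, GCC on Linux), and the packages may link different math libraries. A hedged sketch for collecting a bit more information on each machine is below; environment_report is a hypothetical helper, and np.show_config() prints NumPy's build configuration.

```python
import platform
import sys

import numpy as np
import scipy


def environment_report():
    # Facts that can differ between two installs even when the
    # numpy/scipy version strings are identical
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "machine": platform.machine(),
        "numpy": np.__version__,
        "scipy": scipy.__version__,
        "float_eps": float(np.finfo(float).eps),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")

# Build details (linked BLAS/LAPACK libraries, compiler information) for NumPy
np.show_config()
```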

The computer returns

(0.07410654702360654, 1.34140350493972e-08)

while the cluster returns

(0.07410654255620497, 1.1830009194468971e-11)

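As you can see, the two values already differ, by about 4.5e-9. That gap is far larger than the cluster's own reported error estimate (about 1.2e-11), although it is within the computer's estimate (about 1.3e-8).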
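quad returns a pair (value, abserr), where the second entry is an estimate of the absolute error, not a variance. Comparing the gap between the two reported results with each machine's estimate is simple arithmetic on the numbers copied from the question:

```python
# (value, abserr) pairs returned by quad, copied from the question
computer = (0.07410654702360654, 1.34140350493972e-08)
cluster = (0.07410654255620497, 1.1830009194468971e-11)

diff = abs(computer[0] - cluster[0])
print(f"difference:              {diff:.3e}")
print(f"computer error estimate: {computer[1]:.3e}")
print(f"cluster error estimate:  {cluster[1]:.3e}")
print("within computer estimate:", diff <= computer[1])
print("within cluster estimate: ", diff <= cluster[1])
```

So the cluster's result is overconfident relative to the computer's, rather than the two being mutually inconsistent in both directions.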

In a more complicated version of the integral, the computer produces the correct result, whereas the cluster generally fails.

How is this possible, and how can I fix it?
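For the "how to fix it" part, one option (a suggestion, not something confirmed in the thread) is to stop asking quad to discover the endpoint singularities numerically. SciPy's quad accepts weight='alg' with wvar=(alpha, beta), which integrates f(t) * (t - lo)**alpha * (hi - t)**beta over [lo, hi] with a rule built for such algebraic endpoint behaviour. The sketch assumes the singular factor is exactly 1/sqrt(t-b+a)/sqrt(b+a-t), as in the example, and moves it into the weight; smooth_integrand is a hypothetical name for the question's integrand with those two factors removed.

```python
import numpy as np
from scipy import integrate

a, b = 0.1, 1.5


def smooth_integrand(theta, t, a, b):
    # The question's integrand without the two 1/sqrt factors;
    # quad applies them analytically through weight='alg' below
    return np.sin(t/2) * np.sin(theta) * np.cos(t/2)**2 * np.sin(a) / a / b


def theta_integral(t, a, b):
    # Inner integral over theta, unchanged in structure from the question
    return integrate.quad(smooth_integrand, b - a, t, args=(t, a, b))[0]


# weight='alg', wvar=(alpha, beta) integrates f(t)*(t-lo)**alpha*(hi-t)**beta,
# so alpha = beta = -0.5 reproduces 1/sqrt(t-b+a)/sqrt(b+a-t) on [b-a, b+a]
value, abserr = integrate.quad(theta_integral, b - a, b + a, args=(a, b),
                               weight="alg", wvar=(-0.5, -0.5))
print(value, abserr)
```

If the cluster and the computer agree to within the reported error on this form, that would suggest the singular endpoints are the sensitive part of the original formulation.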

Related: What's the difference between Python built by MSC and Python built by GCC?

1 Answer


I would first make sure you are not dealing with ordinary floating-point imprecision. This question covers that topic:

Is floating point math broken?
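To show the scale of the "ordinary" floating-point effects meant here: addition of doubles is not associative, so two builds that evaluate the same expression in a different order, or with different math-library routines, can disagree in the last bit. Adaptive routines such as quad decide where to subdivide and when to stop based on computed values, which is one plausible way a last-bit difference could grow into a visible one; that mechanism is a hypothesis here, not something demonstrated in the thread.

```python
import math

left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)

# Spacing between adjacent doubles near 0.6 (math.ulp needs Python 3.9+)
print(math.ulp(0.6))
```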

toppk
  • I'm not sure; the issue might lie with the compilers. The integrand diverges at the endpoints of the integration range, so this could happen, but over the range evaluated the computer handled it well. The cluster could not, and apparently did much worse even for the simpler expression in the post. – ShoutOutAndCalculate Sep 03 '22 at 03:53
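The comment identifies the divergent integrand as the likely trouble spot. One way to remove the divergence altogether, offered as an illustrative alternative rather than either poster's method, is the substitution t = b - a*cos(phi): then dt = a*sin(phi) dphi and sqrt(t-b+a)*sqrt(b+a-t) = a*sin(phi) on [0, pi], so the singular factors cancel exactly and the outer integrand becomes smooth. The helper names smooth_part and outer are hypothetical.

```python
import numpy as np
from scipy import integrate

a, b = 0.1, 1.5


def smooth_part(theta, t, a, b):
    # The question's integrand without the two 1/sqrt factors
    return np.sin(t/2) * np.sin(theta) * np.cos(t/2)**2 * np.sin(a) / a / b


def outer(phi, a, b):
    # t = b - a*cos(phi) maps phi in [0, pi] onto t in [b-a, b+a];
    # the Jacobian a*sin(phi) cancels 1/(sqrt(t-b+a)*sqrt(b+a-t))
    t = b - a * np.cos(phi)
    return integrate.quad(smooth_part, b - a, t, args=(t, a, b))[0]


value, abserr = integrate.quad(outer, 0.0, np.pi, args=(a, b))
print(value, abserr)
```

Because nothing singular is left, quad should converge quickly on both machines, which makes this form a convenient reference value when comparing the two environments.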