8

For some reason I cannot get this block of code to run properly anymore:

import numpy as np
from sklearn.linear_model import LinearRegression

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
Traceback (most recent call last):
  File "<input>", line 2, in <module>
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

I'm not sure why I'm getting this error on such a simple example. Here are my current versions:

scipy.__version__
'1.5.0'
sklearn.__version__
'0.23.1'

I'm running this on 64-bit Windows 10 Enterprise with Python 3.7.3. I've tried uninstalling and reinstalling scipy and scikit-learn, installing earlier versions of scipy, and uninstalling and reinstalling Python. None of these solved the issue.

Update: It appears to be tied to matplotlib too. I was previously running this example in PyCharm, but I've moved to running it directly from PowerShell. If I run this code outside of PyCharm, I do not get an error:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)

However, if I plot the data before fitting, I get an error:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Create linear data with some noise
x = np.random.uniform(0, 100, 1000)
y = 2. * x + 3. + np.random.normal(0, 10, len(x))

# Plot data
plt.scatter(x, y)
plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')

# Fit linear data with sklearn LinearRegression
lm = LinearRegression()
lm.fit(x.reshape(-1, 1), y)
 ** On entry to DLASCLS parameter number  4 had an illegal value
Traceback (most recent call last):
  File ".\run.py", line 18, in <module>
    lm.fit(x.reshape(-1, 1), y)
  File "C:\Python37\lib\site-packages\sklearn\linear_model\_base.py", line 547, in fit
    linalg.lstsq(X, y)
  File "C:\Python37\lib\site-packages\scipy\linalg\basic.py", line 1224, in lstsq
    % (-info, lapack_driver))
ValueError: illegal value in 4-th argument of internal None

But if I comment out the line plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red') it works fine.

evan.tuck
  • Looks like you are missing the lapack library (it does linear algebra). How did you install python? Try to use [miniconda](https://docs.conda.io/en/latest/miniconda.html) instead. – BlackBear Jun 24 '20 at 18:46
  • I installed python with chocolatey. LAPACK was one of my intuitions but I couldn't figure out how to diagnose if that is actually the issue – evan.tuck Jun 24 '20 at 19:25
  • dlascls is from lapack. You can find lots of cries for help showing the same message about parameter 4. Perhaps using a newer version or a different implementation would help? – BlackBear Jun 24 '20 at 19:47
  • What's the best way to install/uninstall/update LAPACK on Windows? – evan.tuck Jun 24 '20 at 19:48
  • Allow me to suggest [miniconda](https://docs.conda.io/en/latest/miniconda.html) again :) – BlackBear Jun 24 '20 at 19:51
  • I installed a new Python environment with miniconda and installed LAPACK with conda install -c conda-forge lapack, but I am still getting the same error – evan.tuck Jun 24 '20 at 20:24
  • You should simply install numpy/scipy/scikit via conda. And make sure you are using conda's python and not your old python. The easiest way is to use the anaconda terminal (the prompt should have a `(base)` before the current directory) – BlackBear Jun 24 '20 at 20:29
  • Yes that's what I tried and I still had the same issue – evan.tuck Jun 24 '20 at 20:37
  • Sorry, I had installed those packages with pip inside the anaconda prompt. I just uninstalled them and reinstalled with conda, and it seems to be working! – evan.tuck Jun 24 '20 at 20:42
  • Great! Conda can give headaches sometimes – BlackBear Jun 24 '20 at 20:55

10 Answers

2

It seems this only happens after you show a figure with matplotlib; otherwise you can run the fit as many times as you like.

However, if you change the data type from float64 to float32 (Grzesik's answer), the error strangely disappears. It feels like a bug to me. Why would changing the data type affect the interaction between matplotlib and the LAPACK function called within sklearn?

This is more a question than an answer, but it is a bit scary to find these unexpected interactions across functions and data types.

import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


def main(print_matplotlib=False,dtype=np.float64):
    x = np.linspace(-3,3,100).astype(dtype)
    print(x.dtype)
    y = 2*np.random.rand(x.shape[0])*x + np.random.rand(x.shape[0])
    x = x.reshape((-1,1))

    reg=LinearRegression().fit(x,y)
    print(reg.intercept_,reg.coef_)
    
    yh = reg.predict(x)
    
    if print_matplotlib:
        plt.scatter(x,y)
        plt.plot(x,yh)
        plt.show()


No plotting

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = False, dtype=np.float64)  
    pass

float64
0.5957165420019624 [0.91960601]
float64
0.5957165420019624 [0.91960601]

Plotting dtype = np.float64

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float64)
    pass

float64
0.5957165420019624 [0.91960601]

Plot 1

float64
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-52593a548324> in <module>
      3     main(print_matplotlib = True)
      4     np.random.seed(64)
----> 5     main(print_matplotlib = True)
      6 
      7     pass

<ipython-input-1-11139051f2d3> in main(print_matplotlib, dtype)
     11     x = x.reshape((-1,1))
     12 
---> 13     reg=LinearRegression().fit(x,y)
     14     print(reg.intercept_,reg.coef_)
     15 

~\Anaconda3\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
    545         else:
    546             self.coef_, self._residues, self.rank_, self.singular_ = \
--> 547                 linalg.lstsq(X, y)
    548             self.coef_ = self.coef_.T
    549 

~\AppData\Roaming\Python\Python37\site-packages\scipy\linalg\basic.py in lstsq(a, b, cond, overwrite_a, overwrite_b, check_finite, lapack_driver)
   1249         if info < 0:
   1250             raise ValueError('illegal value in %d-th argument of internal %s'
-> 1251                              % (-info, lapack_driver))
   1252         resids = np.asarray([], dtype=x.dtype)
   1253         if m > n:

ValueError: illegal value in 4-th argument of internal None

Plotting dtype=np.float32

if __name__ == "__main__":
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    np.random.seed(64)
    main(print_matplotlib = True, dtype=np.float32)
    pass

Output 2

Alberto GR
2

This appears to be caused by a bug in Windows (update 2004?).

  1. The posted problem https://github.com/scipy/scipy/issues/12893
  2. is a duplicate of https://github.com/scipy/scipy/issues/12747 and
  3. is caused by https://github.com/numpy/numpy/issues/16744

It is related to whether NumPy can interface with a particular Basic Linear Algebra Subprograms (BLAS) implementation.

The most popular workarounds are to install NumPy using conda or to use a non-Windows OS (e.g. GNU/Linux). conda bundles the Intel Math Kernel Library (MKL), which does not have the issue, and non-Windows systems are not affected by the Windows bug. Supposedly Microsoft will provide a patch sometime around January 2021.
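Before trying either workaround, it can help to record which OS build and which NumPy/SciPy builds are actually in use. The following is only a minimal sketch: `environment_summary` is a hypothetical helper name, not part of any library, and the output of `np.show_config()` varies between NumPy versions.

```python
import platform

import numpy as np
import scipy


def environment_summary():
    """Collect version details that are useful when reporting this error.

    Hypothetical helper for illustration only.
    """
    return {
        "os": platform.platform(),           # OS name and build string
        "python": platform.python_version(),
        "numpy": np.__version__,
        "scipy": scipy.__version__,
    }


for name, value in environment_summary().items():
    print(f"{name}: {value}")

# Prints the BLAS/LAPACK libraries NumPy was built against
# (for example OpenBLAS for pip wheels, MKL for many conda builds).
np.show_config()
```

If the printed configuration mentions OpenBLAS on an affected Windows build, that matches the situation described in the linked issues.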

If this issue affects you, as it does many others, please remember that for Numpy, as well as Python and many other Free packages, the license clearly states,

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"

Please be mindful of that (i.e. be polite and respectful) in any comments toward the developers of these systems.

Lorem Ipsum
1

As of numpy 1.19.1 and sklearn v0.23.2, I found that polyfit(deg=1) and LinearRegression().fit() raised unexpected errors for no apparent reason. The data did not contain any NaN or Inf values. I eventually used scipy.stats.linregress() instead.

import numpy as np
from scipy import stats

# x and y are the 1-D data arrays; casting to float32 avoided the error for me
slope, intercept, r_value, p_value, std_err = stats.linregress(x.astype(np.float32), y.astype(np.float32))
Tae-Sung Shin
1

First, check for NaN and inf values, and also try normalize=True:

# X, y: your training data (placeholders)
lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)

Neither of these worked for me, and my data didn't have any NaN or inf values. While experimenting, though, I found that running the same code a second time works, so I did this:

try:
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)
except ValueError:
    # The first call fails with the LAPACK ValueError; an identical second call succeeds
    lreg = LinearRegression(fit_intercept=True, normalize=True, copy_X=True).fit(X, y)

I don't know why this works, but running the same code twice solved the problem for me.
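The retry idea in this answer can be wrapped in a small helper so the fit call is not duplicated. This is a sketch only: `fit_with_retry` and its `attempts` parameter are hypothetical names, and it assumes the failure surfaces as a `ValueError`, as in the traceback in the question. It hides the symptom rather than fixing the underlying BLAS problem.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_with_retry(X, y, attempts=2):
    """Fit a LinearRegression, retrying if the fit raises ValueError.

    Hypothetical helper: a genuine input error (e.g. NaN in X) also raises
    ValueError, so the last error is re-raised once the attempts run out.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return LinearRegression().fit(X, y)
        except ValueError as err:
            last_error = err
    raise last_error


# Same synthetic data as in the question
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

model = fit_with_retry(x.reshape(-1, 1), y)
print(model.coef_, model.intercept_)
```

Catching only `ValueError` rather than using a bare `except` keeps unrelated failures such as `KeyboardInterrupt` from being swallowed.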

0

You are missing plt.show() in your code. Put it after this line:

plt.plot(np.linspace(0, 100, 10), 2. * np.linspace(0, 100, 10) + 3., ls='--', c='red')
plt.show()
Dammio
0

I would suggest using the parameter normalize=True in your code to avoid this.

LinearRegression(fit_intercept=True,
                 normalize=True,
                 copy_X=True,
                 n_jobs=None)

This resolved the error for me.

Dharman
Ash Upadhyay
0

In your code, change this:

lm.fit(x.reshape(-1, 1), y)

to:

lm.fit(x.reshape(-1, 1).astype(np.float32), y)
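A quick way to judge what the float32 cast costs in accuracy is to fit the same data in both precisions and compare the coefficients. This is a sketch meant for a machine where the float64 fit succeeds; the variable names (`lm64`, `lm32`, `slope_diff`, `intercept_diff`) are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the question
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

X64 = x.reshape(-1, 1)           # float64 design matrix
X32 = X64.astype(np.float32)     # float32 copy, as in the workaround

lm64 = LinearRegression().fit(X64, y)
lm32 = LinearRegression().fit(X32, y)

# Absolute difference between the two fits
slope_diff = abs(float(lm64.coef_[0]) - float(lm32.coef_[0]))
intercept_diff = abs(float(lm64.intercept_) - float(lm32.intercept_))
print("slope difference:", slope_diff)
print("intercept difference:", intercept_diff)
```

For data on this scale the two fits should agree closely, so the cast is a reasonable stopgap, but float32 carries only about seven significant digits and may matter more for badly scaled inputs.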
Grzesik
0

In scipy/linalg/basic.py, the lstsq function is defined at line 1031, and its lapack_driver argument defaults to None. At line 1162, if driver is None, it is set to 'gelsd'. I think 'gelsd' is the problem: if you change it to driver = 'gelsy', the code works.
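The driver swap can also be tried without editing SciPy's source, because `scipy.linalg.lstsq` accepts a `lapack_driver` argument. The sketch below fits the line directly with the `'gelsy'` driver instead of going through `LinearRegression`; `fit_line_gelsy` is a hypothetical helper name, and whether this avoids the error on an affected machine is an assumption based on this answer.

```python
import numpy as np
from scipy import linalg


def fit_line_gelsy(x, y):
    """Least-squares fit of y = slope * x + intercept using the 'gelsy' driver.

    Hypothetical helper for illustration only.
    """
    # Design matrix: one column for x, one column of ones for the intercept
    A = np.column_stack([x, np.ones_like(x)])
    coef, _residues, _rank, _sv = linalg.lstsq(A, y, lapack_driver="gelsy")
    slope, intercept = coef
    return slope, intercept


# Same synthetic data as in the question
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1000)
y = 2.0 * x + 3.0 + rng.normal(0, 10, len(x))

slope, intercept = fit_line_gelsy(x, y)
print(slope, intercept)
```

This bypasses scikit-learn entirely, so it only suits plain least-squares fits; it does not change which driver `LinearRegression` uses internally.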

0

I had the same problem when running sklearn's linear regression from a Jupyter notebook in VS Code on WSL2 (Python 3.8.8). The regression would produce random results even on very trivial examples (e.g. y=x), and would occasionally raise this ValueError.

After many trials, the fix is to upgrade to scipy 1.7.1 (from 1.6.2). After the upgrade, the regression produces correct results. No more random errors!

Steve Lihn
0

For me, the cause was that some of the data points in my dataset had too many decimal places going into my polynomial fit. My best guess is that it is an overflow error, or an error caused by NaN values (in my case I didn't have any NaNs). I stopped encountering the error after rounding my dataset array.

You can try rounding all the data points in the dataset array:

data_array = np.round(data_array,4)
ThomasAFink