LinAlgError: SVD did not converge in Linear Least Squares when trying polyfit

Question

When I run the script below I get the error `LinAlgError: SVD did not converge in Linear Least Squares`. I have used the exact same script on a similar dataset, and there it works. I have searched my dataset for values that Python might interpret as NaN, but I cannot find any.

My dataset is quite large and impossible to check by hand (but I think it is fine). I also checked the lengths of stageheight_masked and discharge_masked, and they are the same. Does anyone know why the script raises this error and what I can do about it?

import numpy as np
import datetime
import matplotlib.dates
import matplotlib.pyplot as plt
from scipy import polyfit, polyval

kwargs = dict(delimiter = '\t',\
     skip_header = 0,\
     missing_values = 'NaN',\
     converters = {0:matplotlib.dates.strpdate2num('%d-%m-%Y %H:%M')},\
     dtype = float,\
     names = True,\
     )

rating_curve_Gillisstraat = np.genfromtxt('G:\Discharge_and_stageheight_Gillisstraat.txt',**kwargs)

discharge = rating_curve_Gillisstraat['discharge']   #change names of columns
stageheight = rating_curve_Gillisstraat['stage'] - 131.258

#mask NaN
discharge_masked = np.ma.masked_array(discharge,mask=np.isnan(discharge)).compressed()
stageheight_masked = np.ma.masked_array(stageheight,mask=np.isnan(discharge)).compressed()

#sort
sort_ind = np.argsort(stageheight_masked)
stageheight_masked = stageheight_masked[sort_ind]
discharge_masked = discharge_masked[sort_ind]

#regression
a1,b1,c1 = polyfit(stageheight_masked, discharge_masked, 2)
discharge_predicted = polyval([a1,b1,c1],stageheight_masked)

print 'regression coefficients'
print (a1,b1,c1)

#create upper and lower uncertainty
upper = discharge_predicted*1.15
lower = discharge_predicted*0.85

#create scatterplot

plt.scatter(stageheight,discharge,color='b',label='Rating curve')
plt.plot(stageheight_masked,discharge_predicted,'r-',label='regression line')
plt.plot(stageheight_masked,upper,'r--',label='15% error')
plt.plot(stageheight_masked,lower,'r--')
plt.axhline(y=1.6,xmin=0,xmax=1,color='black',label='measuring range')
plt.title('Rating curve Catsop')
plt.ylabel('discharge')
plt.ylim(0,2)
plt.xlabel('stageheight[m]')
plt.legend(loc='upper left', title='Legend')
plt.grid(True)
plt.show()

I'm pretty sure that `polyfit` doesn't support masked arrays, so it will treat NaNs like any other value. You also need to check for infinite values (e.g. using `np.isinf`). — ali_m, Feb 23 '16 at 22:43
Another reason might be that you have a "vertical line" in your data! — Yahya, Mar 30 '23 at 17:40

score 27 · Answer 1 · answered Feb 13 '18 at 17:25

I don't have your data file, but it is almost always the case that when you get this error you have NaNs or infinities in your data. Check for both with np.isfinite; pd.notnull finds the NaNs but not the infinities.
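A minimal sketch of that check, using synthetic arrays in place of the asker's data (the values and the names `x`, `y`, and `finite` are made up for illustration). It counts the non-finite entries in each array, then builds one mask from both arrays so that paired samples are dropped together before the fit:

```python
import numpy as np

# Synthetic stand-ins for stage height (x) and discharge (y); not the asker's data.
x = np.array([0.0, 1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
y = np.array([1.0, 2.0, 5.0, 10.0, np.inf, 26.0, 37.0])

# Count the problem values in each array.
print("non-finite in x:", np.count_nonzero(~np.isfinite(x)))
print("non-finite in y:", np.count_nonzero(~np.isfinite(y)))

# Keep only the rows where BOTH values are finite, so x and y stay aligned.
finite = np.isfinite(x) & np.isfinite(y)
coeffs = np.polyfit(x[finite], y[finite], 2)
print(coeffs)
```

Note that the script in the question filters `stageheight` with the NaN mask of `discharge` only, so a NaN or inf in the stage column would still reach `polyfit`. A combined mask like the one above avoids that.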

ski_squaw

Robin · Answer 2 · 2021-06-21T09:35:30.740

As others have pointed out, the problem is likely that some rows have no numeric values for the algorithm to work with. This is an issue with most regressions.

That's the problem. The solution is to do something about those rows, and what to do depends on the data. Often you can replace the NaNs with 0s, using Pandas .fillna(0) for example, although that only makes sense when 0 is a plausible value for the quantity. Sometimes you have to interpolate the missing values, and Pandas .interpolate() is probably the simplest way to do that. When the data is not a time series, you might be able to simply drop the rows with NaNs in them, for example with the Pandas .dropna() method. And sometimes it's not about the NaNs but about infs or other values, for which there are other solutions: https://stackoverflow.com/a/55293137/12213843

Exactly which way to go depends on the data, and it's up to you to interpret it. Domain knowledge goes a long way toward interpreting the data well.
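The three Pandas options mentioned above can be compared side by side. This is only a sketch on a made-up DataFrame (the column names `x` and `y` and the values are assumptions, not the asker's data); infinities are first converted to NaN so that the same methods handle both:

```python
import numpy as np
import pandas as pd

# Made-up sample with one NaN and one inf in the y column.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "y": [1.0, np.nan, 5.0, np.inf, 9.0],
})

# Treat infinities as missing so the same tools handle both cases.
df = df.replace([np.inf, -np.inf], np.nan)

dropped = df.dropna()            # remove incomplete rows
filled = df.fillna(0)            # replace missing values with 0
interpolated = df.interpolate()  # linear interpolation between neighbours

# Fit a straight line to each cleaned version and compare the results.
for name, frame in [("dropna", dropped), ("fillna", filled), ("interpolate", interpolated)]:
    slope, intercept = np.polyfit(frame["x"], frame["y"], 1)
    print(name, slope, intercept)
```

In this toy example the underlying relation is a straight line, so dropping and interpolating recover it, while filling with 0 distorts the fit. Which behaviour is acceptable still depends on what the missing values mean in the real data.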

score 1 · Answer 3 · answered Dec 06 '20 at 08:24

As ski_squaw mentions, the error is most often due to NaNs. For me, however, it appeared after a Windows update. I was using numpy version 1.16; moving to numpy 1.19.3 solved the issue (run pip install numpy==1.19.3 --user in cmd).

This GitHub issue explains it in more detail: https://github.com/numpy/numpy/issues/16744

Numpy 1.19.3 doesn't work on Linux and 1.19.4 doesn't work on Windows.
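One way to tell a data problem from an installation problem is to run `polyfit` on data that is known to be finite. The following is a rough sketch; the helper name `polyfit_works_on_clean_data` is hypothetical and not part of numpy:

```python
import platform
import numpy as np

def polyfit_works_on_clean_data():
    """Return True if polyfit succeeds on data known to be finite."""
    x = np.arange(10, dtype=float)
    y = 3.0 * x + 2.0  # exact straight line, no NaN or inf
    try:
        slope, intercept = np.polyfit(x, y, 1)
    except np.linalg.LinAlgError:
        return False
    return bool(np.isclose(slope, 3.0) and np.isclose(intercept, 2.0))

print("numpy", np.__version__, "on", platform.system(), platform.release())
print("clean fit ok:", polyfit_works_on_clean_data())
```

If the clean fit fails as well, the numpy installation is the more likely culprit. If it succeeds, the data should be checked for NaN and inf values first.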

score 0 · Answer 4 · answered Sep 18 '21 at 19:57

I developed the code on Windows 8. Now I'm using Windows 10 and the problem popped up! It was resolved as @Joris said:

pip install numpy==1.19.3

Leonardo

While this is a valid answer to the question, at least in your use case, it doesn't add new information that was not already in @Joris's answer. It is best not to post duplicate answers like this. – joanis Sep 18 '21 at 22:12

score 0 · Answer 5 · answered Jan 15 '22 at 00:59

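My example after the fix (x is a pandas Series):

import numpy as np

def calculating_slope(x):
    # Convert +/-inf to NaN, then drop all NaNs before fitting.
    x = x.replace([np.inf, -np.inf], np.nan).dropna()
    if len(x) > 1:
        # Slope of a degree-1 fit against the sample index.
        slope = np.polyfit(range(len(x)), x, 1)[0]
    else:
        # Too few points left to fit a line.
        slope = 0
    return slope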

Alexandr Kosolapov
