I am using numpy.cov to create a covariance matrix from a dataset of over 400 time series. linalg.det returns zero, so the matrix is singular. linalg.svd shows that the rank is two less than the number of columns, so somewhere in the covariance matrix some columns are linear combinations of others, which makes the matrix degenerate. I ran corrcoef on the underlying time series, but no correlation exceeds 0.78, so nothing obvious shows up there. Can someone suggest a method to locate the degenerate columns? Thank you.
1 Answer
If you take the QR decomposition of a matrix A, the columns of R with a non-zero value along the diagonal correspond to linearly independent columns of A.
import numpy as np
linalg = np.linalg


def independent_columns(A, tol=1e-05):
    """
    Return an array composed of independent columns of A.

    Note the answer may not be unique; this function returns one of many
    possible answers.

    http://stackoverflow.com/q/13312498/190597 (user1812712)
    http://math.stackexchange.com/a/199132/1140 (Gerry Myerson)
    http://mail.scipy.org/pipermail/numpy-discussion/2008-November/038705.html
    (Anne Archibald)

    >>> A = np.array([(2,4,1,3),(-1,-2,1,0),(0,0,2,2),(3,6,2,5)])
    >>> independent_columns(A)
    array([[ 2,  1],
           [-1,  1],
           [ 0,  2],
           [ 3,  2]])
    """
    Q, R = linalg.qr(A)
    # Columns whose diagonal entry in R exceeds tol are treated as independent.
    independent = np.where(np.abs(R.diagonal()) > tol)[0]
    return A[:, independent]


def matrixrank(A, tol=1e-8):
    """
    http://mail.scipy.org/pipermail/numpy-discussion/2008-February/031218.html
    """
    # The rank is the number of singular values above the tolerance.
    s = linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))


matrices = [
    np.array([(2,4,1,3),(-1,-2,1,0),(0,0,2,2),(3,6,2,5)]),
    np.array([(1,2,3),(2,4,6),(4,5,6)]).T,
    np.array([(1,2,3,1),(2,4,6,2),(4,5,6,3)]).T,
    np.array([(1,2,3,1),(2,4,6,3),(4,5,6,3)]).T,
    np.array([(1,2,3),(2,4,6),(4,5,6),(7,8,9)]).T
]

for A in matrices:
    B = independent_columns(A)
    assert matrixrank(A) == matrixrank(B) == B.shape[-1]
assert matrixrank(A) == matrixrank(B) checks that the independent_columns function returns a matrix of the same rank as A.

assert matrixrank(B) == B.shape[-1] checks that the rank of B equals the number of columns of B.
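As a complementary check (not part of the code above), the null space of A shows which columns take part in a linear dependency, which is what the question asks for. The sketch below is a minimal illustration under stated assumptions: the function name columns_in_dependencies is hypothetical, and the absolute tolerance of 1e-8 simply mirrors matrixrank. Each column of the returned basis is a vector v with A @ v close to zero, so its non-zero entries mark columns that combine to zero. This may also help in cases where an unpivoted QR puts a zero on the diagonal of R for a column that is still needed, which as far as I can tell can happen when A starts with an all-zero column.

```python
import numpy as np


def columns_in_dependencies(A, tol=1e-8):
    """Return (null_basis, involved) for the columns of A.

    null_basis: array of shape (n_columns, n_columns - rank); each column v
                satisfies A @ v ~ 0.
    involved:   indices of columns of A that appear in at least one dependency.

    Hypothetical helper; tol is an absolute cutoff chosen for illustration.
    """
    A = np.asarray(A, dtype=float)
    # Full SVD so that Vt is square and its trailing rows span the null space.
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    rank = int(np.sum(s > tol))
    null_basis = Vt[rank:].T
    # A column is involved if its row in the null-space basis is non-zero.
    row_norms = np.sqrt((null_basis ** 2).sum(axis=1))
    involved = np.where(row_norms > tol)[0]
    return null_basis, involved


# Column 2 is the sum of columns 0 and 1; column 3 is independent of the rest.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])
null_basis, involved = columns_in_dependencies(A)
print(involved)      # indices of the columns tied together by a dependency
print(null_basis)    # coefficients of the dependency, up to scale and sign
```

With several dependencies the basis vectors can mix them together, so this identifies the set of columns involved rather than a unique pairing.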

unutbu
- Say I have already reduced the matrix to its independent columns, so it has full column rank. If I then run an LR model on it and still get the error, I assume the tolerance should be much higher. But since your code builds on top of linalg, should I increase the tolerance in independent_columns above? What is the default tol in linalg.py? Error log while running Logit (Statsmodels): File "/usr/local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 328, in solve raise LinAlgError, 'Singular matrix' LinAlgError: Singular matrix – ekta May 15 '14 at 11:41
- Just to add, I read through the linalg.py file but couldn't pinpoint the "tol" it relies on. Here is the file (to save you lookup time, in case you consider answering): https://github.com/numpy/numpy/blob/master/numpy/linalg/linalg.py . I have also recorded the whole scenario (I don't think it's an issue, though) on GitHub here: https://github.com/numpy/numpy/issues/4715 – ekta May 15 '14 at 11:57
- I'm not sure I'd be able to help you, but could you post code which demonstrates the problem? Especially since code does not format well in comments, please start a new question for this. – unutbu May 15 '14 at 12:03
- After trying hard to debug the issue, I posted it here. Could you take a stab at it? http://stackoverflow.com/questions/23848003/detecting-mulicollinear-or-columns-that-have-linear-combinations-while-modelli – ekta May 24 '14 at 17:46
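On the tolerance question raised in the comments: as far as I know, np.linalg.solve takes no tolerance argument and only raises LinAlgError when the factorization hits an exactly zero pivot, so there is no "tol" to find on that code path. np.linalg.matrix_rank, by contrast, documents a default cutoff of S.max() * max(M, N) * eps, which scales with the data, unlike the fixed 1e-5 and 1e-8 used in the answer. The sketch below reproduces that documented default by hand; the helper names default_rank_tol and rank_with_default_tol are hypothetical.

```python
import numpy as np


def default_rank_tol(A):
    """Cutoff documented for np.linalg.matrix_rank: S.max() * max(M, N) * eps."""
    A = np.asarray(A, dtype=float)
    s = np.linalg.svd(A, compute_uv=False)
    return s.max() * max(A.shape) * np.finfo(A.dtype).eps


def rank_with_default_tol(A):
    """Count singular values above the relative cutoff (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > default_rank_tol(A)))


# Second column is twice the first, so the rank should come out as 2.
X = np.array([(1, 2, 3), (2, 4, 6), (4, 5, 6)], dtype=float).T
print(default_rank_tol(X))
print(rank_with_default_tol(X), np.linalg.matrix_rank(X))
```

A matrix can pass a rank test and still be badly conditioned, so checking np.linalg.cond on the design matrix before fitting may be more informative than raising tol alone.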