I have a very large sparse matrix which represents a transition martix in a Markov Chain, i.e. the sum of each row of the matrix equals one and I'm interested in finding the first eigenvalue and its corresponding vector which is smaller than one. I know that the eigenvalues are bounded in the section [-1, 1] and they are all real (non-complex).
I am trying to calculate the values using python's scipy.sparse.eigs
function, however, one of the parameters of the functions is the number of eigenvalues/vectors to estimate and every time I've increased the number of parameters to estimate, the numbers of eigenvalues which are exactly one grew as well.
Needless to say, I am using the which
parameter with the value 'LR'
in order to get the k largest eigenvalues, with k being the number of values to estimate.
Does anyone have an idea how to solve this problem (finding the first eigenvalue smaller than one and its corresponding vector)?
Asked
Active
Viewed 1,404 times
20

Zachi Shtain
- 826
- 1
- 13
- 31
-
You may have to study the documentation for the underlying ARPACK code. – hpaulj Aug 24 '15 at 16:08
-
@hpaulj, I've already done that, didn't help much – Zachi Shtain Aug 24 '15 at 16:13
-
Do you understand the problem, and matrix, well enough to know whether the are multiple `eigs` with this value? In other words, is this a real property of the matrix, or an error in the code? – hpaulj Aug 24 '15 at 16:16
-
@hpaulj, I understand the problem and matrix structure quite well. There are supposed to be multiple occurrences of 1 as an eigenvalue, this indicates the number of subset I can divide my graph/markov chain into. Part of my problem is that I don't know for every data set that number in advance. – Zachi Shtain Aug 24 '15 at 16:21
-
4You can probably first split your chain into strongly connected components (cf https://stackoverflow.com/questions/21308848/markov-chain-stationary-distributions-with-scipy-sparse/28559746#28559746) and then compute the second largest eigenvalue for each component. IIUC, the answer you are then looking for is then the largest of these. (For general matrix, such a problem would be fairly difficult numerically, but the Markov chain structure of the matrix makes it much easier.) – pv. Aug 25 '15 at 11:58
-
@pv., just to make sure, I've understood your advice: You are suggesting I'll split my chain into multiple smaller ones and attempt to find the eigen vector for each individual sub-chain? – Zachi Shtain Aug 27 '15 at 13:27
-
Any chance you could mathematically use SVD instead for what you are doing? – Paul Aug 29 '15 at 11:03
-
@Paul, no, the eigenvalues from SVD are the squared ones of the original matrix. Since the eigenvalues in my case are between -1 and 1, I can't use SVD,.. – Zachi Shtain Aug 29 '15 at 11:06
-
Just curious, how large is "very large" (the dimensions of your matrix)? – Jim Raynor Jan 03 '16 at 15:38
-
@JimRaynor about 7 million by 7 million or even larger... The problem I describe happens even with smaller matrices. I debugged it on a 8000 by 8000 one – Zachi Shtain Jan 03 '16 at 15:42
-
1Perhaps you should report a bug to scipy. – Dima Tisnek Jan 18 '16 at 10:36
-
@qarma, I think it is not a bug of `scipy`. As far as I can tell `scipy` is using a 3rd party library (`ARPACK`) for the actual calculations – Zachi Shtain Jan 18 '16 at 17:31
-
Let's just say they are more fit to help you :) – Dima Tisnek Jan 19 '16 at 08:35
1 Answers
1
I agree with @pv. If your matrix S
was symmetric, you could see it as a laplacian matrix of the matrix I - S
. The number of connected components of I - S
is the number of zero-eigenvalues of this matrix (i.e, the dimension of the space associated to eigenvalue 1 of S
). You could check the number of connected components of the graph whose similarity matrix is I - S*S'
for a start, e.g. with scipy.sparse.csgraph.connected_components
.

Ant Plante
- 93
- 8