20

I have a very large sparse matrix which represents a transition martix in a Markov Chain, i.e. the sum of each row of the matrix equals one and I'm interested in finding the first eigenvalue and its corresponding vector which is smaller than one. I know that the eigenvalues are bounded in the section [-1, 1] and they are all real (non-complex).
I am trying to calculate the values using python's scipy.sparse.eigs function, however, one of the parameters of the functions is the number of eigenvalues/vectors to estimate and every time I've increased the number of parameters to estimate, the numbers of eigenvalues which are exactly one grew as well.
Needless to say, I am using the which parameter with the value 'LR' in order to get the k largest eigenvalues, with k being the number of values to estimate.
Does anyone have an idea how to solve this problem (finding the first eigenvalue smaller than one and its corresponding vector)?

Zachi Shtain
  • 826
  • 1
  • 13
  • 31
  • You may have to study the documentation for the underlying ARPACK code. – hpaulj Aug 24 '15 at 16:08
  • @hpaulj, I've already done that, didn't help much – Zachi Shtain Aug 24 '15 at 16:13
  • Do you understand the problem, and matrix, well enough to know whether the are multiple `eigs` with this value? In other words, is this a real property of the matrix, or an error in the code? – hpaulj Aug 24 '15 at 16:16
  • @hpaulj, I understand the problem and matrix structure quite well. There are supposed to be multiple occurrences of 1 as an eigenvalue, this indicates the number of subset I can divide my graph/markov chain into. Part of my problem is that I don't know for every data set that number in advance. – Zachi Shtain Aug 24 '15 at 16:21
  • 4
    You can probably first split your chain into strongly connected components (cf https://stackoverflow.com/questions/21308848/markov-chain-stationary-distributions-with-scipy-sparse/28559746#28559746) and then compute the second largest eigenvalue for each component. IIUC, the answer you are then looking for is then the largest of these. (For general matrix, such a problem would be fairly difficult numerically, but the Markov chain structure of the matrix makes it much easier.) – pv. Aug 25 '15 at 11:58
  • @pv., just to make sure, I've understood your advice: You are suggesting I'll split my chain into multiple smaller ones and attempt to find the eigen vector for each individual sub-chain? – Zachi Shtain Aug 27 '15 at 13:27
  • Any chance you could mathematically use SVD instead for what you are doing? – Paul Aug 29 '15 at 11:03
  • @Paul, no, the eigenvalues from SVD are the squared ones of the original matrix. Since the eigenvalues in my case are between -1 and 1, I can't use SVD,.. – Zachi Shtain Aug 29 '15 at 11:06
  • Just curious, how large is "very large" (the dimensions of your matrix)? – Jim Raynor Jan 03 '16 at 15:38
  • @JimRaynor about 7 million by 7 million or even larger... The problem I describe happens even with smaller matrices. I debugged it on a 8000 by 8000 one – Zachi Shtain Jan 03 '16 at 15:42
  • 1
    Perhaps you should report a bug to scipy. – Dima Tisnek Jan 18 '16 at 10:36
  • @qarma, I think it is not a bug of `scipy`. As far as I can tell `scipy` is using a 3rd party library (`ARPACK`) for the actual calculations – Zachi Shtain Jan 18 '16 at 17:31
  • Let's just say they are more fit to help you :) – Dima Tisnek Jan 19 '16 at 08:35

1 Answers1

1

I agree with @pv. If your matrix S was symmetric, you could see it as a laplacian matrix of the matrix I - S. The number of connected components of I - S is the number of zero-eigenvalues of this matrix (i.e, the dimension of the space associated to eigenvalue 1 of S). You could check the number of connected components of the graph whose similarity matrix is I - S*S' for a start, e.g. with scipy.sparse.csgraph.connected_components.

Ant Plante
  • 93
  • 8