4

A bit of background first. I am finding the eigenvalues and eigenvectors of a real symmetric matrix whose rows sum to 0. More specifically, once I find an eigenvector, I use `argsort` to find the permutation that sorts its entries, and apply that permutation to the original matrix.

Now, I implemented the code in Python, using the numpy package. The code itself is recursive: if it finds a set of equal values in the eigenvector, it extracts the symmetric submatrix corresponding to the indices of those equal values and applies the algorithm to that submatrix all over again.
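
Schematically, the reordering step looks something like this (a simplified sketch; `A` is the matrix from the call below and `v` stands for the chosen eigenvector):

    import numpy as np

    # argsort the eigenvector entries, then apply the resulting
    # permutation to both the rows and the columns of A
    perm = np.argsort(v)
    A_reordered = A[np.ix_(perm, perm)]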

While this is all very well, and mostly grunt work, I was caught by surprise when a bunch of indices that should have corresponded to equal entries in the eigenvector were not recognized as having equal values. The problem is that the values are only computed to within machine precision by whatever algorithm numpy uses (possibly Lanczos, but I am not entirely familiar with numpy's internals). Here is a sample output, in which I explicitly check the difference between two entries of the eigenvector:

    >>> T=spectral.seriation(A,index)

    columns [ 0  1  2  3  4  5  6  7  8  9 10 11]

    [  3.30289130e-01  -2.75240941e-01  -2.75240941e-01   3.30289130e-01
    -2.75240941e-01   3.30289130e-01  -2.75240941e-01   3.30289130e-01
    3.30289130e-01  -2.75240941e-01  -1.69794463e-16  -2.75240941e-01]

    [ 4  6  9  1  2 11 10  0  5  7  8  3]

    difference   -5.55111512313e-17

The routine seriation() is the recursive function. The array of floats is the eigenvector under consideration, and the array below it gives the sorted order of the columns. Note that columns [4,6,9,1,2,11] should all have the same value. However, eigenvalue and eigenvector calculations are always approximations, and indeed, when I print the difference between the entries in columns 9 and 2 (the "difference" line above), it is non-zero. Where the algorithm should group [4,6,9,1,2,11] together, it only groups [4,6,9] and puts the rest into another group, throwing a wrench into the works.

So the question is this: is there a way to perform arbitrary-precision calculations in numpy? Failing that, what would be a "good" workaround for this problem?

Also, I should probably mention that it can be mathematically proven that these entries must be equal. That is a property of the matrix, but hopefully not germane to the question.

user1137683
  • 43
  • 1
  • 3
  • As @amit has already said, never test floating point numbers for exact equality; check that they're within some error tolerance. If you want a `numpy` function for it, use `numpy.allclose(a, b)` instead of `a == b`. To more directly answer your question, though, numpy doesn't support arbitrary precision. (You can fake it with object arrays of `decimal`s, but 1) that defeats the entire purpose of numpy arrays, and 2) `numpy.linalg` uses LAPACK, which doesn't support arbitrary precision, so you wouldn't gain anything in this case.) – Joe Kington Jan 09 '12 at 17:37

4 Answers

4

Check out the numpy.allclose and numpy.isclose functions for testing equality within a tolerance.
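
For example, a quick sketch with values of the same magnitude as in the question (the numbers here are purely illustrative):

    import numpy

    a = -2.75240941e-01
    b = a - 1e-16                    # differs only by a rounding-sized error

    a == b                           # False: exact comparison sees the gap
    numpy.isclose(a, b)              # True: equal within the default tolerances
    numpy.allclose([a, a], [b, b])   # True: a single boolean for whole arrays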

sim
  • 992
  • 1
  • 7
  • 12
4

Doubles do not represent the real numbers exactly [they cannot even represent all the rationals]. There are infinitely many rationals in every range [well, every range containing at least two numbers, to be exact], but only a finite number of bits to represent them.
Thus, you should expect some rounding errors in "exact" calculations.

For more information, you might want to read What Every Computer Scientist Should Know About Floating-Point Arithmetic.
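
A classic illustration of the kind of rounding error involved (the exact digits printed may vary with your Python version):

    >>> 0.1 + 0.2 == 0.3
    False
    >>> 0.1 + 0.2 - 0.3
    5.551115123125783e-17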

amit
  • 175,853
  • 27
  • 231
  • 333
  • Indeed, if it were just about finding the eigenvalues and eigenvectors, I could accept the loss of precision. These are, after all, not necessarily rational, or even terminating decimals. What does matter to me, however, is that the program should ignore small discrepancies caused by these rounding errors. Otherwise, as we can see in the case above, the partition of the columns goes awry. – user1137683 Jan 09 '12 at 17:37
3

When subtracting two floating point numbers of comparable size, precision is not the problem: if the entries in columns [2] and [9] really were identical doubles, their difference would be exactly zero.

I suspect what is actually happening is that by default the output displays the numbers to 8 decimal places, but beyond that the numbers differ. A double typically carries around 16 decimal digits of precision; to check, run numpy.finfo(numpy.float).eps to get the machine epsilon, which is the gap between 1 and the smallest representable number above 1.

Try inspecting the numbers at full precision, using an output format such as "%.16f\n%.16f" % tuple(myarray[[2, 9]]).

If they do differ, but agreement to 7 decimal places is good enough for you, then you can round the results using something like numpy.around(differences, 7).
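
For example, rounding the eigenvector before grouping (a sketch; `v` stands for your eigenvector array):

    v_rounded = numpy.around(v, 7)
    # entries that agree to 7 decimal places are now exactly equal,
    # so they can be grouped with an exact comparison or numpy.unique
    values, group_index = numpy.unique(v_rounded, return_inverse=True)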

Alternatively, if you want to pre-process the data then you can use something like the following (although there may be better ways to do this).

    # split each value into a significand (in [0.5, 1)) and a power-of-two exponent
    sigcnd, expn = numpy.frexp(myarray)
    # round the significand to 7 decimal places
    sigcnd = numpy.around(sigcnd, 7)
    # recombine: values that agreed to ~7 significant digits are now identical
    truncated_myarray = numpy.ldexp(sigcnd, expn)
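
Note that this rounds relative to the magnitude of each entry (roughly 7 significant digits) rather than to a fixed number of decimal places, which can matter if the values in your array span very different scales.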
Brendan
  • 18,771
  • 17
  • 83
  • 114
1

If you want the indices of the almost-equal elements to within a given tolerance, you could do something like this:

    def almost_matches(x, array, rtol=1e-05, atol=1e-08):
        # collect the indices of every element approximately equal to x
        answer = []
        for y, value in enumerate(array):
            if abs(x - value) <= (atol + rtol * abs(value)):
                answer.append(y)
        return answer

(this is the same approximate comparison that numpy.allclose() uses)

    >>> a = [3.30289130e-01,  -2.75240941e-01,  -2.75240941e-01,   3.30289130e-01, -2.75240941e-01, 3.30289130e-01,  -2.75240941e-01,   3.30289130e-01, 3.30289130e-01,  -2.75240941e-01,  -1.69794463e-16,  -2.75240941e-01]
    >>> almost_matches(min(a), a)
    [1, 2, 4, 6, 9, 11]
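
The same test can also be vectorized so that numpy does the looping (a sketch using the identical tolerance formula; the function name is just illustrative):

    import numpy

    def almost_matches_vec(x, array, rtol=1e-05, atol=1e-08):
        array = numpy.asarray(array)
        # indices where |x - array| <= atol + rtol * |array|
        return numpy.flatnonzero(numpy.abs(x - array) <= atol + rtol * numpy.abs(array))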
dabhaid
  • 3,849
  • 22
  • 30