I have two arrays, both loaded from text files. On inspection they look exactly the same. However, when I test the two arrays for equality, every check fails: element-wise, shape-wise, etc. I used the numpy test answered here.

Here are the two matrices.

import numpy as np

class TextMatrixAssertions(object):
    def assertArrayEqual(self, dataX, dataY):
        x = np.loadtxt(dataX)
        y = np.loadtxt(dataY)

        if not np.array_equal(x, y):
            raise Exception("array_equal fail.")

        if not np.array_equiv(x, y):
            raise Exception("array_equiv fail.")

        if not np.allclose(x, y):
            raise Exception("allclose fail.")

dataX = "MyMatrix.txt"
dataY = "MyMatrix2.txt"
test = TextMatrixAssertions()
test.assertArrayEqual(dataX, dataY)

I want to know whether there really is a difference between the two arrays and, if not, what is causing the failures.

Nikko
  • Presumably printing your values makes them appear the same? I would try doing a `print(repr(x))` and `print(repr(y))` and see if that makes it more clear how the values differ. https://docs.python.org/3/library/functions.html#repr tries to print "a string that would yield an object with the same value when passed to eval()" – Patrick Fay Sep 10 '19 at 06:45
  • You do realize that your `raise` statements abort the execution of your method, right? So in case `array_equal()` returns `False`, `allclose()` is never reached. – Nils Werner Sep 10 '19 at 07:11
  • Yes, I comment out the other checks to test each one separately. – Nikko Sep 10 '19 at 07:11

3 Answers

They are not equal: 54 elements differ.

np.sum(x!=y)

54

To find which elements differ, you can do this:

np.where(x!=y)


(array([  1,   5,   7,  11,  19,  24,  32,  48,  82,  92,  97, 111, 114,
        119, 128, 137, 138, 146, 153, 154, 162, 165, 170, 186, 188, 204,
        215, 246, 256, 276, 294, 300, 305, 316, 318, 333, 360, 361, 390,
        419, 420, 421, 423, 428, 429, 429, 439, 448, 460, 465, 467, 471,
        474, 487]),
 array([18, 18, 18, 17, 17, 16, 15, 12,  8,  6,  5,  4,  3,  3,  2,  1,  1,
        26,  0, 25, 24, 24, 24, 23, 22, 20, 20, 17, 16, 14, 11, 11, 11, 10,
        10,  9,  7,  7,  5,  1,  1,  1, 26,  1,  0, 25, 23, 21, 19, 18, 18,
        17, 17, 14]))
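
As a minimal sketch of how to look at the differing values themselves rather than only their indices, a boolean mask can be used (this is the approach suggested in the comments further down). The two small in-memory inputs are made up for illustration and are not the OP's files.

```python
import numpy as np
from io import StringIO

# Hypothetical stand-ins for the two text files.
x = np.loadtxt(StringIO("0 1 2\n3 4 5"))
y = np.loadtxt(StringIO("0 1 2\n3 9 5"))

mask = x != y                          # boolean array, True where elements differ
n_diff = int(np.count_nonzero(mask))   # how many elements differ
positions = np.argwhere(mask)          # one (row, col) pair per differing element

print(n_diff)
print(positions)
print(x[mask], y[mask])                # the differing values from each array
```

Printing `x[mask]` next to `y[mask]` shows at a glance whether the mismatches are genuinely different numbers or something like NaN on both sides.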
Billy Bonaros

You should first try your code with a smaller, simpler matrix to test your function.

For example:

import numpy as np
from io import StringIO



class TextMatrixAssertions(object):
    def assertArrayEqual(self, dataX, dataY):
        x = np.loadtxt(dataX)
        y = np.loadtxt(dataY)

        if not np.array_equal(x, y):
            raise Exception("array_equal fail.")

        if not np.array_equiv(x, y):
            raise Exception("array_equiv fail.")

        if not np.allclose(x, y):
            raise Exception("allclose fail.")

        return True

a = StringIO(u"0 1\n2 3")
b = StringIO(u"0 1\n2 3")
test = TextMatrixAssertions()
test.assertArrayEqual(a,b)

Output

True

So I guess your problem is with your files, not your code. You can also try loading the same file into both x and y and see the output.

To see which elements differ, you can use np.not_equal:

Example

a = StringIO(u"0 1\n2 3")
c = StringIO(u"0 1\n2 4")
x = np.loadtxt(a)
y = np.loadtxt(c)
np.not_equal(x,y)

Output

array([[False, False],
       [False,  True]])
NicoT
  • Yes. I am aware that the functions are not the problem, since I use them on other arrays I test. But for these specific matrices, I just can't see the differences. – Nikko Sep 10 '19 at 06:52
  • Then use the not_equal function to see what elements are different – NicoT Sep 10 '19 at 07:01

One more solution: you can print the values of the elements that are not equal. If you run the code below, you will see that the mismatching elements are nan in both arrays. Since nan never compares equal to anything, including itself, those elements count as unequal and cause the exception to be raised.

import numpy as np

class TextMatrixAssertions(object):
    def assertArrayEqual(self, dataX, dataY):
        x = np.loadtxt(dataX)
        y = np.loadtxt(dataY)

        if not np.array_equal(x, y):
            not_equal_idx = np.where(x != y)
            for idx1, idx2 in zip(not_equal_idx[0],not_equal_idx[1]):
                print(x[idx1][idx2])
                print(y[idx1][idx2])
            raise Exception("array_equal fail.")

        if not np.array_equiv(x, y):
            raise Exception("array_equiv fail.")

        if not np.allclose(x, y):
            raise Exception("allclose fail.")

dataX = "MyMatrix.txt"
dataY = "MyMatrix2.txt"
test = TextMatrixAssertions()
test.assertArrayEqual(dataX, dataY)

output:

nan
nan
nan
...
nan
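
If, as the output above suggests, the mismatching positions hold NaN in both files, the comparison can be told to treat matching NaNs as equal. The following is only a sketch under that assumption: the in-memory inputs are invented for illustration, and `equal_nan` on `np.array_equal` requires NumPy 1.19 or later.

```python
import numpy as np
from io import StringIO

# Hypothetical stand-ins for the two files; np.loadtxt parses "nan" as NaN.
x = np.loadtxt(StringIO("1.0 nan\n3.0 4.0"))
y = np.loadtxt(StringIO("1.0 nan\n3.0 4.0"))

print(np.nan == np.nan)                      # NaN never equals itself
print(np.array_equal(x, y))                  # fails only because of the NaN pair
print(np.allclose(x, y))                     # fails for the same reason

# Treat NaNs in matching positions as equal.
print(np.allclose(x, y, equal_nan=True))
print(np.array_equal(x, y, equal_nan=True))  # assumes NumPy >= 1.19

# Differences that remain once NaN-vs-NaN pairs are ignored.
real_diff = (x != y) & ~(np.isnan(x) & np.isnan(y))
print(np.count_nonzero(real_diff))

# np.testing.assert_array_equal already treats NaNs in the same
# positions as equal, so it raises only on real differences.
np.testing.assert_array_equal(x, y)
```

If `real_diff` still contains `True` entries on the actual files, those positions are genuine differences rather than NaN artifacts.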
eugen
  • This is needlessly complicated. – Nils Werner Sep 10 '19 at 07:10
  • @Nils Werner, can you comment why is this complicated? The idea is to explain to the OP of the reasons why his code is not behaving as he expected. How does my code complicate the understanding of this? – eugen Sep 10 '19 at 07:12
  • Using indices to access the differing elements is not the most efficient approach, the `for` loop is unnecessary, and using `x[i][j]` is bad style and can have unintended consequences. – Nils Werner Sep 10 '19 at 07:14
  • I do not understand you. maybe you mean some corner cases when the input files are empty or have different lengths. however for the input provided by the OP and for the purposes of explaining him why the code is raising an exception, my code does the job. – eugen Sep 10 '19 at 07:16
  • Yes, it's just needlessly complicated :-) `idx = x != y; print(x[idx], y[idx])` does the same, but simpler and faster. – Nils Werner Sep 10 '19 at 07:17
  • so then how does `idx = x != y; print(x[idx], y[idx])` solve what you say ' unintended consequences' ? could you please elaborate? – eugen Sep 10 '19 at 07:22
  • See [this SO answer](https://stackoverflow.com/questions/38113994/why-does-indexing-numpy-arrays-with-brackets-and-commas-differ-in-behavior) – Nils Werner Sep 10 '19 at 07:25
  • the SO answer you provided talks about `x[:][1]` vs `x[:,1]` . How does my answer use this answer incorrectly? Also, I do not feel the downvote justified for my answer being complicated. My answer should be downvoted if it is incorrect or does not answer the OP's question. – eugen Sep 10 '19 at 07:45
  • It uses `x[i][j]` when you should always use `x[i, j]` – Nils Werner Sep 10 '19 at 07:48
  • I do not see it anywhere stated that you should always use `x[i, j]` . Could you please pinpoint where it is said so? It may be true when you are slicing with a colon, but how is this true when you have exact index number? – eugen Sep 10 '19 at 07:52
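
On the `x[i][j]` versus `x[i, j]` point debated in the comments above, here is a small sketch of where the two spellings agree and where they diverge. The array is invented for illustration; nothing here comes from the OP's data.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# With two plain integers, both spellings return the same element.
same = x[1][2] == x[1, 2]

# With a slice they differ: x[:] is the whole array, so x[:][1] is row 1,
# while x[:, 1] is column 1.
row = x[:][1]
col = x[:, 1]

# Chained indexing after fancy indexing assigns into a temporary copy.
a = x.copy()
a[[0, 1]][0] = 99     # a is left unchanged
b = x.copy()
b[[0, 1], 0] = 99     # first column of b becomes 99

print(same, row, col)
print(a)
print(b)
```

So for exact integer indices, as in the answer's loop, the result is the same either way; the tuple form simply avoids building an intermediate row and stays correct once slices or index arrays are involved.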