Optimized way to implement XOR between floats for huge 2D array data

Question

I need to XOR floats in Python over large 2D arrays (for example, a 1000 by 1000 matrix). I currently use the following implementation:

import struct
def fxor(a, b):
  rtrn = []
  a = struct.pack('d', a)
  b = struct.pack('d', b)
  for ba, bb in zip(a, b):
    rtrn.append(ba ^ bb)
  return struct.unpack('d', bytes(rtrn))[0]
print(fxor(5.34, 5.34))               #0.0
print(fxor(10.23, 5.34))              #9.54764402360672e-308
print(fxor(10.23,fxor(10.23, 5.34)))  #5.34

The way I use fxor:


import numpy as np

# for demo purposes I use a 3 by 2 matrix
mat1 = np.random.random_sample((3, 2))
mat2 = np.random.random_sample((3, 2))
resultant = []
for i in range(3):
    row = []
    for j in range(2):
        row.append(fxor(mat1[i][j], mat2[i][j]))
    resultant.append(row)
resultant

This works correctly for my case. But when I profile it, the implementation turns out to be very slow for large arrays (about 60% of the total run time).

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
250000    1.438    0.000    1.926    0.000 2837056651.py:3(fxor)
.
.
.
500000    0.124    0.000    0.124    0.000 {built-in method _struct.pack}
250000    0.067    0.000    0.067    0.000 {built-in method _struct.unpack}

Is there an optimized way to do this, like what np.bitwise_xor does for integer values?

Update

@jasonharper suggested using .view(np.int64), which works nicely:

mat1 = np.random.random_sample((3, 2))
mat2 = np.random.random_sample((3, 2))
print(mat1)
mat3 = np.bitwise_xor(mat1.view(np.int64),mat2.view(np.int64))
print(np.bitwise_xor(mat2.view(np.int64),mat3).view(np.float64))
# output
#[[0.71297944 0.33048679]
# [0.82762999 0.26549565]
# [0.94499741 0.2570297 ]]
#[[0.71297944 0.33048679]
# [0.82762999 0.26549565]
# [0.94499741 0.2570297 ]]
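
For reference, the view-based approach can be wrapped in a small function and checked against the original struct-based version. This is a minimal sketch, not a definitive implementation: the names fxor_scalar and fxor_array are hypothetical, and it assumes both inputs can be converted by value to float64. It uses np.uint64 for the view, as suggested in the comments.

```python
import struct
import numpy as np

def fxor_scalar(a, b):
    """Reference version: XOR the 8 raw bytes of two floats."""
    raw = bytes(x ^ y for x, y in zip(struct.pack('d', a), struct.pack('d', b)))
    return struct.unpack('d', raw)[0]

def fxor_array(a, b):
    """XOR two float64 arrays bit by bit (hypothetical helper name)."""
    # Force float64 and a C-contiguous layout so that .view() reinterprets
    # exactly 8 bytes per element and leaves the shape unchanged.
    a = np.ascontiguousarray(a, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    return np.bitwise_xor(a.view(np.uint64), b.view(np.uint64)).view(np.float64)

rng = np.random.default_rng(0)
mat1 = rng.random((3, 2))
mat2 = rng.random((3, 2))

mat3 = fxor_array(mat1, mat2)       # "masked" matrix
restored = fxor_array(mat2, mat3)   # XOR again with mat2 to get mat1 back
print(np.array_equal(restored, mat1))
```

The conversion with dtype=np.float64 copies the data only when the input is not already a contiguous float64 array, so the common case still avoids extra copies.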

But the issue is that it sometimes gives the following error:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

How can I handle this error?

Update 2

Everything works fine until the array size exceeds 10000. Beyond that, I get one of two errors on different runs. This one:

ValueError: operands could not be broadcast together with shapes (10000,1250) (10000,10000)

and this one:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

I can assure you that the dimensions of those matrices are the same, because they pass through

assert first_mat.shape == second_mat.shape

I was unable to work out the reason, because sometimes the program runs without any issue and sometimes it raises these errors for that huge 2D array. If you want to know how I generate those arrays, I have another question where I show how I generate those matrices.

The problem mostly comes from the NumPy view:

---> 46     return np.bitwise_xor(Matrix.view(np.int64),transformationMatrix.view(np.int64)).view(np.float64)
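
The shapes in the first error message are consistent with one operand having 1-byte elements: 10000 columns of 1 byte each, reinterpreted as 8-byte integers, give 10000 / 8 = 1250 columns. The sketch below reproduces both errors on small arrays under that assumption. The array names and the as_float64 helper are hypothetical, and the actual dtype of the failing matrix is not shown in the question.

```python
import numpy as np

floats = np.zeros((4, 16), dtype=np.float64)
flags = np.zeros((4, 16), dtype=np.int8)   # same shape, 1 byte per element

# The shapes match, so an assert on .shape passes ...
assert floats.shape == flags.shape
# ... but the int64 views differ: 16 bytes per row become 2 int64 values.
print(floats.view(np.int64).shape)   # (4, 16)
print(flags.view(np.int64).shape)    # (4, 2)

# When the row length in bytes is not a multiple of 8, .view() itself fails.
odd = np.zeros((4, 10), dtype=np.int8)
try:
    odd.view(np.int64)
    view_failed = False
except ValueError as exc:
    view_failed = True
    print(exc)

def as_float64(mat):
    """Convert by value (not by reinterpreting bytes) before taking a view."""
    return np.ascontiguousarray(mat, dtype=np.float64)

safe = as_float64(flags).view(np.int64)
print(safe.shape)                    # (4, 16)
```

Checking mat.dtype alongside mat.shape before calling .view() catches this kind of mismatch early.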

Update 3

@JérômeRichard suggested checking the shape of .view() for both matrices. I was surprised to find that my mat1 was an int-valued matrix, which caused the issue. I changed that code to always return a float-valued matrix, and things worked fine until I got nan values in some cases:

a = np.array([[4.27666612,4.61512052],[0.19573934,0.82816473]])
b = np.array([[0.97597378,0.09191992],[0.32720493,0.86295611]])
np.bitwise_xor(a.view(np.uint8),b.view(np.uint8)).view(np.float64)
# gives
#array([[            nan, 7.72164724e+306],
#       [4.17041859e-308, 1.54832353e-309]])

This is not feasible for my problem. I was surprised that nan was returned as the result of an XOR-like operation. How can I handle this?
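
A float64 has 1 sign bit, 11 exponent bits and 52 fraction bits, and IEEE 754 treats any value whose exponent bits are all ones and whose fraction is non-zero as NaN. XOR-ing two ordinary floats can easily produce that pattern. The sketch below decodes the exponent field for the first pair of values above and then shows that XOR-ing again still restores the original data. The helper name exponent_field is hypothetical.

```python
import numpy as np

a = np.array([[4.27666612, 4.61512052], [0.19573934, 0.82816473]])
b = np.array([[0.97597378, 0.09191992], [0.32720493, 0.86295611]])

def exponent_field(x):
    """Return the 11-bit biased exponent of each float64 in x."""
    bits = np.ascontiguousarray(x, dtype=np.float64).view(np.uint64)
    return (bits >> np.uint64(52)) & np.uint64(0x7FF)

xored = np.bitwise_xor(a.view(np.uint64), b.view(np.uint64)).view(np.float64)

print(exponent_field(a)[0, 0])      # 1025, since 4 <= 4.27... < 8
print(exponent_field(b)[0, 0])      # 1022, since 0.5 <= 0.97... < 1
print(exponent_field(xored)[0, 0])  # 2047 = 1025 ^ 1022, all eleven bits set
print(np.isnan(xored[0, 0]))        # True

# The NaN still carries every bit of the XOR, so XOR-ing again restores a.
restored = np.bitwise_xor(xored.view(np.uint64), b.view(np.uint64)).view(np.float64)
print(np.array_equal(restored, a))  # True
```

The NaN is only a display artifact of reading the XOR-ed bits as a float. As long as no arithmetic is done on the intermediate result, the round trip is exact.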

Update 4

I still find np.bitwise_xor problematic with ndarray.view(np.uint8), because it gives overflow values every time.

# overflow values are
np.finfo(np.double).min, np.finfo(np.double).max
# -1.79769313486e+308, 1.79769313486e+308

It even becomes hard to work with the resulting data. Is there no efficient solution at all?
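
Because XOR scrambles the exponent bits as well as the fraction bits, the float interpretation of the result can land anywhere in the float64 range, including NaN and inf. One possible way around this, sketched here as an assumption rather than a confirmed fix, is to keep the XOR result as uint64 bit patterns and only view it as float64 after XOR-ing back. The names xor_bits and unxor_bits are hypothetical.

```python
import numpy as np

def xor_bits(a, b):
    """XOR two float64 arrays and return the raw 64-bit patterns as uint64."""
    a = np.ascontiguousarray(a, dtype=np.float64)
    b = np.ascontiguousarray(b, dtype=np.float64)
    return np.bitwise_xor(a.view(np.uint64), b.view(np.uint64))

def unxor_bits(bits, b):
    """Undo xor_bits: XOR the stored patterns with b and read them as float64."""
    b = np.ascontiguousarray(b, dtype=np.float64)
    return np.bitwise_xor(bits, b.view(np.uint64)).view(np.float64)

rng = np.random.default_rng(1)
data = rng.normal(scale=100.0, size=(1000, 1000))
key = rng.normal(scale=100.0, size=(1000, 1000))

masked = xor_bits(data, key)         # plain integers: no NaN or inf involved
restored = unxor_bits(masked, key)
print(masked.dtype, np.array_equal(restored, data))
```

Storing the intermediate as integers costs the same 8 bytes per element and avoids any code path that might treat the masked values as real numbers.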

Since your numbers are in a numpy array, you can use `.view(np.int64)` to get a view of the array that treats the data as integers instead of floats - and you can apply bitwise operations to that view. No copies of the data need to be made. — jasonharper, Apr 22 '22 at 18:55
Thanks @jasonharper. Your solution gives me better performance, but with a strange issue. Could you look at the updated question? — falamiw, Apr 23 '22 at 08:10
By saying "sometimes it gives the following error", do you mean for some random samples or for some real matrices? If it's the latter, are those matrices also in float64? — Meow Cat 2012, Apr 23 '22 at 11:52
For example, a 1000 by 1000 matrix of Python's built-in floats gives me that error. But if I run the code several times, sometimes it works and sometimes it raises that error. That's why I was confused about why the error occurs. @MeowCat2012 — falamiw, Apr 23 '22 at 14:30
@falamiw `print(fxor(10.23,fxor(5.34, 5.34)))` prints on my machine `10.23` and `print(fxor(10.23, 5.34))` prints `9.54764402360672e-308`. Is that correct? — Andrej Kesely, Apr 23 '22 at 15:25
Yes, yours is correct. I copied it from the wrong data. `10.23` is the correct answer. @AndrejKesely I updated that to verify that it works like `xor`. — falamiw, Apr 23 '22 at 15:41
`np.view` is indeed the way to go. Consider using `np.uint64` to be safer. The code seems correct. It works on my machine with Numpy 1.20.3. Can you try a recent version of Numpy (it might be an old bug)? Can you check that you have a 64-bit interpreter/Numpy (it should be the case, otherwise I would expect the `np.int64` type not to exist)? — Jérôme Richard, Apr 23 '22 at 23:51
My numpy version is `1.21.2`, and I tried your suggestion of `np.uint64`, but I get the same error: `ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.` @JérômeRichard — falamiw, Apr 24 '22 at 13:19
I'm not completely sure about this, but I'm guessing that numpy calculates the shape by dividing the memory usage of the array by the size of the datatype. Maybe during `.view` the memory and the shape cause an issue for large arrays @JérômeRichard — falamiw, Apr 24 '22 at 13:29
@falamiw Numpy views are just a reference to a byte array with some meta-information (e.g. shape, strides). Users only manipulate views, never the array directly (thus when people talk about Numpy arrays they mean views). `view` checks whether the Numpy array has a size divisible by the item type provided. In your case you can check that `mat.view(np.uint8).size` is divisible by 8 (i.e. the size of float64 and int64 in bytes). It could be interesting to check the shape too. — Jérôme Richard, Apr 24 '22 at 13:56
Thanks @JérômeRichard, I found where the error comes from. But I ran into a new error (see my update 3) :) Do you have any idea how to handle that? And yes, if you collect all your comments and turn them into an answer, I will accept it. Thanks again. — falamiw, Apr 24 '22 at 17:54
`Which is not feasible for my problem. I was surprise why nan was return as a result of xor kind operation.` 1) Why is nan a problem? 2) You're getting nan here because IEEE defines any float with all ones in the exponent field and a non-zero significand field as NaN. So getting a NaN value is actually quite probable. https://stackoverflow.com/questions/19800415/why-does-ieee-754-reserve-so-many-nan-values — Nick ODell, Apr 24 '22 at 18:02
Wow, I wasn't aware of that @NickODell. I think I was overthinking. After reading your comment, I tried everything (including `nan` values) and things are working fine. Actually, I test everything before plugging it into the main codebase, and I misunderstood the `nan` value. Thanks for sharing the reason. — falamiw, Apr 24 '22 at 18:07
