
I'm using a (numpy) array of integers to log potential problems with an array of data. The concept is that each error type has its own integer value, and that these are set so that

err1 = 1
err2 = 2 ** 1
err3 = 2 ** 2
...
errx = 2 ** x

This way, I figure, I can add these error types to the integer logging array and still know which combination of errors made up each value; so if the end array has a value of 7, I know it must be made up of 1 + 2 + 4, i.e. err1, err2, and err3.

This all seemed very clever at the time, but I now need to produce a boolean array telling me which cells have logged a given error; so, for example, if I have an error array of

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

I'd like to get the result

test_contains_err3 = np.array(
    [[False, True, False],
     [False, True, True]]
)

Because the value 4 has gone into making up the values 5, 4, and 12 (= 8 + 4), but not any of the others. I've developed an iterative solution for single values, but that doesn't work well as a vectorized calculation (the actual array is quite large). Can anyone please suggest something? I have a feeling that there's something simpler here that I'm not seeing.

Thanks in advance!

Chris J Harris

2 Answers


You should look into bitwise operations. That would allow you to encode multiple different numbers in a single joined value, for example the output of the following snippet

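a = (3 << 24) + (8 << 16) + 5  # pack 3, 8 and 5 into one integer
print(a)

# shift each field down, then keep its low 4 bits (enough for values up to 15)
print(a >> 24 & 0xf)
print(a >> 16 & 0xf)
print(a & 0xf)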

would look like this:

50855941
3
8
5

Now if you play around with it, you can encode as many variables as you want as long as you make sure to give each variable enough bits to cover the maximum possible value for that variable - an overflow of a single variable would corrupt your data.
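As a rough sketch of the point about field widths, the helpers below pack three values into fixed 8-bit fields and show what an overflow does. The names `pack`, `unpack`, `FIELD_BITS` and the 8-bit layout are assumptions made for illustration, not part of the snippet above.

```python
# Hypothetical layout: three fields, each 8 bits wide (names and widths are illustrative).
FIELD_BITS = 8
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xff

def pack(high, mid, low):
    """Pack three small non-negative integers into one integer."""
    for value in (high, mid, low):
        if not 0 <= value <= FIELD_MASK:
            raise ValueError(f"{value} does not fit in {FIELD_BITS} bits")
    return (high << 2 * FIELD_BITS) | (mid << FIELD_BITS) | low

def unpack(packed):
    """Recover the three fields from a packed integer."""
    return (
        (packed >> 2 * FIELD_BITS) & FIELD_MASK,
        (packed >> FIELD_BITS) & FIELD_MASK,
        packed & FIELD_MASK,
    )

packed = pack(3, 8, 5)
print(unpack(packed))  # (3, 8, 5)

# Without the range check, an oversized value spills into the neighbouring field:
corrupted = (3 << 2 * FIELD_BITS) + (8 << FIELD_BITS) + 300
print(unpack(corrupted))  # (3, 9, 44)
```

Here 300 needs nine bits, so its top bit carries into the middle field and turns 8 into 9.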

To check whether a particular error has fired, test the stored value against that error's bitmask (its bit position); a nonzero result means the error has been registered.

It seems to me that for your problem you would only need to know which errors have occurred and don't need to save the error codes. You can then employ a simplified scenario where you would reserve 1 bit per error and a bit->error map in code.
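A minimal sketch of that one-bit-per-error scheme, assuming four error types; the names in `ERROR_NAMES` and the helper functions are hypothetical. It sets bits with `|` rather than `+`, so logging the same error twice cannot corrupt the value.

```python
# Hypothetical error names; bit i of the log value stands for ERROR_NAMES[i].
ERROR_NAMES = ["err1", "err2", "err3", "err4"]
ERROR_BITS = {name: 1 << i for i, name in enumerate(ERROR_NAMES)}

def log_error(value, name):
    """Return value with the bit for `name` set (idempotent, unlike addition)."""
    return value | ERROR_BITS[name]

def has_error(value, name):
    """True if the bit for `name` is set in value."""
    return (value & ERROR_BITS[name]) != 0

def decode(value):
    """List the names of all errors recorded in value."""
    return [name for name in ERROR_NAMES if has_error(value, name)]

value = 0
value = log_error(value, "err1")
value = log_error(value, "err3")
value = log_error(value, "err3")  # logging twice leaves the value unchanged
print(value)          # 5
print(decode(value))  # ['err1', 'err3']
```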

Finally, when you want to display which errors were triggered, you simply need to take the binary value of the encoded number and convert 1's to True and 0's to False.
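Applied to the array in the question, the mask check might look like the sketch below. It assumes the log is an integer NumPy array and relies on the `&` operator, which NumPy applies element by element, so no Python-level loop is needed.

```python
import numpy as np

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

err3 = 2 ** 2  # the value 4, as defined in the question

# Bitwise AND keeps only the err3 bit of each element;
# a nonzero result means that bit was set.
test_contains_err3 = (test_arr & err3) != 0
print(test_contains_err3)
# [[False  True False]
#  [False  True  True]]
```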

Simas Joneliunas
  • thanks - this is a good steer. I don't understand your actual answer right now, but I think this is a pointer to an eventual solution. – Chris J Harris Nov 14 '19 at 08:19
  • https://richardswinbank.net/tsql/bitstring_bitmask_encodings Here is a link to a blog post that explores this solution in depth. It is built on SQL, so i hope you will not have trouble transferring it to python. (I was unable to find a blog post that uses python to explain it) – Simas Joneliunas Nov 14 '19 at 08:35
  • thanks - I'm actually using the method explained here https://stackoverflow.com/questions/12173774/how-to-modify-bits-in-an-integer to directly set bits - one for each error code. This seems to work so far. – Chris J Harris Nov 14 '19 at 08:43

I may have a solution, please check if this works for you:

>>> func = lambda x,y: bin(y)[-x] == u'1' if y >= 2**(x-1) else False
>>> func_vec = np.vectorize(func)
>>> check_for_error = 3 # to check err3 = 2**2 = 4
>>> func_vec(check_for_error, test_arr)
array([[False,  True, False],
       [False,  True,  True]])
>>> check_for_error = 4 # to check err4 = 2**3 = 8
>>> func_vec(check_for_error, test_arr)
array([[False, False, False],
       [False, False,  True]]) # only true for 12 (= 8 + 4)

The logic is that when a number is written in binary, the positions of its 1 digits tell you which powers of two were added together to make it.
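The same index-based check can probably be done without converting to strings, by shifting the wanted bit down to the lowest position and masking it. This is only a sketch; `has_error_index` is a hypothetical name, and it assumes errors are numbered from 1 as in the question (err_n == 2 ** (n - 1)).

```python
import numpy as np

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

def has_error_index(arr, n):
    """True where error number n (1-based, err_n == 2 ** (n - 1)) is set."""
    # Shift bit n-1 down to position 0, keep only that bit, convert to bool.
    return ((arr >> (n - 1)) & 1).astype(bool)

print(has_error_index(test_arr, 3))  # err3 = 4
print(has_error_index(test_arr, 4))  # err4 = 8
```

Unlike `np.vectorize`, which calls the Python function once per element, the shift and mask run as whole-array operations.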

If you would rather pass the error value itself instead of its index, for example 8 (i.e. 2**3), you can write the function as:

import numpy as np
import math
test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)
# x is the error value (a power of two); 2**k sits k + 1 characters from the end of bin(y)
func = lambda x, y: bin(y)[-(int(math.log2(x)) + 1)] == u'1' if y >= x else False
func_vec = np.vectorize(func)
check_for_error = 8
print(func_vec(check_for_error, test_arr))

Output:

[[False False False]
 [False False  True]]   # checking for 8. 8 found in 12 (= 8 + 4)

EDIT: A method for finding out all the errors that make up the number:

>>> test_arr = np.array([[ 1,  5, 19],
...                      [ 3,  4, 12],
...                      [ 7, 27, 59]])
>>> func = lambda x: ','.join([str(2**i) for i,j in enumerate(reversed(bin(x))) if j==u'1'])
>>> func_vec = np.vectorize(func)
>>> func_vec(test_arr)
array([['1', '1,4', '1,2,16'],
       ['1,2', '4', '4,8'],
       ['1,2,4', '1,2,8,16', '1,2,8,16,32']], dtype='<U11')
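
If boolean flags are more convenient than strings, one possible sketch uses broadcasting against an array of powers of two. The names `n_bits`, `powers` and `flags` are illustrative, and the approach assumes non-negative integer values.

```python
import numpy as np

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12],
     [7, 27, 59]]
)

n_bits = int(test_arr.max()).bit_length()  # 59 needs 6 bits
powers = 1 << np.arange(n_bits)            # [1, 2, 4, 8, 16, 32]

# Add a trailing axis so every element is tested against every power of two.
# flags[i, j, k] is True when powers[k] is part of test_arr[i, j].
flags = (test_arr[..., np.newaxis] & powers) != 0

print(flags.shape)          # (3, 3, 6)
print(powers[flags[2, 1]])  # components of 27: [ 1  2  8 16]
print(flags[..., 2])        # which cells contain 4
```

The result has one extra axis of length `n_bits`, so memory use grows with the number of error types.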
Sayandip Dutta
  • thanks for this, and it's clever, but I don't think that it works. For example, in checking for whether 4 is included in 5, it returns a False (or even if 4 is included in 4). – Chris J Harris Nov 14 '19 at 07:59
  • It works, but perhaps I didn't explain well enough. I will edit the answer after a little while. – Sayandip Dutta Nov 14 '19 at 13:07
  • And when you pass 4 it is checking whether a number contains 2^3 in it. As per your question err4 = 2**3 – Sayandip Dutta Nov 14 '19 at 13:09
  • Thanks for clarifying! Maybe we are misunderstanding each other, but I'm trying to test if a value is the sum of a given sub value - so if 7 has been constructed from 3 + 4, for example. It looks like your example is using powers though? – Chris J Harris Nov 14 '19 at 23:34
  • Yes, I am pretty sure we are misunderstanding each other. I guess you are saying you are trying to find if 7 is constructed from 3 + 4, so I assume you want to pass `4` to the function and have it return true for `7`, is that correct? If so, then the second method does that for you. I am also adding a method so that it can give you all the sub values for a number. – Sayandip Dutta Nov 15 '19 at 05:35