I have two boolean NumPy arrays of indicators:

                          v                          v              v
A =    np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1], dtype=bool)
B =    np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=bool)
                                         ^                 ^        ^

Moving from left to right, I would like to isolate the first true A indicator, then the next true B indicator, then the next true A indicator, then the next true B indicator, etc. to end up with:

                          v                          v              v
>>>> A_result = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
     B_result = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
                                         ^                 ^        ^

I have a feeling I could create a betweenAB array indicating all the places where A==1 is followed by B==1:

                          v                          v              v
betweenAB =     [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1]
                                         ^                 ^        ^

then take the start and end indices of each run, but I am still somewhat of a beginner when it comes to Numpy and am not sure how I might do that.
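For reference, the run-boundary step described above could be sketched as follows. This is only an illustration that assumes `betweenAB` has already been built (which is the hard part); the names `between_ab`, `padded`, `edges`, `starts` and `ends` are made up for the example. It also assumes consecutive spans are separated by at least one 0, since two adjacent spans would merge into a single run.

```python
import numpy as np

# betweenAB exactly as written in the question
between_ab = np.array(
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1], dtype=bool
)

# Pad with False so that runs touching either end still produce an edge
padded = np.concatenate(([False], between_ab, [False]))
edges = np.diff(padded.astype(np.int8))  # +1 at a run start, -1 just past a run end

starts = np.flatnonzero(edges == 1)     # first index of each run
ends = np.flatnonzero(edges == -1) - 1  # last index of each run (inclusive)

# Scatter the boundaries back into boolean arrays
A_result = np.zeros_like(between_ab)
B_result = np.zeros_like(between_ab)
A_result[starts] = True
B_result[ends] = True
```

With the example data, the run starts line up with the `v` markers and the run ends with the `^` markers.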

I'm looking for a fully vectorized approach as there are thousands of these arrays in my application each containing thousands of elements. Any help would be much appreciated.

Craig Nathan

1 Answer

This can hardly be done efficiently with NumPy alone (probably not without loops), but it is easy and efficient with Numba's JIT. The main reason is that the operation is inherently sequential: which element gets selected next depends on where the previous selection ended.

Here is an example in Numba:

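import numpy as np
import numba as nb

# Signature: two contiguous 1D boolean arrays in, a pair of them out.
# `boolean` is Numba's documented name for the boolean type.
@nb.jit('UniTuple(boolean[::1],2)(boolean[::1],boolean[::1])')
def compute(A, B):
    assert len(A) == len(B)
    n = len(A)
    i = 0
    resA = np.zeros(n, dtype=bool)
    resB = np.zeros(n, dtype=bool)
    while i < n:
        # Skip ahead to the next true A indicator
        while i < n and A[i] == 0:
            resA[i] = 0
            i += 1
        if i < n:
            resA[i] = 1
            # A true B at the same index closes the pair immediately
            if B[i] == 1:
                resB[i] = 1
                i += 1
                continue
            i += 1
        # Skip ahead to the next true B indicator
        while i < n and B[i] == 0:
            resB[i] = 0
            i += 1
        if i < n:
            resB[i] = 1
            i += 1
    return resA, resB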
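
As a cross-check that does not need Numba, the same alternation can be sketched as a two-state loop in plain Python and compared against the expected output from the question. The function name `alternate_ab` and the `want_a` flag are hypothetical names chosen for this illustration; the sketch is meant as a readable reference, not a fast implementation.

```python
import numpy as np

def alternate_ab(A, B):
    """Plain-Python reference: select the next true A, then the next true B, and repeat."""
    n = len(A)
    res_a = np.zeros(n, dtype=bool)
    res_b = np.zeros(n, dtype=bool)
    want_a = True  # True while waiting for an A indicator, False while waiting for a B
    for i in range(n):
        if want_a and A[i]:
            res_a[i] = True
            want_a = False
        # Deliberately `if`, not `elif`: a true B at the same index as the
        # selected A closes the pair right away
        if not want_a and B[i]:
            res_b[i] = True
            want_a = True
    return res_a, res_b

A = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1], dtype=bool)
B = np.array([1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1], dtype=bool)
res_a, res_b = alternate_ab(A, B)
print(res_a.astype(int))
print(res_b.astype(int))
```

A reference like this can be handy for testing a compiled version on random inputs before relying on it.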
Jérôme Richard
  • This is great, thank you for this. This gets me to the `betweenAB` array. I have never used Numba before, would you mind recommending how you would arrive at the desired end result (second code block in the question - (2) updated `A` and `B` arrays) using this method? – Craig Nathan Jul 07 '21 at 00:16
  • I updated the answer to arrive at the desired result. Only needed minor changes. Thank you for doing all the legwork! Can you please explain the `('bool[::1](bool[::1],bool[::1])')` params for jit? What is this saying? – Craig Nathan Jul 07 '21 at 01:21
  • Or can you perhaps provide a resource explaining this syntax? – Craig Nathan Jul 08 '21 at 16:18
  • 1
    Ok. I did not checked carefully the code, but looked fine for the example. Thank you for the fix. Regarding the Numba nb.jit decorator, this string specify the type of the parameter and the return type. It is optional, but it is often better to add it to compile the function code ahead of time rather than at run time. The syntax is `"returnType(ParamType1, ParamType2, ...)"`. Basic types include int32, int64, float32... A 1D array type is described as `type[:]`, a 2D array `type[:,:]` and so on. `:1` can be used to specify the dimension is contiguous so that the compiled code will be faster. – Jérôme Richard Jul 08 '21 at 17:11
  • 1
    You can find more information in the [Numba's documentation](https://numba.readthedocs.io/en/stable/reference/types.html). – Jérôme Richard Jul 08 '21 at 17:12
  • Thank you for the explanation, that is very helpful – Craig Nathan Jul 08 '21 at 17:56
  • If we are returning two arrays (my update), does the decorator's signature need to reflect that or can it remain as-is if both arrays returned are the same type? – Craig Nathan Jul 08 '21 at 18:03
  • Yes, the decorator must be changed. One should use tuples in that case (something like `UniTuple(bool[::1],2)` here I think). – Jérôme Richard Jul 08 '21 at 18:04
  • I edited the post to fix that, but I cannot test the code yet. It *should* work with `UniTuple`. AFAIK, it is not possible to use the `(type1, type2)` syntax as it conflicts with the syntax of function parameters. – Jérôme Richard Jul 08 '21 at 18:09
  • Thank you for the update. If you are able to confirm that at some point, that would be very helpful. I was able to compile it with the original signature so I'm not sure how to tell what is correct and what isn't. Also, does the `2` in `UniTuple(bool[::1],2)` indicate the length of the tuple? – Craig Nathan Jul 08 '21 at 18:36
  • 1
    Ok. If you can run it without errors and get correct results, it means it is fine (or ignored by Numba which could be possible but would be surprising). Yes, 2 is the size and `Uni` means the 2 items are of the same type (uniform). – Jérôme Richard Jul 08 '21 at 18:38
  • 1
    I just found an answer which seems to clear this up: https://stackoverflow.com/a/35654032/12814841 – Craig Nathan Jul 08 '21 at 18:41