1

I have a sparse array of length 64, encoded with a 64-bit integer binaryPattern.

For example:

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

represents this array:

[0, value1, 0, value2, 0, 0, 0,,,,]

I need to recover the indexes of valueX, from binaryPattern as fast as possible.

I wrote various function attempts named getIndexesXXX(), but those functions, apart from being slow themselves, have performances that change depending on conditions like the percentage of zeros, or if the ones are at the start or at the end of the binaryPattern

I'm using Python 3.10, so I have access to the functions like (int).bit_length().

Note that performance changes between debugging and running without debugging. What matters is when non debugging.

This is the code I have:

binaryPattern written in binary, is 1 when the array is non-zero, and is 0 when the array is zero.

getIndexes() is the naivest attempt: it checks the bits one by one. It is faster when a few bits are clumped together at the start.

getIndexes2() and getIndexes2_5() check which is the first nonzero bit, and avoid checking zeros unnecessarily.

getIndexes3() uses numpy to calculate all the bit shifts. Is brute force, but is faster on some cases, like when there is a high percentage of ones.

getIndexes4() is similar to getIndexes3(), but uses list comprehension to find the non zero bits.

from numpy.random import default_rng
import time
import numpy as np


class sparseArray():
    _indexes = np.arange(64, dtype=np.uint64)
    _weights = 2**_indexes
    _dtypesInt = [np.uint8, np.uint16, np.uint32, np.uint64]

    def __init__(self, values: np.array, indexes: np.array) -> None:
        self.array = values
        self.binaryPattern = int(np.sum(self._weights[indexes]))
        # print(bin(self.binaryPattern)[2:])

    def getIndexes(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = self.binaryPattern
        n = 0
        index = 0
        while binaryMask > 0:
            if binaryMask & 1:
                answer[n] = index
                n += 1

            binaryMask = binaryMask >> 1
            index += 1
        return answer

    def getIndexes2(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            index += first1.bit_length()
            answer[n] = index
            n += 1
            binaryMask //= 2*first1
        return answer

    not1 = ~np.array([1], dtype=np.uint64)[0]

    def getIndexes2_5(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        indexes = self._indexes
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first zero bit
            first0 = ~(binaryMask | (~binaryMask - 1))
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            # if the sequence of 0 is larger than the sequence of 1...
            if first1*4 > first0:
                index += first1.bit_length()
                answer[n] = index
                n += 1
                binaryMask //= 2*first1
            else:
                length1 = first0.bit_length()-1
                index += 1
                answer[n:n+length1] = indexes[index:index+length1]
                index += length1-1
                n += length1
                binaryMask //= first0
        return answer

    def getIndexes3(self):
        bitshift = ((self.binaryPattern//self._weights) & 1) == 1
        return self._indexes[bitshift]

    def getIndexes3_5(self):
        lengthBP = self.binaryPattern.bit_length()
        # creates a boolean array=True on bits=1
        bitshift = ((self.binaryPattern//self._weights[:lengthBP]) & 1) == 1
        return self._indexes[:lengthBP][bitshift]

    def getIndexes4(self):
        i = [x == "1" for x in bin(x.binaryPattern)[-1:1:-1]]
        return self._indexes[:len(i)][i]


rng = default_rng()
length = 32 #how many ones will have the binaryPattern
indexes = np.sort(rng.choice(64, size=length, replace=False))

#getIndexes is faster for this type of data, which may be frequent:
#x = sparseArray(np.array([1, 4, 8, 10, 10]), indexes=np.array([0, 2, 4, 5, 7]))

x = sparseArray(np.arange(length), indexes=indexes)
print(bin(x.binaryPattern)[2:])
print(x.getIndexes())
print(x.getIndexes2())
print(x.getIndexes2_5())
print(x.getIndexes3())
print(x.getIndexes3_5())
print(x.getIndexes4())

start = time.time()
for _ in range(100000):
    x.getIndexes()
print(f"time for getIndexes: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes2()
print(f"time for getIndexes2: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes2_5()
print(f"time for getIndexes2_5: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes3()
print(f"time for getIndexes3: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes3_5()
print(f"time for getIndexes3_5: {time.time()-start:.2f}")

start = time.time()
for _ in range(100000):
    x.getIndexes4()
print(f"time for getIndexes4: {time.time()-start:.2f}")
Colim
  • 1,283
  • 3
  • 11
  • Ha, I wanted something like that for a Wordle related problem just two days ago. Could you please show your outputs? – Kelly Bundy Aug 08 '22 at 22:42

5 Answers5

2

Not sure if it has to use numpy or how fast it ends up to be but you could try to split the pattern into eight 8-bits blocks, use table look-up to decode (in parallel?) each block and concatenate all the results. Perhaps like:

t0 = [[j+0*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t1 = [[j+1*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t2 = [[j+2*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t3 = [[j+3*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t4 = [[j+4*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t5 = [[j+5*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t6 = [[j+6*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]
t7 = [[j+7*8 for j in range(8) if i >> j & 1 == 1] for i in range(256)]

def indices(pattern):
    p0 = pattern >> 0*8 & 255
    p1 = pattern >> 1*8 & 255
    p2 = pattern >> 2*8 & 255
    p3 = pattern >> 3*8 & 255
    p4 = pattern >> 4*8 & 255
    p5 = pattern >> 5*8 & 255
    p6 = pattern >> 6*8 & 255
    p7 = pattern >> 7*8 & 255
    return t0[p0] + t1[p1] + t2[p2] + t3[p3] + t4[p4] + t5[p5] + t6[p6] + t7[p7]

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

print(indices(binaryPattern))
Ecir Hana
  • 10,864
  • 13
  • 67
  • 117
1

This variation of getIndexes3 seems to be almost twice as fast:

    def getIndexesK(self):
        bitshift = self.binaryPattern & self._weights == self._weights
        return self._indexes[bitshift]
Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
1

This is a benchmark of all solutions, including Kelly Bundy, and Ecir Hana.

Kelly Bundi gets second best for more than 15 ones in the pattern, but Ecir Hana beats everything by a ridiculous margin, on all quantities of ones.

Thanks to both, for great code.

benchmark

This is the code used for benchmarking

from numpy.random import default_rng
import time
import numpy as np
import matplotlib.pyplot as plt

# arrays for sparsing the binaryPattern
range8 = range(8)
range256 = range(256)
t0 = [[j+0*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t1 = [[j+1*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t2 = [[j+2*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t3 = [[j+3*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t4 = [[j+4*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t5 = [[j+5*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t6 = [[j+6*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]
t7 = [[j+7*8 for j in  range8 if i >> j & 1 == 1] for i in  range256]


class sparseArray():
    _indexes = np.arange(64, dtype=np.uint64)
    _weights = 2**_indexes
    _dtypesInt = [np.uint8, np.uint16, np.uint32, np.uint64]

    def __init__(self, values: np.array, indexes: np.array) -> None:
        self.array = values
        self.binaryPattern = int(np.sum(self._weights[indexes]))
        # print(bin(self.binaryPattern)[2:])

    def getIndexes(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = self.binaryPattern
        n = 0
        index = 0
        while binaryMask > 0:
            if binaryMask & 1:
                answer[n] = index
                n += 1

            binaryMask = binaryMask >> 1
            index += 1
        return answer

    def getIndexes2(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            index += first1.bit_length()
            answer[n] = index
            n += 1
            binaryMask //= 2*first1
        return answer

    def getIndexes2_5(self):
        answer = np.zeros(len(self.array), dtype=np.uint8)
        binaryMask = int(self.binaryPattern)
        indexes = self._indexes
        n = 0
        index = -1
        while binaryMask > 0:
            # isolates first zero bit
            first0 = ~(binaryMask | (~binaryMask - 1))
            # isolates first nonzero bit
            first1 = binaryMask & ~(binaryMask - 1)
            # if the sequence of 0 is larger than the sequence of 1...
            if first1*4 > first0:
                index += first1.bit_length()
                answer[n] = index
                n += 1
                binaryMask //= 2*first1
            else:
                length1 = first0.bit_length()-1
                index += 1
                answer[n:n+length1] = indexes[index:index+length1]
                index += length1-1
                n += length1
                binaryMask //= first0
        return answer

    def getIndexes3(self):
        bitshift = ((self.binaryPattern//self._weights) & 1) == 1
        return self._indexes[bitshift]

    def getIndexes3_5(self):
        lengthBP = self.binaryPattern.bit_length()
        # creates a boolean array=True on bits=1
        bitshift = ((self.binaryPattern//self._weights[:lengthBP]) & 1) == 1
        return self._indexes[:lengthBP][bitshift]

    def getIndexes4(self):
        i = [x == "1" for x in bin(x.binaryPattern)[-1:1:-1]]
        return self._indexes[:len(i)][i]

    def getIndexesKBundy(self):
        bitshift = self.binaryPattern & self._weights == self._weights
        return self._indexes[bitshift]

    def getIndexesEHana(self):
        pattern = self.binaryPattern
        p0 = pattern >> 0*8 & 255
        p1 = pattern >> 1*8 & 255
        p2 = pattern >> 2*8 & 255
        p3 = pattern >> 3*8 & 255
        p4 = pattern >> 4*8 & 255
        p5 = pattern >> 5*8 & 255
        p6 = pattern >> 6*8 & 255
        p7 = pattern >> 7*8 & 255
        return np.frombuffer(bytes((t0[p0] + t1[p1] + t2[p2] + t3[p3] + t4[p4] + t5[p5] + t6[p6] + t7[p7])), np.int8)

rng = default_rng()
oneBits = list(range(1, 64))

# dummy sparseArray to extract functions names
x = sparseArray(np.zeros(1), np.zeros(1).astype(np.uint8))
functions = [x.getIndexes, x.getIndexes2, x.getIndexes2_5,
             x.getIndexes3, x.getIndexes3_5, x.getIndexes4,
             x.getIndexesKBundy, x.getIndexesEHana]
duration = {func.__name__: np.zeros(64) for func in functions}

for length in oneBits:
    indexes = np.sort(rng.choice(64, size=length, replace=False))

    #x = sparseArray(np.array([1, 4, 8, 10, 10]), indexes=np.array([0, 2, 4, 5, 7]))
    x = sparseArray(np.arange(length), indexes=indexes)

    functions = [x.getIndexes, x.getIndexes2, x.getIndexes2_5,
                 x.getIndexes3, x.getIndexes3_5, x.getIndexes4,
                 x.getIndexesKBundy, x.getIndexesEHana]

    print(bin(x.binaryPattern)[2:])
    for func in functions:
        print(func())

    print(f"{length=}")
    for func in functions:
        start = time.time()
        for _ in range(100000):
            func()
        dT = time.time()-start
        duration[func.__name__][length] = dT
        print(f"time for {func.__name__}: {dT:.2f}")

for fun, timme in duration.items():
    plt.plot(x._indexes[timme != 0], timme[timme != 0],
             label=fun, linewidth=2, marker="o")
    plt.legend()

plt.xlabel("number of 1 in pattern")
plt.ylabel("time")
plt.show()

I added Kely BUndi's version of Ecir hana solution.

This is the result with debugging:

enter image description here

this is without debugging

enter image description here

Colim
  • 1,283
  • 3
  • 11
  • 1
    Try replacing Ecir's `(t0[p0] + t1[p1] +...)` with `[*t0[p0], *t1[p1], ...]`. – Kelly Bundy Aug 09 '22 at 06:36
  • Interestingly, when I run your benchmark, Ecir's solution eventually becomes as "slow" as my original solution (see benchmark plot in my new answer), while for you, it stayed far ahead. I'd like to see a plot like mine (i.e., with just those three fast solutions) with times from your machine... – Kelly Bundy Aug 09 '22 at 07:42
  • I have an 2008 PC. Maybe you have newer instructions? Did you installed a BLAS to accelerate your numpy? – Colim Aug 09 '22 at 07:46
  • 2008? Oh wow :-). I'm not sure that properly reflects the speeds on normal-age hardware. I don't have a functioning PC right now, I ran this on replit.com (with my paid account, where it's relatively fast). Don't know whether they use BLAS. – Kelly Bundy Aug 09 '22 at 07:51
  • Kelly, you are truly a master. – Colim Aug 09 '22 at 08:35
  • In `getIndexesEHana`, I now see that after the result is assembled (as list) you convert it to `bytes`. You could also try to remove that conversion, i.e. use `bytes` on `t0`, `t1`, ... from the beginning. – Ecir Hana Aug 09 '22 at 11:01
  • @EcirHana Yeah, that actually gave me the motivation/idea for my string version of your solution. You can see I do precompute `bytes` from the lists. And further precompute `str` because sadly [there are no binary f-strings](https://stackoverflow.com/a/45361056/12671057). – Kelly Bundy Aug 09 '22 at 11:17
1

Faster one based on Ecir's, using strings and an f-string instead of lists and adding them.

s0, s1, s2, s3, s4, s5, s6, s7 = [
    [bytes([j+k*8 for j in  range8 if i >> j & 1 == 1]).decode() for i in  range256]
    for k in range(8)
]

...

    def getIndexesEHanaKBundy(self):
        pattern = self.binaryPattern
        p0 = pattern >> 0*8 & 255
        p1 = pattern >> 1*8 & 255
        p2 = pattern >> 2*8 & 255
        p3 = pattern >> 3*8 & 255
        p4 = pattern >> 4*8 & 255
        p5 = pattern >> 5*8 & 255
        p6 = pattern >> 6*8 & 255
        p7 = pattern >> 7*8 & 255
        return np.frombuffer(f'{s0[p0]}{s1[p1]}{s2[p2]}{s3[p3]}{s4[p4]}{s5[p5]}{s6[p6]}{s7[p7]}'.encode(), np.int8)

Using Colim's answer's benchmark:

benchmark plot

Kelly Bundy
  • 23,480
  • 7
  • 29
  • 65
1

You've already accepted the other answer, so I'll just leave it as is but the problem with it is that t0[p0] + ... allocates new lists even if they are not necessary. You could also try this, hopefully it ends up a bit faster:

t0 = [[j+0*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t1 = [[j+1*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t2 = [[j+2*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]
t3 = [[j+3*16 for j in range(16) if i >> j & 1 == 1] for i in range(65536)]

def indices(pattern):
    p0 = pattern >> 0*16 & 65535
    p1 = pattern >> 1*16 & 65535
    p2 = pattern >> 2*16 & 65535
    p3 = pattern >> 3*16 & 65535
    result = []
    result.extend(t0[p0])
    result.extend(t1[p1])
    result.extend(t2[p2])
    result.extend(t3[p3])
    return result

binaryPattern = 0b0000000000000000000000000000000000000000000000000000000000001010

print(indices(binaryPattern))

Edit: Now I see Kelly's comment, "(t0[p0] + t1[p1] +...) with [*t0[p0], *t1[p1], ...]" might be worthwhile as well. Thanks Kelly!

Ecir Hana
  • 10,864
  • 13
  • 67
  • 117
  • Calling list methods might actually make it slower. The unpacking notation achieves the same thing but is faster. (When I measured it, it was roughly in the middle between your original and my string version of it). – Kelly Bundy Aug 09 '22 at 09:58
  • @KellyBundy I don't have numpy so I cannot compare but do you happen to know if `bytearray` (instead of list) helps? – Ecir Hana Aug 09 '22 at 10:06
  • It probably would, yes. And `+=` instead of `.extend()` might be good for lists and bytearrays. I can't predict how it would compare to my string version ... The combining would probably be slower than the f-string, but we wouldn't need the `.encode()` (but maybe still a conversion from bytearray to bytes, not sure). – Kelly Bundy Aug 09 '22 at 10:26
  • 1
    I now modified Colim's version of your original to use unpacking or extending (i.e., still using eight blocks). [Results](https://i.stack.imgur.com/Vjifb.png). – Kelly Bundy Aug 09 '22 at 10:34