What's the optimal implementation of a sliding window over a number's bits?

Question

Given a number i.e (0xD5B8), what is the most efficient way in Python to subset across the bits over a sliding window using only native libraries?

A method might look like the following:

def window_bits(n,w, s):
    '''
    n: the number
    w: the window size
    s: step size
    '''
    # code

window_bits(0xD5B8, 4, 4)  # returns [[0b1101],[0b0101],[0b1011],[0b1000]]
window_bits(0xD5B8, 2, 2)  # returns [[0b11],[0b01],[0b01],[0b01],[0b10],[0b11],[0b10],[0b00]]

Some things to consider:

should strive to use minimal possible memory footprint
can only use inbuilt libraries
as fast as possible.
if len(bin(n)) % w != 0, then the last window should exist, with a size less than w

Some of the suggestions are like How to iterate over a list in chunks, which is convert the int using bin and iterate over as a slice. However, these questions do not prove the optimality. I would think that there are other possible bitwise operations that could be done that are more optimal than running over the bin as a slice (a generic data structure), either from a memory or speed perspective. This question is about the MOST optimal, not about what gets the job done, and it can be considered from memory, speed, or both. Ideally, an answer gives good reasons why their representation is the most optimal.

So, if it is provably the most optimal to convert to bin(x) and then just manage the bits as a slice, then that's the optimal methodology. But this is NOT about an easy way to move a window around bits. Every op and bit counts in this question.

By the way, your output example doesn't exactly match a sliding window. Sliding window usually has overlaps. If you want without overlaps, see [How to iterate over a list in chunks](https://stackoverflow.com/q/434287/6045800) — Tomerikoo, Dec 01 '22 at 11:50
By doing `bin(n)[2:]` you get an array of bits you can feed to any of the answers in the links. And the overlap is not dependent on the window size. It's a matter of the definition if the window moves in a step of window_size or 1. You didn't mention what it is — Tomerikoo, Dec 01 '22 at 12:46
@Tomerikoo I'll modify the function to add a step size parameter. there is some overhead in maintaining an array. and in theory, with my question, there may be faster methods with some bit shifting, etc. I think this was a valid question. For example: https://stackoverflow.com/questions/45220959/python-how-do-i-extract-specific-bits-from-a-byte. You could also use something like this to access the bits. — andor kesselman, Dec 01 '22 at 12:51
obviously I had already considered running over the bits through bins as a standard slice. What I can't do is prove that's actually the optimal way to accomplish that in theory. — andor kesselman, Dec 01 '22 at 12:55
Fair enough. For the record, this should probably all been part of your question to make it clearer (for me and others) — Tomerikoo, Dec 01 '22 at 12:57
sure. i'll modify the question to clarify that a slice iterator is has been considered, but I am unclear about the optimality of it. — andor kesselman, Dec 01 '22 at 12:58
@Tomerikoo i have adjusted the question. Can you re-open this question? — andor kesselman, Dec 01 '22 at 13:11

score 1 · Answer 1 · answered Dec 01 '22 at 13:38

1

The "naive" option would be to create a bits array - bin(n)[2:] - and then use the answers from How to iterate over a list in chunks.

But this is most likely not so efficient assuming we can use bit operations. Another option is to shift-and-mask the input according to the window and step size:

def window_bits(n, w, step_size):
    offset = n.bit_length() - w  # the initial shift to get the MSB window
    mask = 2**w-1  # To get the actual window we need
    while offset >= 0:
        print(f"{(n >> offset)&mask:x}")
        offset -= step_size  # advance the window

And running window_bits(0xD5B8, 4, 4) will indeed print each nibble on a separate line.

answered Dec 01 '22 at 13:38

Tomerikoo

18,379
16
47
61

yess..this is the direction the question was heading. Thank you for addressing it! I'm also exploring some options on my end, but I like where you went with this! – andor kesselman Dec 01 '22 at 13:43
my apologies if the question was not clear at the beginning. it seems like now there is more clarity. – andor kesselman Dec 01 '22 at 13:44
1

Yes indeed it might not be an exact duplicate considering the bits operations special-case here. Sorry for judging too fast. I didn't actually prove this is more optimal than the answers there though. If I will have more time later I will run some timing – Tomerikoo Dec 01 '22 at 13:47
Ran an initial test, over specifically speed: window_bits 0.035297544207423925s over 10000 iterations chunker 0.024356991983950138s over 10000 iterations at first glance, not faster, i'm guessing less memory required though. Will need to run more tests later. Part of this question is to figure out what actually is faster and why! – andor kesselman Dec 01 '22 at 14:40
@andorkesselman Not sure how you timed the chunker, but note that in my code above there is a `print` inside the loop and if that is taken as part of the calculation then it will skew the results as the `print` in itself takes up some runtime... – Tomerikoo Dec 01 '22 at 14:44
yea..i commented the print out. just ran the necessary ops. but definitely a good call out. it's by no means a definitive benchmark. will need to do some tests later. also there's a small amount of overhead i'm guessing in getting the length of n? and then there's also the while loop I'm curious if the length of n increases, what happens to the benchmarking. – andor kesselman Dec 01 '22 at 17:29
I don't think this handles the final window correctly. I added a final window handler when the # of bits (n) % w !== 0, so that it handles that case correctly. It looks a little ugly, but I added an if condition outside of the loop to run one last bit shift against - offset, with the rationale that inside the loop it would run more ops (n windows) than outside (1) – andor kesselman Dec 03 '22 at 07:45

andor kesselman · Answer 2 · 2022-12-06T21:37:20.817

This is not the full answer yet, as I need to do more research, but wanted to add it to the question.

Here's a modification of Tomerikoo's answer that handles the ends better.

This is the "blue" section of the graph below.

def window_bits(n, w, s):
    offset = n.bit_length() - w  
    mask = 2**w-1  
    ret = []
    while offset >= 0:
        ret.append((n >> offset) & mask)
        offset -= s  # advance the window
    if offset < 0: # close the end
        mask = 2**(-offset)-1
        ret.append((n >> mask) & mask)
    return ret

This, along with the red chunker algo mentioned next, was benchmarked over pytest benchamrk.

The red is the chunker used over the following function:

def chunker(n, size, s=None):
    seq = bin(n)
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

These were parameterized the same, with the following:

numbers = [2**n for n in range(10)]
window_size = [4, 8, 12]
step_size = window_size

A couple things that stood out to me:

The chunker has much more of an even execution time whereas the window_bit function executes with a lot more variance.
The chunker is just faster in general.

I'm looking into why this might be the case, as it's not clear yet to me if there's something else at play here. I would think that the bit shifting ops would be faster, but maybe there's some optimizations with slicing that's happening that I'm not sure about.

edit: there was a mistake above, and the chunker was returning a generator not actually running things over (duh). Tests show similar results though even with the fix. Thanks @Tomerikoo for the callout.

It should be:

def chunker(n, size, s=None):
    seq = bin(n)
    return ([seq[pos:pos + size] for pos in range(0, len(seq), size)])

Update Posting test code below:

import pytest

def window_bits(n, w, s):
    offset = n.bit_length() - w  # the initial shift to get the MSB window.
    mask = 2**w-1  # To get the actual window we need
    ret = []
    while offset >= 0:
        ret.append(bin((n >> offset) & mask)[2:].zfill(w))
        offset -= s  # advance the window
    if offset < 0: # close the end
        mask = 2**(-offset)-1
        ret.append(bin((n >> mask) & mask)[2:])
    return ret

def chunker(n, size, s=None):
    seq = bin(n)
    return([seq[pos:pos + size] for pos in range(0, len(seq), size)])

ns = [2**n for n in range(10)]
ws = [4, 8, 12]

@pytest.mark.parametrize('n', ns)
@pytest.mark.parametrize('ws', ws)
@pytest.mark.window_bits
def test_window_bits(benchmark, n, ws):
    result = benchmark.pedantic(window_bits, args=(n,ws,ws))
    assert result is not None

@pytest.mark.parametrize('n', ns)
@pytest.mark.parametrize('ws', ws)
@pytest.mark.chunker
def test_chunker(benchmark, n, ws):
    result = benchmark.pedantic(chunker, args=(n,ws,ws))
    assert result is not None

Well, the `chunker` function you wrote above doesn't really do what you think it does. ***it just returns a generator***. It doesn't even loop on the input at all. This explains the constant time. You need to change it to `return [...]` to be comparable — Tomerikoo, Dec 03 '22 at 23:17
somehow I missed that. you're correct. it was a generator being returned. That being said, the relationship still holds. chunker is still way faster. Again, though, not definitive yet. — andor kesselman, Dec 06 '22 at 14:18
I find hard to believe it will stay similar, can you update with the actual code you're using for the test? — Tomerikoo, Dec 06 '22 at 14:25
updated. it's possible it has something to do with the benchmarking itself ( maybe I need to initialize it differently ). I can take a deeper look at it later. I would agree the bit shifting should probably be faster. I tossed this all together without a ton of rigor, so if I have some time, I can revisit this with a lot better of a test suite in place. — andor kesselman, Dec 06 '22 at 21:41
@Tomerikoo seeing as this again got downvoted, I'm probably doing to close or delete this question. I'll probably look for the most efficient implementation, but on my own for my personal edification as this doesn't seem to align with the communities interest... — andor kesselman, Dec 06 '22 at 22:00

What's the optimal implementation of a sliding window over a number's bits?

2 Answers2