Fast way to find max number of consecutive 1-bits in million-bit numbers

Question

For example, 123456789 in binary is 111010110111100110100010101. The max number of consecutive 1-bits is 4. I got interested in solving that efficiently for very large numbers (million bits or even more). I came up with this:

def onebits(n):
    ctr = 0
    while n:
        n &= n >> 1
        ctr += 1
    return ctr

The n &= n >> 1 simultaneously cuts off the top 1-bit of every streak of consecutive 1-bits. I repeat that until each streak is gone, counting how many steps it took. For example (all binary):

   11101011 (start value)
->  1100001
->   100000
->        0

It took three steps because the longest streak had three 1-bits.

For random numbers, where streaks are short, this is pretty fast (Kelly3 in that benchmark). But for numbers with long streaks, it takes up to O(b²) time, where b is the size of n in bits. Can we do better?

Kelly Bundy · Answer 1 · 2023-03-04T15:40:49.070

Yes, we can do it in O(b log b) time by developing that idea further. With an exponential search.

Note that by cutting off each streak's top 1-bit, that also widens the gaps between streaks. Originally, streaks were separated by at least one 0-bit. After cutting off each streak's first 1-bit, streaks are now separated by at least two 0-bits.

We can then do n &= n >> 2 to cut off the first two 1-bits of all remaining streaks. Which also widens the gaps to at least four 0-bits.

We continue cutting off 4, 8, 16, 32, etc 1-bits from the start of each streak, as long as we still have any 1-bit streaks remaining.

Let's say when trying to cut off 32, we find that we have no streak left. At this point we switch to reverse mode. Try cutting off 16 instead. Then 8, 4, 2, and finally 1. But only keep the cuts that still leave us a streak.

Since we only kept cuts that left some streak, we end up with a streak (or streaks) of length 1, so we add that to the total.

The code:

def onebits_linearithmic(n):
    if n == 0:
        return 0
    total = 0
    cut = 1
    while m := n & (n >> cut):
        n = m
        total += cut
        cut *= 2
    while cut := cut // 2:
        if m := n & (n >> cut):
            n = m
            total += cut
    return total + 1

Benchmark with random 1,000,000-bit numbers:

  0.60 ± 0.06 ms  onebits_linearithmic
  1.13 ± 0.08 ms  onebits_quadratic
 48.57 ± 1.25 ms  onebits_linear

I included a linear-time solution using strings, but its hidden constant is much higher, so it's still much slower.

Next, random 100,000-bit numbers with 50,000-bit streak of 1-bits:

  0.13 ± 0.05 ms  onebits_linearithmic
  2.37 ± 0.60 ms  onebits_linear
176.23 ± 7.82 ms  onebits_quadratic

The quadratic solution indeed became much slower. The other two remain fast, so let's try them with random 1,000,000-bit numbers with 500,000-bit streak of 1-bits:

  1.36 ± 0.06 ms  onebits_linearithmic
 24.69 ± 0.84 ms  onebits_linear

Full code (Attempt This Online!):

def onebits_quadratic(n):
    ctr = 0
    while n:
        n &= n >> 1
        ctr += 1
    return ctr

def onebits_linearithmic(n):
    if n == 0:
        return 0
    total = 0
    cut = 1
    while m := n & (n >> cut):
        n = m
        total += cut
        cut *= 2
    while cut := cut // 2:
        if m := n & (n >> cut):
            n = m
            total += cut
    return total + 1

def onebits_linear(n):
    return max(map(len, f'{n:b}'.split('0')))

funcs = onebits_quadratic, onebits_linearithmic, onebits_linear

import random
from timeit import repeat
from statistics import mean, stdev

# Correctness
for n in [*range(10000), *(random.getrandbits(1000) for _ in range(1000))]:
  expect = funcs[0](n)
  for f in funcs[1:]:
    if f(n) != expect:
      print('fail:', f.__name__)
    
def test(bits, number, unit, scale, long_streak, funcs):
  if not long_streak:
    print(f'random {bits:,}-bit numbers:')
  else:
    print(f'random {bits:,}-bit numbers with {bits//2:,}-bit streak of 1-bits:')

  times = {f: [] for f in funcs}
  def stats(f):
    ts = [t * scale for t in times[f][5:]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} {unit} '

  for _ in range(10):
    n = random.getrandbits(bits)
    if long_streak:
      n |= ((1 << (bits//2)) - 1) << (bits//4)
    for f in funcs:
      t = min(repeat(lambda: f(n), number=number)) / number
      times[f].append(t)

  for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)
  print()

test(1_000_000, 1, 'ms', 1e3, False, funcs)
test(100_000, 1, 'ms', 1e3, True, funcs)
test(1_000_000, 1, 'ms', 1e3, True, funcs[1:])

About 20% speedup- `return len(max(f'{n:b}'.translate(str.maketrans({"0": " "})).split(), default=""))`. Replaces "0"s with spaces, so split() discards them in blocks, instead of creating a bunch of empty strings in between. Also saves function call overhead of mapping `len`, at the cost of str comparisons, but the time remains linear due to early exit on str compare, and in practice is fast because python interns strings and can exit the comparison early on exact equality. — Dillon Davis, Mar 04 '23 at 08:16
@DillonDavis Thanks. Now I'm torn. In a previous edit I already replaced a faster one with this nicer one for simplicity, as both are far slower than the bit fiddler, so it doesn't really matter. And I just noticed I forgot to remove the default, so I can make it even nicer. But... yours is indeed even faster. An even faster linear solution would be a hybrid, doing a fixed number of bit fiddler rounds, then the string solution. I really don't want to go *that* far, but I might use yours. I'll play a bit with it now. — Kelly Bundy, Mar 04 '23 at 10:02
@DillonDavis What *"early exit on str compare"* do you mean? The identity check that you mentioned afterwards, or something else? I think you're mistaken about interning, only single-character ones are interned ([demo](https://ato.pxeger.com/run?1=m72soLIkIz9vwYKlpSVpuhY3I4p1FEoUbBXUDRUM1fWKC3IySzQ0uQqKMvNKNIoVbG0VSnQUihUyixVKNLm44GqBigmphpgPtQZmHQA)), and even that's irrelevant since `max` will quickly find a longer one and use that to compare with. — Kelly Bundy, Mar 04 '23 at 10:06
@DillonDavis With these additional variations, it also more and more depends on not just the total number of bits but also on what the streaks are. For example, yours is 3x *slower* than my string one when there are 1000 streaks of 1000 1-bits, separated by single 0-bits (`n = int('0'.join(['1'*1000]*1000), 2)`). — Kelly Bundy, Mar 04 '23 at 10:56
Good catch with the string interning- you're right. By early exit, I mean if its comparing a 200 char str to a 3 chr string, it only iterates the first 3+1 chars, so it wouldn't grow quadratic ally if say it found the longest str first. Its probably not worth the cost, given your last comment, but the '0' -> ' ' + .split() trick may still be worth it. — Dillon Davis, Mar 04 '23 at 18:50
@DillonDavis Yeah, I'm thinking of writing a separate answer with yours and other linear time variations and also the 1000-streaks-of-1000 testcase. Too many variations, I do want to see them in an answer, but in this one I would prefer not to distract from its main solution. — Kelly Bundy, Mar 04 '23 at 19:01
@DillonDavis Or you could post it as an answer :-). You have multiple optimizations, would be nice to show the effect of each one. — Kelly Bundy, Mar 04 '23 at 19:03

dankal444 · Answer 2 · 2023-03-04T15:10:33.680

Found your problem interesting.

I checked simple, loop-based approach. It is O(b). Becasue of excesive use of loops, had to use numba to achieve similar results in terms of timings.

import math
import numba

def loop_based(n: int):
    bytes_array = (n).to_bytes(int(math.log2(n+1) // 8 + 1), "big")
    return optimized_part(bytes_array)


@numba.njit
def optimized_part(bytes_array):
    bits = [1, 2, 4, 8, 16, 32, 64, 128]
    longest = 0
    current = 0
    idx = len(bytes_array) - 1
    while idx >= 0:
        current_byte = bytes_array[idx]
        for bit in bits:
            if bit & current_byte:
                current += 1
            else:
                if current > longest:
                    longest = current
                current = 0
        idx -= 1
    if current > longest:
        longest = current
    return longest

Results

random 1,000,000-bit numbers:
  0.38 ± 0.01 ms  onebits_linearithmic
  0.66 ± 0.01 ms  loop_based
  0.77 ± 0.08 ms  onebits_quadratic
 26.89 ± 0.40 ms  onebits_linear
 51.06 ± 0.43 ms  loop_based (without numba)

random 100,000-bit numbers with 50,000-bit streak of 1-bits:
  0.07 ± 0.00 ms  onebits_linearithmic
  0.07 ± 0.02 ms  loop_based
  0.99 ± 0.02 ms  onebits_linear
  4.99 ± 0.06 ms  loop_based (without numba)
109.81 ± 0.43 ms  onebits_quadratic

random 1,000,000-bit numbers with 500,000-bit streak of 1-bits:
  0.65 ± 0.01 ms  loop_based
  0.82 ± 0.02 ms  onebits_linearithmic
 13.45 ± 0.09 ms  onebits_linear
 49.56 ± 0.56 ms  loop_based (without numba)

It could be optimized for random bits case, by:

jumping n_current_longest_bits_streak ahead,
iterating backwards
breaking this backwards loop when 0 found.

Out of curiosity, what times do you get with this *without* numba? — Kelly Bundy, Mar 04 '23 at 13:09
Thanks. Somewhat surprising that it's not that much slower than `onebits_linear`. Maybe a more optimized Python loop could actually beat it. Then again, I didn't write it only for it's speed bit also because it's a nice oneliner (which reminds me, I should remove the default now...). — Kelly Bundy, Mar 04 '23 at 15:24

Fast way to find max number of consecutive 1-bits in million-bit numbers

2 Answers2

Linked