I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

wim
zidarsk8
  • Related: http://stackoverflow.com/questions/407587/python-set-bits-count-popcount – dusan Mar 22 '12 at 20:15
  • 1
    What kind of representation are you using if your "integers" are longer than a standard python `int`? Does that not have its own method for calculating this? – Marcin Mar 22 '12 at 21:02
  • 1
    In Python 3.6+, instead of `bin(n)`, try `f"{n:b}"`. It should be faster, and you don't get that pesky "0b" prefix. Also, you can do things like `f"{n:032b}"` to get zero-padded bitstrings of width 32. – PM 2Ring Nov 03 '20 at 17:36
  • @PM2Ring Apparently it's slower. (see my benchmark). -- Also, the 0b prefix or zero-padding doesn't matter if all matters is the number of ones in the string. – user202729 Nov 06 '20 at 09:28
  • 1
    @user202729 Oh, ok. Thanks for doing the benchmark. I assumed the f-string would be faster because it avoids an explicit function call. OTOH, `bin` is a C function, which has less overhead than a Python function call. Sure, the 0b prefix & zero-padding are irrelevant here, I just mentioned those things for readers who may need to know it for other contexts. – PM 2Ring Nov 06 '20 at 11:47

13 Answers


Python 3.10 introduces int.bit_count():

>>> n = 19
>>> bin(n)
'0b10011'
>>> n.bit_count()
3
>>> (-n).bit_count()
3

This is functionally equivalent to bin(n).count("1") but should be ~6 times faster. See also Issue29882.

Chris_Rands

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar's and Adam's solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I'd use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

Then it's counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.
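If you need this for values wider than a byte, the same table combines naturally with `int.to_bytes`. A sketch (the `popcount` name is mine, and the table is rebuilt inline so the snippet is self-contained):

```python
# 256-entry table of per-byte set-bit counts, as above.
counts = bytes(bin(x).count("1") for x in range(256))

def popcount(n):
    """Count the set bits of a non-negative int by looking up each byte."""
    width = (n.bit_length() + 7) // 8  # number of bytes needed (0 when n == 0)
    return sum(counts[b] for b in n.to_bytes(width, "little"))
```

For n = 0 this returns 0, since `(0).to_bytes(0, "little")` is an empty byte string.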

kindall
  • 8
    +1! The converse of this is not accurate, however, it should be stated: `bin(n).count("0")` is not accurate because of the '0b' prefix. Would need to be `bin(n)[2:].count('0')` for those counting naughts.... – the wolf Mar 23 '12 at 02:03
  • 11
    You can't really count zero bits without knowing how many bytes you're filling, though, which is problematic with a Python long integer because it could be anything. – kindall Mar 23 '12 at 05:04
  • 2
    Although those are fast options for single integers, note that algorithms presented in other answers may be potentially vectorised, thus much faster if run across many elements of a large `numpy` array. – gerrit Nov 24 '15 at 18:22
  • For numpy arrays I'd look into something like this: https://gist.github.com/aldro61/f604a3fa79b3dec5436a – kindall Feb 23 '16 at 17:34
  • 2
    I have used `bin(n).count("1")`. However, only beats 60% of python submission. @ [leetcode](https://leetcode.com/submissions/detail/91593414/) – northtree Feb 04 '17 at 05:48
  • "arbitrary-length integers" -- What if it is given that the length is, say, 8 bits? – hola Dec 12 '18 at 18:28
  • 1
    @pushpen.paul For 8-bit integers I'd just use a lookup table. Could use a bytearray for it for space efficiency. – kindall Dec 12 '18 at 19:44
  • 1
    `https://docs.python.org/3.10/library/stdtypes.html#int.bit_count` counts set bits. –  Feb 23 '21 at 01:32
  • I'm using python3.7 and gmpy2; `popcount` uses 50% of the time of `count('1')` – Hunger Sep 01 '21 at 10:58

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it's easily extendable, and the number of operations grows with the logarithm of the argument (i.e. linearly with the bit size of the argument).

In order to understand how this works, imagine that you divide the entire 64-bit string into 64 one-bit buckets. Each bucket's value equals the number of bits set in the bucket (0 if no bits are set, 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets, each 2 bits long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets, since no carry can occur across buckets: an n-bit bucket is always wide enough to encode the count n). Further transformations lead to states with an exponentially decreasing number of buckets of exponentially growing size, until we arrive at one 64-bit bucket. This gives the number of bits set in the original argument.
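Karl Knechtel's suggestion in the comments (generate the constants and loop over them) can be sketched as follows. The function name and the string-based mask construction are mine; width must be a power of two:

```python
def count_bits(n, width=64):
    """SWAR popcount for a width-bit value, generating each mask on the fly."""
    shift = 1
    while shift < width:
        # Mask selecting the low half of every (2 * shift)-bit bucket,
        # e.g. 0x5555... for shift == 1, 0x3333... for shift == 2.
        mask = int(("0" * shift + "1" * shift) * (width // (2 * shift)), 2)
        n = (n & mask) + ((n >> shift) & mask)
        shift *= 2
    return n
```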

Adam Zalcman
  • I seriously have no idea how this would work with 10 000 bit numbers though, but I do like the solution. can you give me a hint if and how i can applay that to bigger numbers? – zidarsk8 Mar 22 '12 at 20:52
  • I didn't see the number of bits you're dealing with here. Have you considered writing your data handling code in a low-level language like C? Perhaps as an extension to your python code? You can certainly improve performance by using large arrays in C compared to large numerals in python. That said, you can rewrite the `CountBits()` to handle 10k-bits numbers by adding just 8 lines of code. But it'll become unwieldy due to huge constants. – Adam Zalcman Mar 22 '12 at 21:05
  • 3
    You can write code to generate the sequence of constants, and set up a loop for the processing. – Karl Knechtel Mar 22 '12 at 21:40
  • This answer has the great advantage that it can be *vectorised* for cases dealing with large `numpy` arrays. – gerrit Nov 24 '15 at 18:28
  • "the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument). " -- This is wrong, the number of (addition, bitwise) operation is (asymptotically) log(log n) or log(number of bits). The time complexity is log(n) log(log n) or (number of bits * log(number of bits)) -- because Python's integer is arbitrary-precision (technically `int` data type in Python 2 is 64 bit, but the OP was using `long` without knowing it) -- so it would almost-likely be slower than gmpy for single numbers. – user202729 Nov 05 '20 at 02:45

Here's a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.
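Since the routine is limited to 32 bits, arbitrary-length integers can be handled by feeding it 32-bit chunks, which is essentially the adaptation kindall benchmarked in his answer. A sketch (the wrapper name is mine):

```python
def numberOfSetBits(i):
    # 32-bit SWAR popcount from the answer above.
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

def popcount_big(n):
    """Count set bits of an arbitrarily large non-negative int, 32 bits at a time."""
    total = 0
    while n:
        total += numberOfSetBits(n & 0xFFFFFFFF)
        n >>= 32
    return total
```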

Óscar López
  • That's clever. Looking this up instead of shooting an answer from the hip is completely appropriate! – MrGomez Mar 22 '12 at 20:27
  • 1
    Did you benchmark this? On my machine using python 2.7, I found this to actually be a bit slower than `bin(n).count("1")`. – David Weldon Mar 22 '12 at 20:53
  • @DavidWeldon No I didn't, could you please post your benchmarks? – Óscar López Mar 22 '12 at 21:25
  • `%timeit numberOfSetBits(23544235423)`: `1000000 loops, best of 3: 818 ns per loop`; `%timeit bitCountStr(23544235423)`: `1000000 loops, best of 3: 577 ns per loop`. – gerrit Nov 24 '15 at 15:40
  • 7
    However, `numberOfSetBits` processes my 864×64 `numpy.ndarray` in 841 µs. With `bitCountStr` I have to loop explicitly, and it takes 40.7 ms, or almost 50 times longer. – gerrit Nov 24 '15 at 18:32

I really like this method. It's simple and pretty fast, and it's not limited in bit length, since Python has arbitrary-precision integers.

It's actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example, it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
    n = 0
    while value:
        n += 1
        value &= value - 1  # clears the lowest set bit
    return n
Robotbugs
  • looks great, but it's only good for very "sparse" integers. on average it's quite slow. Still, it looks really useful in certain use cases. – zidarsk8 Apr 26 '19 at 08:10
  • 1
    I'm not quite sure what you mean by "on average it's quite slow". Quite slow compared to what? Do you mean slow compared to some other python code that you're not quoting? It's twice as fast as counting bit by bit for the average number. In fact on my macbook it counts 12.6 million bits per second which is a lot faster than I can count them. If you have another generic python algorithm that works for any length of integer and is faster than this I'd like to hear about it. – Robotbugs Apr 27 '19 at 22:51
  • 1
    I do accept that it is actually slower than the answer by Manuel above. – Robotbugs Apr 27 '19 at 23:31
  • 2
    Quite slow on average means, counting bits for 10000 numbers with 10000 digits takes 0.15s with `bin(n).count("1")` but it took 3.8s for your function. If the numbers had very few bits set it would work fast, but if you take any random number, on average the function above will be orders of magnitude slower. – zidarsk8 May 07 '19 at 16:47
  • OK I will accept that. I guess I was just being a dick cos you're a little imprecise but you're totally right. I just hadn't tested the method using the method by Manuel above before my comment. It looks very clunky but it is actually very fast. I'm now using a version like that but with 16 values in the dictionary and that's even much faster than the one he quoted. But for the record I was using mine in an application with only a few bits that were set to 1. But for totally random bits yeah it's going to about 50:50 with a little variance decreasing with length. – Robotbugs May 28 '19 at 07:05
  • Also thanks for taking your time to actually type in and profile the function i quoted. That's appreciated. – Robotbugs May 28 '19 at 07:12

According to this post, this seems to be one of the fastest implementations of the Hamming weight (if you don't mind using about 64 KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.
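The same 16-bit table also covers 64-bit values with two extra lookups; a straightforward extension (the 64-bit variant's name is mine, not part of the original answer):

```python
# 16-bit lookup table built as in the answer above.
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount64_table16(v):
    # Four 16-bit lookups cover a full 64-bit value.
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff] +
            POPCOUNT_TABLE16[(v >> 32) & 0xffff] +
            POPCOUNT_TABLE16[(v >> 48) & 0xffff])
```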

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is a C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024
Paolo Moretti
  • I have edited my question to make it clear I need this for large numbers (10k bits and more). Optimizing something for 32-bit integers would probably not make that much of a difference, since the number of counts would have to be really big, in which case that would cause the slow execution time. – zidarsk8 Mar 22 '12 at 20:54
  • But GMP is exactly for very large numbers, including numbers at and far beyond the sizes you mention. – James Youngman Mar 23 '12 at 10:45
  • 1
    Memory usage will be better if you use [`array.array`](https://docs.python.org/2/library/array.html) for `POPCOUNT_TABLE16`, as then it'll be stored as an array of integers, instead of as a dynamically sized list of Python `int` objects. – gsnedders Dec 25 '14 at 23:14

You can use the algorithm for getting the binary string of an integer [1], but instead of concatenating the string, count the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[2:]:  # Python 3: oct() adds a '0o' prefix; use [1:] on Python 2
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation
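The hexadecimal variant suggested in the comments below (a 16-entry table, with `%x` formatting so there is no prefix to strip) might look like this sketch; the function name is mine:

```python
def count_ones_hex(a):
    # Set-bit counts for each hexadecimal digit.
    t = {'0': 0, '1': 1, '2': 1, '3': 2, '4': 1, '5': 2, '6': 2, '7': 3,
         '8': 1, '9': 2, 'a': 2, 'b': 3, 'c': 2, 'd': 3, 'e': 3, 'f': 4}
    return sum(t[c] for c in "%x" % a)
```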

Manuel
  • This works fast. There's an error, at least on p3, the [1:] should be [2:] because oct() returns '0o' before the string. The code runs a lot faster though if you use hex() instead of oct() and make a 16 entry dictionary, – Robotbugs Apr 27 '19 at 23:20
  • You can avoid having to chop off the first two characters by using `"%o" % a` instead of `oct(a)`. (`"%x" % a` for the suggested hexadecimal improvement) – kindall Jul 20 '23 at 13:50

It's possible to combine a lookup table with int.to_bytes (Python 3 only):

popcount8bit = bytes(bin(x).count("1") for x in range(1 << 8))  # 8-bit lookup table
popcount = lambda x: sum(map(popcount8bit.__getitem__,
                             x.to_bytes((x.bit_length() + 7) // 8, "little")))

This solution is unfortunately about 20% slower than bin(x).count('1') on Python 3, but about twice as fast on PyPy3.


This is a benchmark script that compares several different solutions presented here, for different numbers of bits:

from __future__ import print_function  #for Python 2

import sys
from timeit import timeit
import random

def popcount(x): return bin(x).count("1")

version3=sys.version.startswith("3")

for numBit in (2, 4, 8, 16, 31, 32, 63, 64, 1000, 10000):
    maximum=int((1<<numBit)-1)  #int cast just in case it overflows to long in Python 2

    functions=[
            (popcount, "bin count"),
            (lambda x: "{:b}".format(x).count("1"), "format string count"),
            ]

    try:
        import gmpy
        functions.append((gmpy.popcount, "gmpy"))
    except ImportError:
        pass

    if sys.version.startswith("3"):
        exec('''functions.append((lambda x: f"{x:b}".count("1"), "f-string count"))''')

    if numBit<=16:
        table1=[popcount(x) for x in range(maximum+1)]
        functions.append((lambda x: table1[x], "lookup list"))
        functions.append((table1.__getitem__, "lookup list without lambda"))

        table2="".join(map(chr, table1))
        functions.append((lambda x: ord(table2[x]), "lookup str"))

        if version3:
            table3=bytes(table1)
            functions.append((lambda x: table3[x], "lookup bytes"))

            if numBit==8:
                functions.append((
                        b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08'
                        .__getitem__, "lookup bytes hard coded 8 bit"))
                table_hardcoded=(
                        b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
                        b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')
                functions.append((
                        table_hardcoded.__getitem__, "lookup bytes hard coded 8 bit local variable"))
            functions.append((table3.__getitem__, "lookup bytes without lambda"))

    if version3:
        popcount8bit=bytes([popcount(x) for x in range(1<<8)]) #bytes because benchmark says that it's fastest
        functions.append((
            lambda x: sum(popcount8bit[x] for x in x.to_bytes((x.bit_length()+7)//8, "big")),
            "to_bytes"
            ))
        functions.append((
            lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "big"))),
            "to_bytes without list comprehension"
            ))
        functions.append((
            lambda x: sum(map(popcount8bit.__getitem__, x.to_bytes((x.bit_length()+7)//8, "little"))),
            "to_bytes little endian, without list comprehension"
            ))

    #for x in (2, 4, 8, 16, 32, 64):
    #   table1=[popcount(x) for x in range(1<<8)]


    print("====== numBit=", numBit)
    data=[]
    numRepeat=10**7//(numBit+100)
    for popcountFunction, description in functions:
        random.seed(10) #make randint returns the same value
        data.append((
            timeit(lambda: popcountFunction(random.randint(0, maximum)), number=numRepeat),
            description
            ))

    time1, name1=data[0]
    assert name1=="bin count"
    data.sort()
    maxLength=0
    for time, description in data:
        maxLength=max(maxLength, len(description))
    for time, description in data:
        print("{:{}} -> {:2f} = {} * {:2f}".format(description, maxLength+2, time, name1, time/time1))

It works with both Python 2 and 3; however, if a solution is unavailable for Python 2, it's not measured.

Some solutions are not listed here.

Result:

  • Python 2: "lookup list without lambda" is the fastest (25% faster than "bin count", 6% faster than "lookup list" (with lambda)) for <= 16 bits, larger than that "bin count" is the fastest. (I didn't install gmpy for Python 2)
  • Python 3: Roughly the same result.
    • "Lookup bytes without lambda" is comparable (+/-2% compared to "lookup list without lambda").
    • gmpy is faster than "bin count" in all cases, but slower than "lookup list without lambda" by about 5% with numBit <= 16.
    • "to_bytes" is comparable.
    • Using f-string is about 10% slower than "bin count".
  • PyPy3: The lambda no longer incurs much cost, and the to_bytes version becomes much faster (twice as fast as "bin count"); however, I could not get gmpy to install.
user202729
  • On second thoughts, the speeds may vary a fair bit, depending on the environment. I just tried your benchmark script in Python 3.7 on [SageMathCell](https://sagecell.sagemath.org/) and f-strings were mostly faster. But I was running in Sage mode. (It throws an error in plain Python mode, but that mode isn't really pure Python and it has flaky handling of string literals, eg you have to use `"\\n"` to get a newline). – PM 2Ring Nov 06 '20 at 12:34
  • It'd be interesting to see times without an extra layer of function calls, where applicable. Here's a short example of timing expressions rather than functions: https://stackoverflow.com/a/50212230 BTW, it's a good idea to do a few (3-5) repetitions of timeit loops, and use the minimum one, as I discuss here https://stackoverflow.com/a/43678107/4014959 – PM 2Ring Nov 06 '20 at 12:50

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and then sum up the bit count of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I'm surprised no one suggested you write a C module.

leewz
import sys

class popcount_lk:
    """ Creates an instance for calculating the population count of
        a bitstring, based on a lookup table of 8 bits. """

    def __init__(self):
        """ Creates a lookup table of the Hamming weight of every 8-bit integer. """
        self.lookup_table = bytes.maketrans(bytes(range(1<<8)), bytes(bin(i).count('1') for i in range(1<<8)))
        self.byteorder = sys.byteorder

    def __call__(self, x):
        """ Breaks x, a Python integer, into chunks of 8 bits, looks up
        the population count of each chunk in the table, and returns
        the aggregated population count. """
        return sum(x.to_bytes((x.bit_length()>>3)+1, self.byteorder).translate(self.lookup_table))

popcount = popcount_lk()  # instantiate; the instance is callable via __call__
print(popcount(56437865483765))

This should be 3 times faster than bin(56437865483765).count('1') in CPython and PyPy3.

Ariad
  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 25 '22 at 00:22
  • As far as the results are concerned, `bytes.maketrans(bytes(range(1<<8)),bytes((bin(i).count('1') for i in range(1<<8))))` and `bytes(bin(i).count('1') for i in range(1 << 8))` are the same. – Mechanic Pig Oct 09 '22 at 04:24
  • It is the call function that speeds things. – Ariad Oct 10 '22 at 10:21
  • @Ariad Yes, it is faster than `bin().count('1')`, so +1, I just pointed out the parts that can be modified. In addition, can you move the text description from the code comments to the body? People here seem to prefer the description in the body rather than in the code comments (my browser translation does not work with code comments either). – Mechanic Pig Oct 10 '22 at 10:57

@Robotbugs' answer, wrapped in numba's njit decorator, was faster than gmpy in my case.

from numba import njit, int64, uint64

@njit(int64(uint64))
def get_bit_count(bitboard):
    n = 0
    bitboard = int64(bitboard)
    while bitboard:
        n += 1
        bitboard &= bitboard - 1
    return n

I had to set uint64 as the argument type to avoid an OverflowError.

# Python program to count set bits
def count_set_bits(n):  # renamed so it doesn't shadow the built-in bin()
    count = 0
    while n >= 1:
        if n % 2 == 1:
            count += 1
        n = n // 2
    print("Count of set bits:", count)

# Fetch the input from user
num = int(input("Enter number: "))
# Output
count_set_bits(num)


It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.


The number of bits in an integer is constant in Python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant-time lookup after you have generated the list. See @PaoloMoretti's answer for a good implementation of this. Of course, you don't have to keep this all in memory - you could use some sort of persistent key-value store, or even MySQL. (Another option would be to implement your own simple disk-based storage.)

Marcin
  • 48,559
  • 18
  • 128
  • 201
  • @StevenRumbalski How is it unhelpful? – Marcin Mar 22 '12 at 20:07
  • 1
    When I read your answer it contained only your first sentence: "The number of bits in an integer is constant in python." – Steven Rumbalski Mar 22 '12 at 20:34
  • I already have a bit count lookup table for all the counts that it's possible to store, but having a large list of numbers and operating on them with a[i] & a[j] makes your solution useless unless I have 10+GB of ram. An array for & ^ | for triples of 10000 numbers would be a 3*10000^3 lookup table size. Since I don't know what I will need, it makes more sense to just count the few thousand when I need them – zidarsk8 Mar 22 '12 at 20:49
  • @zidarsk8 Or, you could use some kind of database or persistent key-value store. – Marcin Mar 22 '12 at 20:53
  • @zidarsk8 10+GB of ram is not shockingly large. If you want to perform fast numerical computation, it's not unreasonable to use medium-large iron. – Marcin Mar 22 '12 at 21:01
  • @Marcin I am sorry but 10000^3 would make this a 1000 000 000 000 numbers . if each one is calculating 2000 bits that would make this a 500+ hour job. Look up tables are good when they can be used, in this case it's impossible and not the answer to my question. ps: for your method to work i would need an even faster way of counting these bits. – zidarsk8 Mar 22 '12 at 21:02
  • also this algorithm takes 3 minutes on my laptop with 2GB of ram (also that 10G was an optimistic guess), but let's not argue about this, since it clearly has no connection to the original question. Thank you anyway for trying. Every answer is a good one, even if not useful in this case. (maybe next time) – zidarsk8 Mar 22 '12 at 21:05
  • @zidarsk8 You only need to generate a look-up table once; in fact, you can generate it both incrementally, and piece-meal, so once you have a lookup table that is complete over a given number of bits, you can take your integer in chunks of that length of bits, and count the set bits in each chunk. You seem to be specifically resisting any practical solution proposed in any answer here. Why? – Marcin Mar 22 '12 at 21:09
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/9216/discussion-between-zidarsk8-and-marcin) – zidarsk8 Mar 22 '12 at 21:54
  • (I know this answer is old but) the OP was using `long` (as it's called in Python 2) and (seems to) not aware that it's the case. So the first sentence is wrong -- at least in this question. -- -- By the way, Python 2's `int` type maximum value is 2^63-1, so that would definitely not fit in any kind of table; and if you have to use a disk, it's already slower than the slowest algorithm you would find reasonable, since in this case the task is so simple. – user202729 Nov 05 '20 at 02:37