41

Possible Duplicate:
Best algorithm to count the number of set bits in a 32-bit integer?

How do I count the number of 1's a number will have in binary?

So let's say I have the number 45, which is equal to 101101 in binary and has 4 1's in it. What's the most efficient way to write an algorithm to do this?

Community
david

8 Answers

63

Instead of writing an algorithm to do this, it's best to use the built-in function, Integer.bitCount().

What makes this especially efficient is that the JVM can treat it as an intrinsic, i.e. recognise the call and replace the whole thing with a single machine code instruction on a platform that supports one, e.g. Intel/AMD.
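For reference, a minimal usage sketch applied to the number from the question. The class name `BitCountDemo` and the helper `ones` are made up for illustration; `Integer.bitCount`, `Long.bitCount` and `Integer.toBinaryString` are the standard library methods.

```java
class BitCountDemo {
    // Thin wrapper so the call is easy to reuse; the name is illustrative only.
    static int ones(int value) {
        return Integer.bitCount(value);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(45)); // 101101
        System.out.println(ones(45));                   // 4
        System.out.println(ones(-1));                   // 32, all bits set in two's complement
        System.out.println(Long.bitCount(45L));         // 4, the 64-bit counterpart
    }
}
```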


To demonstrate how effective this optimisation is:

public static void main(String... args) {
    perfTestIntrinsic();

    perfTestACopy();
}

private static void perfTestIntrinsic() {
    long start = System.nanoTime();
    long countBits = 0;
    for (int i = 0; i < Integer.MAX_VALUE; i++)
        countBits += Integer.bitCount(i);
    long time = System.nanoTime() - start;
    System.out.printf("Intrinsic: Each bit count took %.1f ns, countBits=%d%n", (double) time / Integer.MAX_VALUE, countBits);
}

private static void perfTestACopy() {
    long start2 = System.nanoTime();
    long countBits2 = 0;
    for (int i = 0; i < Integer.MAX_VALUE; i++)
        countBits2 += myBitCount(i);
    long time2 = System.nanoTime() - start2;
    System.out.printf("Copy of same code: Each bit count took %.1f ns, countBits=%d%n", (double) time2 / Integer.MAX_VALUE, countBits2);
}

// Copied from Integer.bitCount()
public static int myBitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}

prints

Intrinsic: Each bit count took 0.4 ns, countBits=33285996513
Copy of same code: Each bit count took 2.4 ns, countBits=33285996513

Each bit count using the intrinsic version, including the loop overhead, takes just 0.4 nanoseconds on average. Using a copy of the same code takes 6x longer and gets the same result.

Peter Lawrey
  • @PeterLawrey: can you please describe your test environment? On my machine (Xeon W3520@2.67GHz, 32-bit Java 7 running on Win7 x64) I got: "Intrinsic: Each bit count took 8.1 ns, Copy of same code: Each bit count took 8.1 ns", and when I manually inline `myBitCount()` I got 8.1ns vs. 5.4ns, respectively. – Igor Korkhov Mar 08 '12 at 17:24
  • @PeterLawrey: I'm not particularly interested in absolute numbers, I'd really like to know whether `Integer.bitCount(i)` makes use of `POPCNT` or similar processor instruction or not. Looking at my results I started to doubt it does. – Igor Korkhov Mar 08 '12 at 17:27
  • If you have an old version of Java, I wouldn't be surprised if it isn't as optimal. Which version of Java are you using? – Peter Lawrey Mar 08 '12 at 22:04
  • @PeterLawrey: I used Java SE 7u3 to run your benchmark. – Igor Korkhov Mar 08 '12 at 22:25
  • Interesting, so did I. I have an Intel i7 with a 64-bit JVM. I am amazed it was 20x slower. :P Can you check you are using the `-server` JVM? – Peter Lawrey Mar 08 '12 at 22:29
  • @PeterLawrey: tomorrow morning I will run it on 64-bit server JVM (I used 32-bit) and tell you the results. 20x speed difference is indeed amazing. – Igor Korkhov Mar 08 '12 at 22:38
  • On `-server` x64 Java 7u3 I've got 0.9ns vs 4.2ns (`Integer.bitCount()` vs bit shifting shenanigans), i.e. the same proportion you had on your machine. – Igor Korkhov Mar 09 '12 at 10:43
  • The server JVM can optimise the code more aggressively, but has a longer start-up time. The 32-bit Windows JVM is `-client` by default, but on just about every other OS it's `-server`. I believe the 64-bit JVM on Windows is also `-server` by default. – Peter Lawrey Mar 09 '12 at 11:55
36

The most efficient way to count the number of 1's in a 32-bit variable v I know of is:

v = v - ((v >> 1) & 0x55555555);
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // c is the result

Update: To be clear, this is not my code; in fact, it is older than I am. According to Donald Knuth (The Art of Computer Programming Vol IV, p 11), the code first appeared in the first textbook on programming, The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler and Gill (2nd Ed 1957, reprinted 1984). Pages 191–193 of the 2nd edition of the book presented Nifty Parallel Count by D B Gillies and J C P Miller.
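As a rough sketch of how the three lines work, here is the same idea as a Java method with each step annotated. The class and method names are assumptions for illustration, and `>>>` is used instead of `>>` so that no step depends on the sign bit.

```java
class ParallelCount {
    static int count(int v) {
        // Each 2-bit field now holds the number of ones it originally contained (0..2).
        v = v - ((v >>> 1) & 0x55555555);
        // Add neighbouring 2-bit fields: each 4-bit field holds 0..4.
        v = (v & 0x33333333) + ((v >>> 2) & 0x33333333);
        // Add neighbouring nibbles: each byte holds 0..8.
        v = (v + (v >>> 4)) & 0x0F0F0F0F;
        // Multiplying by 0x01010101 accumulates the four byte counts in the top byte.
        return (v * 0x01010101) >>> 24;
    }
}
```

The work is a fixed handful of operations regardless of how many bits are set, which is why it is often preferred over a bit-by-bit loop.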

Greg A. Woods
Igor Korkhov
15

See Bit Twiddling Hacks and study all the 'counting bits set' algorithms. In particular, Brian Kernighan's way is simple and quite fast if you expect a small answer, because it loops once per set bit. If you expect an evenly distributed answer, a lookup table might be better.
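A minimal sketch of both approaches in Java, assuming a 32-bit `int` input and a 256-entry byte table. The class, method and table names are hypothetical and not taken from the Bit Twiddling Hacks page.

```java
class BitCountAlternatives {
    // Kernighan's method: n & (n - 1) clears the lowest set bit,
    // so the loop body runs once per set bit.
    static int kernighan(int n) {
        int count = 0;
        while (n != 0) {
            n &= n - 1;
            count++;
        }
        return count;
    }

    // ONES[b] is the number of set bits in the byte value b (0..255).
    private static final byte[] ONES = new byte[256];
    static {
        for (int b = 1; b < 256; b++) {
            ONES[b] = (byte) (ONES[b >> 1] + (b & 1));
        }
    }

    // Table lookup: split the int into four bytes and add their counts.
    static int lookup(int n) {
        return ONES[n & 0xFF]
             + ONES[(n >>> 8) & 0xFF]
             + ONES[(n >>> 16) & 0xFF]
             + ONES[n >>> 24];
    }
}
```

The loop cost of `kernighan` grows with the number of set bits, while `lookup` always does four table reads, which is the trade-off described above.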

5

This is called Hamming weight. It is also called the population count, popcount or sideways sum.

Kuldeep Jain
2

The following is from either the "Bit Twiddling Hacks" page or Knuth's books (I don't remember which). It is adapted to unsigned 64-bit integers and works in C#. I don't know whether the lack of unsigned values in Java creates a problem.

By the way, I include the code only for reference; the best answer is to use Integer.bitCount(), as @Lawrey said, since some (but not all) CPUs have a dedicated machine instruction for this operation.

  const UInt64 m1 = 0x5555555555555555;
  const UInt64 m2 = 0x3333333333333333;
  const UInt64 m4 = 0x0f0f0f0f0f0f0f0f;
  const UInt64 h01 = 0x0101010101010101;

  public int Count(UInt64 x)
  {
      x -= (x >> 1) & m1;
      x = (x & m2) + ((x >> 2) & m2);
      x = (x + (x >> 4)) & m4;
      return (int) ((x * h01) >> 56);
  }
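On the Java question: a possible port, sketched under the assumption that a signed `long` stands in for `UInt64`. The unsigned shift `>>>` avoids sign extension, and the wrapping multiplication gives the same low 64 bits as the unsigned one. The class name is illustrative; in practice `Long.bitCount()` is the built-in equivalent.

```java
class Count64 {
    private static final long M1  = 0x5555555555555555L;
    private static final long M2  = 0x3333333333333333L;
    private static final long M4  = 0x0f0f0f0f0f0f0f0fL;
    private static final long H01 = 0x0101010101010101L;

    static int count(long x) {
        x -= (x >>> 1) & M1;              // counts per 2-bit field
        x = (x & M2) + ((x >>> 2) & M2);  // counts per 4-bit field
        x = (x + (x >>> 4)) & M4;         // counts per byte
        return (int) ((x * H01) >>> 56);  // sum of the eight bytes lands in the top byte
    }
}
```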
Peter Mortensen
Ali Ferhat
0
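public int f(int n)
{
    int result = 0;
    // n != 0 with the unsigned shift >>> also handles negative inputs;
    // the original n > 0 test returned 0 for every negative number.
    for (; n != 0; n = n >>> 1)
        result += n & 1; // add the lowest bit

    return result;
}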
mcfinnigan
0

The fastest I have used and also seen in a practical implementation (in the open source Sphinx Search Engine) is the MIT HAKMEM algorithm. It runs superfast over a very large stream of 1's and 0's.
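For illustration, one commonly quoted form of the HAKMEM count is sketched below in Java. This is an assumption about what the answer refers to and is not taken from the Sphinx source; the class name is hypothetical, and `Integer.remainderUnsigned` requires Java 8 or later.

```java
class HakmemCount {
    static int count(int n) {
        // Octal masks: each 3-bit field of tmp ends up holding
        // the number of ones in the matching field of n.
        int tmp = n - ((n >>> 1) & 033333333333) - ((n >>> 2) & 011111111111);
        // Add adjacent 3-bit fields so every other field holds a 6-bit partial sum.
        int sums = (tmp + (tmp >>> 3)) & 030707070707;
        // A base-64 number taken mod 63 is the sum of its digits;
        // the top bit may be set, so the remainder must be unsigned.
        return Integer.remainderUnsigned(sums, 63);
    }
}
```

The final remainder is a division, so whether this beats the shift-and-mask variants depends on the hardware; treat any speed claim as something to measure.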

Peter Mortensen
Yavar
0

The following Ruby code works for non-negative integers.

# Counts set bits by testing the lowest bit and shifting right until nothing is left.
def count_bits(num)
  count = 0
  while num > 0
    count += num & 1
    num >>= 1
  end
  count
end
Peter Mortensen
umar