Most efficient way to find prime factorization of 2 for a positive integer

Question

I am coding in C and wish to figure out the most efficient way to determine how many times 2 divides a number; i.e 5 = 0, 8 = 3. My question is, with this code, I utilized bitwise operations to speed up runtime, and overall the code is O(log N), is there anything computational or analytical I can do to optimize this code?

int Prime_Factor_Two(int n) {
    int k = 0;
    while(~(n&1) + 2){
        n = n >> 1;
        k +=1;
    }
    return k;
}

The number of times n goes into a number is log₂(n), which is O(1). — Robert Harvey, Oct 20 '20 at 16:09
For `int` there's only 30 numbers that are positive powers of two, the other 4.2 billion numbers aren't going to match. You could easily do this with a `switch` in *O(1)* time. — tadman, Oct 20 '20 at 16:10
`while(~(n&1) + 2)` (three operators and one test) can be simplified to `while((n&1) == 0)` (one operator and one test). — Weather Vane, Oct 20 '20 at 16:12
@EOF Obviously that can be adjusted as necessary for whatever ISA this is to target. — tadman, Oct 20 '20 at 16:13
@WeatherVane But that's not undefined behavior anymore, how disappointing! — EOF, Oct 20 '20 at 16:13
Do you have an actual, identified performance problem you're trying to solve, or is this just hypothetical? — Robert Harvey, Oct 20 '20 at 16:15
@RobertHarvey I'm assuming OP wants to know how many times you can divide by 2 *without a remainder*. So, `5` gives zero but `4` gives one. — Adrian Mole, Oct 20 '20 at 16:17
Some CPUs have built-in instructions for this if you can use inline assembly, e.g. [BSF](https://www.felixcloutier.com/x86/bsf). And your compiler might have intrinsics to call these too without assembly, e.g. [__builtin_ffs](https://stackoverflow.com/q/32792365/243245) for GCC or [_BitScanForward](https://learn.microsoft.com/en-us/cpp/intrinsics/bitscanforward-bitscanforward64?view=vs-2019) in MSVC — Rup, Oct 20 '20 at 16:19
You can do it in O(log log n) by first computing `(x & ~(x-1))` (this produces a power of 2) and then doing binary search on the result. — n. m. could be an AI, Oct 20 '20 at 16:32
@AdrianMole: You do have to check the result of the log calculation for a fractional part. A fractional part of zero indicates that 2 divides evenly. — Robert Harvey, Oct 20 '20 at 19:55
Walk the bytes from least to most, looking for non-zero. Increment count by 8. WIth remaining byte, use a look-up table. — chux - Reinstate Monica, Oct 20 '20 at 21:15
See [Counting consecutive trailing zero bits (or finding bit indices)](https://graphics.stanford.edu/~seander/bithacks.html). — chux - Reinstate Monica, Oct 21 '20 at 13:08

Marco Bonelli · Answer 1 · 2020-10-23T17:13:02.383

Okay, the most efficient way you say? How about (almost) a single assembly instruction?

From the GCC doc (and also available in Clang):

Built-in Function: int __builtin_ctz (unsigned int x)

Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

unsigned Prime_Factor_Two(unsigned x) {
    return x ? __builtin_ctz(x) : 0;
}

No function calls, no loops, only one branch. If you know that the number is positive you can even remove that and just use __builtin_ctz(x).

The __builtin_ctz() built-in:

On x86 should compile to a single assembly instruction: TZCNT (if supported) or BSF.
On ARM should compile to two instructions: RBIT + CLZ.
On PowerPC should compile to 31 - CNTLZ(x & -x) (assuming 32bit unsigned).
On other platforms, maybe a handful of instructions.

To also support negative integers you can leverage the fact that the two's complement of a number preserves the least significant zeroes, and just change type from unsigned to int:

unsigned Prime_Factor_Two(int x) {
    return x ? __builtin_ctz(x) : 0;
}

Adrian Mole · Answer 2 · 2020-10-20T16:58:45.817

Assuming only positive numbers and that your system uses 2's Complement notation, you can first isolate the least significant set bit using the seemingly bizarre x = x & -x operation; then, you can convert that to the set bit's position using the log2(x) function.

Here's a test program:

#include <stdio.h>
#include <math.h>

int main()
{
    int num, ans;
    do {
        printf("Enter a number: ");
        if (scanf("%d", &num) != 1 || num == 0) break;
        ans = (int)(log2(num & -num) + 0.5);
        printf("Answer is: %d\n", ans);
    } while (num > 0);
    return 0;
}

Alternatively, to avoid using floating-point stuffs and the math(s) library, you can use a bit-shift loop (this will also work for negative and zero values):

int main()
{
    int num, ans;
    do {
        printf("Enter a number: ");
        if (scanf("%d", &num) != 1) break;
        num &= -num;
        for (ans = 0; num > 1; ans++) num >>= 1;
        printf("Answer is: %d\n", ans);
    } while (num > 0);
    return 0;
}

EDIT: Of course, both of the above methods are contrived and unnecessary; a simple loop with a shifting, single-bit mask will do the trick - except for a value of zero, which is, anyway, divisible by 2 (with no remainder) infinite times:

#include <stdio.h>

int main()
{
    int num, ans, bit;
    do {
        printf("Enter a number: ");
        if (scanf("%d", &num) != 1 || num == 0) break;
        for (ans = 0, bit = 1; !(num & bit); ans++) bit <<= 1;
        printf("Answer is: %d\n", ans);
    } while (1);
    return 0;
}

Utilizing your second code (bit-shift loop), do you know if that is more efficient than the code I had posted, or is it complier dependent? And yes, all inputs will be positive integers, and the complier utilizes 2's compliment, that is why I have ~(n&1) + 2 in the while loop, this generates 0 for odd numbers (no more division by 2) and 1 for even (keep going). Thanks for your help so far! — ChemeComp, Oct 20 '20 at 16:39
@ChemeComp See edit! I think the third solution is significantly *simpler* than yours, but I'm not sure how much faster (if at all) it would be. — Adrian Mole, Oct 20 '20 at 16:59

John Bollinger · Answer 3 · 2020-10-20T18:36:46.940

An interesting way to do this is with what amounts to a binary search for the least-significant 1 bit. You can even code it as explicitly branchless, though the example below does not quite do so. This approach does, however, require you to know the number of value bits in the argument type.

Example:

/*
 * Returns the number of factors of 2 in the prime factorization of the argument, or
 * returns -1 if the argument is 0.
 */
int factor_of_two_count(uint64_t in) {
    int result = -1;
    uint64_t bottom;
    
    bottom = (in & 0xffffffffu);
    in = bottom ? bottom : (in >> 32);
    result += !bottom * 32;

    bottom = (in & 0xffffu);
    in = bottom ? bottom : (in >> 16);
    result += !bottom * 16;

    bottom = (in & 0xffu);
    in = bottom ? bottom : (in >> 8);
    result += !bottom * 8;

    bottom = (in & 0xfu);
    in = bottom ? bottom : (in >> 4);
    result += !bottom * 4;

    bottom = (in & 0x3u);
    in = bottom ? bottom : (in >> 2);
    result += !bottom * 2;

    bottom = (in & 0x1u);
    result += !bottom;

    return result;
}

However, your bit-by bit loop will likely outperform that on random data, for that is roughly analogous to six passes through such a loop, and fewer than 2% of all random 64-bit inputs would require that many. Only if branch misprediction issues weighed heavily on the bitwise loop or if the distribution of inputs were skewed towards those with many factors of 2 would this be likely to be a winner.

score 0 · Answer 4 · edited Oct 21 '20 at 13:14

is there anything computational or analytical I can do to optimize this code?

See Count the consecutive zero bits (trailing) on the right with modulus division and lookup.

unsigned int v;  // find the number of trailing zeros in v
int r;           // put the result in r
static const int Mod37BitPosition[] = // map a bit value mod 37 to its position
{
  32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
  7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
  20, 8, 19, 18
};
r = Mod37BitPosition[(-v & v) % 37];

Author explanation:

The code above finds the number of zeros that are trailing on the right, so binary 0100 would produce 2. It makes use of the fact that the first 32 bit position values are relatively prime with 37, so performing a modulus division with 37 gives a unique number from 0 to 36 for each. These numbers may then be mapped to the number of zeros using a small lookup table. It uses only 4 operations, however indexing into a table and performing modulus division may make it unsuitable for some situations.

Most efficient way to find prime factorization of 2 for a positive integer

4 Answers4