1

I've read this question: Which is the fastest algorithm to find prime numbers?, but I'd like to do this only for 2 and 5 primes.

For example, the number 42000 is factorized as:

24 • 3153 • 71

I'm only interested in finding the powers of 2 and 5: 4 and 3 in this example.

My naive approach is to successively divide by 2 while the remainder is 0, then successively divide by 5 while the remainder is 0.

The number of successful divisions (with zero remainder) are the powers of 2 and 5.

This involves performing (x + y + 2) divisions, where x is the power of 2 and y is the power of 5.

Is there a faster algorithm to find the powers of 2 and 5?

Community
  • 1
  • 1
BenMorel
  • 34,448
  • 50
  • 182
  • 322
  • You could divide by 4 or 8 first, and if that fails, try 2 or 2 & 4. Though this would only be faster for "bigger" numbers. (idea based on the two egg problem - http://datagenetics.com/blog/july22012/index.html) – shapiro yaacov Jun 28 '15 at 13:06
  • @shapiro.yaacov Yes, but this is for a big number library, I would very well start with `1048576` as well, but depending on the size of the number, there is no way to know if this will be efficient. – BenMorel Jun 28 '15 at 13:08
  • Are you going to get any random number, or are these generate and have a higher prob. of a higher powers for `2` and ``? Also, what is more important: not wasting checks on number that don't divide, or finding the powers for number that do? – shapiro yaacov Jun 28 '15 at 13:10
  • How many bits do you use to represent numbers? Unless they are really huge, your current algorithm is simple and essentially optimal: O(log n) to finish. I *really* doubt that this function can be a performance bottleneck. – tucuxi Jun 28 '15 at 13:12
  • This is for a general-purpose big number library. The size of the number is totally unknown and could be huge (hundreds, thousands of digits, no limit). – BenMorel Jun 28 '15 at 13:13
  • @shapiro.yaacov There are two important things: finding the powers of 2 and 5, *and* knowing if there is at least one other factor (I don't need to know which one, just if there are any). – BenMorel Jun 28 '15 at 13:15
  • Then I think @tucuxi is right. Your idea is basically the best way to go. – shapiro yaacov Jun 28 '15 at 13:18
  • @tucuxi: When you say O(log n), you mean n is the number, not the number of digits, and you are assuming that divisions by 2 or 5 take constant time instead of growing with the number of digits. – Douglas Zare Jun 29 '15 at 05:57
  • @DouglasZare yes - this is true of typical ints, before OP clarified that the library was for arbitrary-length integers. With such arbitrary-length ints, this is indeed O(n^2); with n = length of number – tucuxi Jun 29 '15 at 06:09
  • 1
    What is the representation of your input and output? (OP please don't comment on this comment: augment the question.) – greybeard Jun 29 '15 at 08:30
  • @greybeard I'm not sure what you mean with *representation of the input and output*? They're arbitrary-size integers, stored as strings of digits in base 10. – BenMorel Jun 29 '15 at 09:16
  • 1
    `[input and output are] arbitrary-size integers, stored as strings of digits in base 10` - _exactly_ what _should_ have been stated from the beginnning _in the question itself_. (How is anybody besides you to know?) – greybeard Jun 29 '15 at 18:15
  • I didn't think are first glance that this would make much difference. Or rather, I did not really want to put the emphasize on this, as even though the numbers are *stored* as strings in base 10, all the calculations are not done on these strings. When GMP is available, they're converted to GMP numbers, and GMP does the lower-level calculations. So basically, I'm more interested in the theoretical calculations that could lead to finding the factors with less steps, regardless of the implementation details. – BenMorel Jun 29 '15 at 19:53
  • general algorithmic answer is, repeatedly delete by 2,4,8,...,2^n, 2^(n-1),2^(n-2),...,2. do the same with 5. – Will Ness Oct 01 '22 at 18:40

5 Answers5

1

Following the conversation, I do think your idea is the fastest way to go, with one exception:

Division (in most cases) is expensive. On the other hand, checking the last digit of the number is (usually?) faster, so I would check the last digit (0/5 and 0/2/4/6/8) before dividing.

shapiro yaacov
  • 2,308
  • 2
  • 26
  • 39
  • 1
    You can check for "ends in 2" easily in base-2. But no modern computer likes to store its digits in base-5 or base 10... – tucuxi Jun 28 '15 at 13:27
  • @tucuxi Even though what you're saying is true, it actually does make sense in my case, where my library is written in PHP and the number is actually stored as a string in base 10. That's not the most efficient indeed, but this is what worked best within the technical limits of the language. – BenMorel Jun 28 '15 at 13:34
1

I am basing this off this comment by the OP:

my library is written in PHP and the number is actually stored as a string in base 10. That's not the most efficient indeed, but this is what worked best within the technical limits of the language.

If you are committed to strings-in-php, then the following pseudo-code will speed things up compared to actual general-purpose repeated modulus and division:

while the string ends in 0, but is not 0
  chop a zero off the end,
  increment ctr2 and ctr5
switch repeatedly depending on the last digit:
  if it is a 5,
     divide it by 5
     increment ctr5
  if it is 2, 4, 6, 8,
     divide it by 2
     increment ctr2
  otherwise
     you have finished

This does not require any modulus operations, and you can implement divide-by-5 and divide-by-2 cheaper than a general-purpose long-number division.

On the other hand, if you want performance, using string representations for unlimited-size integers is suicide. Use gmp (which has a php library) for your math, and convert to strings only when necessary.

edit:

you can gain extra efficiency (and simplify your operations) using the following pseudocode:

if the string is zero, terminate early
while the last non-zero character of the string is a '5',
   add the string to itself
   decrement ctr2
count the '0's at the end of the string into a ctr0
chop off ctr0 zeros from the string
ctr2 += ctr0
ctr5 += ctr0
while the last digit is 2, 4, 6, 8
   divide the string by 2
   increment ctr2

Chopping many 0s at once is better than looping. And mul2 beats div5 in terms of speed (it can be implemented by adding the number once).

tucuxi
  • 17,561
  • 2
  • 43
  • 74
  • That's roughly what I'm doing at the moment, it's good to see that you came up with a similar solution! The library does use GMP or BCMath when available, but can also fall back to a native, digit-by-digit calculation for worst-case scenarios. You can find it on GitHub: [brick/math](https://github.com/brick/math) if you're interested! – BenMorel Jun 28 '15 at 13:54
  • 1
    Change internal representation of "native" to an array of base-10000 ints (if you like the advantages of base-10), or to 1e9 (if you like it, but have 64-bit ints). This should speed up your operations by a factor of 100 to 1e6, specially if you decide to implement things like doMul and doDiv using "the way its done at school" and avoid karatsuba and others. – tucuxi Jun 28 '15 at 17:31
  • That's an interesting idea to speed things up, although speeding up the native calculator is definitely not a priority so far: this implementation is really there as a fallback to ensure that the library will work seamlessly on any PHP installation, but anyone with requirements to compute large numbers is highly advised to install the GMP or BCMath PHP extension! – BenMorel Jun 28 '15 at 20:13
1

If you have a billion digit number, you do not want to do divisions on it unless it's really necessary. If you don't have reason to believe that it is in the 1/2^1000 numbers divisible by 2^1000, then it makes sense to use much faster tests that only look at the last few digits. You can tell whether a number is divisible by 2 by looking at the last digit, whether it is divisible by 4 by looking at the last 2 digits, and by 2^n by looking at the last n digits. Similarly, you can tell whether a number is divisible by 5 by looking at the last digit, whether it is divisible by 25 by looking at the last 2 digits, and by 5^n by looking at the last n digits.

I suggest that you first count and remove the trailing 0s, then decide from the last digit whether you are testing for powers of 2 (last digit 2,4,6, or 8) or powers of 5 (last digit 5).

If you are testing for powers of 2, then take the last 2, 4, 8, 16, ... 2^i digits, and multiply this by 25, 625, ... 5^2^i, counting the trailing 0s up to 2^i (but not beyond). If you get fewer than 2^i trailing 0s, then stop.

If you are testing for powers of 5, then take the last 2, 4, 8, 16, ... 2^i digits, and multiply this by 4, 16, ... 2^2^i, counting the trailing 0s up to 2^i (but not beyond). If you get fewer than 2^i trailing 0s, then stop.

For example, suppose the number you are analyzing is 283,795,456. Multiply 56 by 25, you get 1400 which has 2 trailing 0s, continue. Multiply 5,456 by 625, you get 3,410,000, which has 4 trailing 0s, continue. Multiply 83,795,456 by 5^8=390,625, you get 32,732,600,000,000, which has 8 trailing 0s, continue. Multiply 283,795,456 by 5^16 to get 43,303,750,000,000,000,000 which has only 13 trailing 0s. That's less than 16, so stop, the power of 2 in the prime factorization is 2^13.

I hope that for larger multiplications you are implementing an n log n algorithm for multiplying n digit numbers, but even if you aren't, this technique should outperform anything involving division on typical large numbers.


Let's look at the average-case time complexity of various algorithms, assuming that each n-digit number is equally likely.

Addition or subtraction of two n-digit numbers takes theta(n) steps.

Dividing an n-digit number by a small number like 5 takes theta(n) steps. Dividing by the base is O(1).

Dividing an n-digit number by another large number takes theta(n log n) steps using the FFT, or theta(n^2) by a naive algorithm. The same is true for multiplication.

The algorithm of repeatedly dividing a base 10 number by 2 has an average case time complexity of theta(n): It takes theta(n) time for the first division, and on average, you need to do only O(1) divisions.

Computing a large power of 2 with at least n digits takes theta(n log n) by repeated squaring, or theta(n^2) with simple multiplication. Performing Euclid's algorithm to compute the GCD takes an average of theta(n) steps. Although divisions take theta(n log n) time, most of the steps can be done as repeated subtractions and it takes only theta(n) time to do those. It takes O(n^2 log log n) to perform Euclid's algorithm this way. Other improvements might bring this down to theta(n^2).

Checking the last digit for divisibility by 2 or 5 before performing a more expensive calculation is good, but it only results in a constant factor improvement. Applying the original algorithm after this still takes theta(n) steps on average.

Checking the last d digits for divisibility by 2^d or 5^d takes O(d^2) time, O(d log d) with the FFT. It is very likely that we only need to do this when d is small. The fraction of n-digit numbers divisible by 2^d is 1/2^d. So, the average time spent on these checks is O(sum(d^2 / 2^d)) and that sum is bounded independent of n, so it takes theta(1) time on average. When you use the last digits to check for divisibility, you usually don't have to do any operations on close to n digits.

Douglas Zare
  • 3,296
  • 1
  • 14
  • 21
1

depends on whether you're starting with a native binary number or some bigint string -

  • chopping off very long chains of trailing edge zeros in bigint strings are a lot easier than trying to extract powers of 2 and 5 separately - e.g. 23456789 x 10^66

    23456789000000000000000000000000000000000000000000000000000000000000000000
    

This particular integer, on the surface, is 244-bits in total, requiring a 177-bit-wide mantissa (178-bit precision minus 1-bit implicit) to handle it losslessly, so even newer data types such as uint128 types won't suffice :

11010100011010101100101010010000110000101000100001000110100101
01011011111101001110100110111100001001010000110111110101101101
01001000011001110110010011010100001001101000010000110100000000
0000000000000000000000000000000000000000000000000000000000

The sequential approach is to spend 132 loop cycles in a bigint package to get them out ::

   129  63 2932098625
   130  64 586419725
   131  65 117283945
   132  66 23456789
   133  2^66 x
               5^66 x
                      23456789

But once you can quickly realize there's a chain of 66 trailing zeros, the bigint package becomes fully optional, since the residual digits is less than 24.5-bits in total width:

2^66
5^66
23456789
RARE Kpop Manifesto
  • 2,453
  • 3
  • 11
  • yes simple bitshift/bitmask can handle the powers of 2 divisors very fast in both string and binary representation unless the internal representation does not use powers of 2 base , if not then usually power of 10 is used that still fits into binary word in such case you could use similar approach for the powers of 10 .... (handling most of the powers of 5 and 2 .... and then just do the normal division with the rest – Spektre Oct 02 '22 at 07:09
0

I think your algorithm will be the fastest. But I have a couple of suggestions.

One alternative is based on the greatest common divisor. Take the gcd of your input number with the smallest power of 2 greater than your input number; that will give you all the factors of 2. Divide by the gcd, then repeat the same operation with 5; that will give you all the factors of 5. Divide again by the gcd, and the remainder tells you if there are any other factors.

Another alternative is based on binary search. Split the binary representation of your input number in half; if the right half is 0, move left, otherwise move right. Once you have the factors of 2, divide, then apply the same algorithm on the remainder using powers of 5.

I'll leave it to you to implement and time these algorithms. But my gut feeling is that repeated division will be hard to beat.

I just read your comment that your input number is stored in base 10. In that case, divide repeatedly by 10 as long as the remainder is 0; that gives factors of both 2 and 5. Then apply your algorithm on the reduced number.

user448810
  • 17,381
  • 4
  • 34
  • 59
  • The GCD-based algorithm is interesting, but you would need a lot of multiplications to build the initial "next-biggest-power-of-two" and "next-biggetst-power-of-five". Calculating the GCD of a pair of large numbers is also O(n^2) substractions, with n the length of the smallest of the pair. – tucuxi Jun 29 '15 at 06:28
  • Very interesting approaches, thank you! Indeed, calculating the GCD is itself requiring several potentially costly operations, so it might not be faster (can't say for sure: as you say, it needs to be implemented and timed). About base-10, as numbers are actually stored as strings, I can just trim off trailing zeros one by one. Then only, divide by 2 while the last digit is even, then by 5 while the last digit is 5. – BenMorel Jun 29 '15 at 09:15
  • @tucuxi: Re: next biggest power of .... Computing the logarithm to base 2 can be done _very_ quickly, it's just the number of bits in the binary representation of the number, and then it's easy to compute the next biggest power of 2 as a 1-bit followed by 0-bits. Computing the logarithm to base 5 harder. Maybe a hybrid approach: gcd for powers of 2, and trial division for powers of 5. The gcd may be faster than you think, given one of the numbers is of simple form. The only way to find out is to implement and measure. – user448810 Jun 29 '15 at 13:06
  • @user448810 please provide an algorithm for computing the next power-of-2 to a given number within the constraints of "a number is a string in base 10". In base-2, it is indeed trivial; but I understand that OP is doing things the hard way. – tucuxi Jun 29 '15 at 13:33
  • @tucuxi: Initialize _x_ to 1. Repeatedly double _x_ until it is greater than the input number. Take the gcd of _x_ and the input to get the factors of 2. – user448810 Jun 29 '15 at 13:57
  • @user448810 that algorithm needs O(n^2), where n is the number of digits in the input number, to generate the next-largest power of 2 (because adding two n-digit strings is O(n), and you need to do that O(n) times). Finding the GCD is also O(n^2) (O(n) steps, with an n-digit string subtraction in each) . Therefore, with the OP's representation, this algorithm is not competitive with other O(n^2) algorithms. – tucuxi Jun 29 '15 at 17:29