6

Notice

For a solution in Erlang or C / C++, go to Trial 4 below.


Wikipedia Articles

Integer square root

  • The definition of "integer square root" could be found here

Methods of computing square roots

  • An algorithm that does "bit magic" could be found here

[ Trial 1 : Using Library Function ]

Code

isqrt(N) when erlang:is_integer(N), N >= 0 ->
    erlang:trunc(math:sqrt(N)).

Problem

This implementation uses the sqrt() function from the C library, so it does not work with arbitrarily large integers (Note that the returned result does not match the input. The correct answer should be 12345678901234567890):

Erlang R16B03 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V5.10.4  (abort with ^G)
1> erlang:trunc(math:sqrt(12345678901234567890 * 12345678901234567890)).
12345678901234567168
2> 

[ Trial 2 : Using Bigint + Only ]

Code

isqrt2(N) when erlang:is_integer(N), N >= 0 ->
    isqrt2(N, 0, 3, 0).

isqrt2(N, I, _, Result) when I >= N ->
    Result;

isqrt2(N, I, Times, Result) ->
    isqrt2(N, I + Times, Times + 2, Result + 1).

Description

This implementation is based on the following observation:

isqrt(0) = 0   # <--- One 0
isqrt(1) = 1   # <-+
isqrt(2) = 1   #   |- Three 1's
isqrt(3) = 1   # <-+
isqrt(4) = 2   # <-+
isqrt(5) = 2   #   |
isqrt(6) = 2   #   |- Five 2's
isqrt(7) = 2   #   |
isqrt(8) = 2   # <-+
isqrt(9) = 3   # <-+
isqrt(10) = 3  #   |
isqrt(11) = 3  #   |
isqrt(12) = 3  #   |- Seven 3's
isqrt(13) = 3  #   |
isqrt(14) = 3  #   |
isqrt(15) = 3  # <-+
isqrt(16) = 4  # <--- Nine 4's
...

Problem

This implementation involves only bigint additions so I expected it to run fast. However, when I fed it with 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111, it seems to run forever on my (very fast) machine.


[ Trial 3 : Using Binary Search with Bigint +1, -1 and div 2 Only ]

Code

Variant 1 (My original implementation)

isqrt3(N) when erlang:is_integer(N), N >= 0 ->
    isqrt3(N, 1, N).

isqrt3(_N, Low, High) when High =:= Low + 1 ->
    Low;

isqrt3(N, Low, High) ->
    Mid = (Low + High) div 2,
    MidSqr = Mid * Mid,
    if
        %% This also catches N = 0 or 1
        MidSqr =:= N ->
            Mid;
        MidSqr < N ->
            isqrt3(N, Mid, High);
        MidSqr > N ->
            isqrt3(N, Low, Mid)
    end.

Variant 2 (modified above code so that the boundaries go with Mid+1 or Mid-1 instead, with reference to the answer by Vikram Bhat)

isqrt3a(N) when erlang:is_integer(N), N >= 0 ->
    isqrt3a(N, 1, N).

isqrt3a(N, Low, High) when Low >= High ->
    HighSqr = High * High,
    if
        HighSqr > N ->
            High - 1;
        HighSqr =< N ->
            High
    end;

isqrt3a(N, Low, High) ->
    Mid = (Low + High) div 2,
    MidSqr = Mid * Mid,
    if
        %% This also catches N = 0 or 1
        MidSqr =:= N ->
            Mid;
        MidSqr < N ->
            isqrt3a(N, Mid + 1, High);
        MidSqr > N ->
            isqrt3a(N, Low, Mid - 1)
    end.

Problem

Now it solves the 79-digit number (namely 1111111111111111111111111111111111111111 * 1111111111111111111111111111111111111111) in lightening speed, the result is shown immediately. However, it takes 60 seconds (+- 2 seconds) on my machine to solve one million (1,000,000) 61-digit numbers (namely, from 1000000000000000000000000000000000000000000000000000000000000 to 1000000000000000000000000000000000000000000000000000001000000). I would like to do it even faster.


[ Trial 4 : Using Newton's Method with Bigint + and div Only ]

Code

isqrt4(0) -> 0;

isqrt4(N) when erlang:is_integer(N), N >= 0 ->
    isqrt4(N, N).

isqrt4(N, Xk) ->
    Xk1 = (Xk + N div Xk) div 2,
    if
        Xk1 >= Xk ->
            Xk;
        Xk1 < Xk ->
            isqrt4(N, Xk1)
    end.

Code in C / C++ (for your interest)

Recursive variant

#include <stdint.h>

uint32_t isqrt_impl(
    uint64_t const n,
    uint64_t const xk)
{
    uint64_t const xk1 = (xk + n / xk) / 2;
    return (xk1 >= xk) ? xk : isqrt_impl(n, xk1);
}

uint32_t isqrt(uint64_t const n)
{
    if (n == 0) return 0;
    if (n == 18446744073709551615ULL) return 4294967295U;
    return isqrt_impl(n, n);
}

Iterative variant

#include <stdint.h>

uint32_t isqrt_iterative(uint64_t const n)
{
    uint64_t xk = n;
    if (n == 0) return 0;
    if (n == 18446744073709551615ULL) return 4294967295U;
    do
    {
        uint64_t const xk1 = (xk + n / xk) / 2;
        if (xk1 >= xk)
        {
            return xk;
        }
        else
        {
            xk = xk1;
        }
    } while (1);
}

Problem

The Erlang code solves one million (1,000,000) 61-digit numbers in 40 seconds (+- 1 second) on my machine, so this is faster than Trial 3. Can it go even faster?


About My Machine

Processor : 3.4 GHz Intel Core i7

Memory : 32 GB 1600 MHz DDR3

OS : Mac OS X Version 10.9.1


Related Questions

Integer square root in python

  • The answer by user448810 uses "Newton's Method". I'm not sure whether doing the division using "integer division" is okay or not. I'll try this later as an update. [UPDATE (2015-01-11): It is okay to do so]

  • The answer by math involves using a 3rd party Python package gmpy, which is not very favourable to me, since I'm primarily interested in solving it in Erlang with only builtin facilities.

  • The answer by DSM seems interesting. I don't really understand what is going on, but it seems that "bit magic" is involved there, and so it's not quite suitable for me too.

Infinite Recursion in Meta Integer Square Root

  • This question is for C++, and the algorithm by AraK (the questioner) looks like it's from the same idea as Trial 2 above.
Community
  • 1
  • 1
  • `//` *is* integer division in Python, so Newton's method on `x^2 - n = 0` works. – Blender Feb 09 '14 at 09:44
  • @Blender : I know what `//` means in Python, and therefore I doubt whether it works, since the algorithm published in Wikipedia (and anything goes with **"Newton's method"**) works on real number (notice the footnote on floating point). – Siu Ching Pong -Asuka Kenji- Feb 09 '14 at 11:15
  • Using integer division doesn't make any difference. The sequence {x_n} is decreasing with a greatest lower bound of `isqrt(n)`. – Blender Feb 09 '14 at 12:06
  • @Blender : Applying the algorithm from Wikipedia directly and substituting division by integer division does not lead to correct results. Since Wikipedia's terminating condition is `abs(x[k+1] - x[k]) < 1` and since we're doing integer arithmetic, `x[k+1]` must be equal to `x[k]` in order to fulfil the condition. Therefore, when the algorithm applies to the input **3**: `x0 = 3`, `x1 = (3 + 3 div 3) div 2 = 2`, `x2 = (2 + 3 div 2) div 2 = 1`, `x3 = (1 + 3 div 1) div 2 = 2`, which causes an infinite loop. To correct this, terminate when `x[k+1] >= x[k]` (because of the reason you mentioned). – Siu Ching Pong -Asuka Kenji- Feb 10 '14 at 02:53
  • Ah, you're right. I was looking at the code in that first answer, which uses `x_{n_1} >= x_n` as the stopping condition. I think the Wikipedia example could be updated to account for both integer and floating point division. – Blender Feb 10 '14 at 03:31
  • Very nice solution! I added some isqrt() benchmarks: http://pastebin.com/vgg5JaKu – Michaelangel007 Aug 08 '14 at 19:34
  • @Michaelangelo : Thanks for including my algorithm! As far as I can see, your algorithms target 64-bit unsigned integer, while my goal is on arbitrarily large integers (aka. big integers / bigints). I think even faster implementations exist for machine integers / fixed size integers - either by assembly or bit magic. Here is a good book on this: [Hacker's Delight (2nd Edition)](http://www.amazon.com/Hackers-Delight-Edition-Henry-Warren/dp/0321842685). [Here](http://www.hackersdelight.org/) is the official website of the book. – Siu Ching Pong -Asuka Kenji- Aug 09 '14 at 08:15
  • @AsukaKenji-SiuChingPong- np! Yes, I'm ultimately looking for fast algorithms for my own bigint / arbitrary precision library. I <3 Hacker's Delight! Great bit twiddling tricks – Michaelangel007 Aug 11 '14 at 23:06

1 Answers1

2

How about binary search like following doesn't need floating divisions only integer multiplications (Slower than newtons method) :-

low = 1;

/* More efficient bound

high = pow(10,log10(target)/2+1);

*/


high = target


while(low<high) {

 mid = (low+high)/2;
 currsq = mid*mid;

 if(currsq==target) {
    return(mid);
 }

 if(currsq<target) {

      if((mid+1)*(mid+1)>target) {
             return(mid);
      }    
      low =  mid+1;
  }

 else {

     high = mid-1;
 }

}

This works for O(logN) iterations so should not run forever for even very large numbers

Log10(target) Computation if needed :-

acc = target

log10 = 0;

while(acc>0) {

  log10 = log10 + 1;
  acc = acc/10;
}

Note : acc/10 is integer division

Edit :-

Efficient bound :- The sqrt(n) has about half the number of digits as n so you can pass high = 10^(log10(N)/2+1) && low = 10^(log10(N)/2-1) to get tighter bound and it should provide 2 times speed up.

Evaluate bound:-

bound = 1;
acc = N;
count = 0;
while(acc>0) {

 acc = acc/10;

 if(count%2==0) {

    bound = bound*10;
 }

 count++;

}

high = bound*10;
low = bound/10;
isqrt(N,low,high);
BenMorel
  • 34,448
  • 50
  • 182
  • 322
Vikram Bhat
  • 6,106
  • 3
  • 20
  • 19
  • I implemented this before. I will check whether this or the Newton's method is better. The problem is that the multiplication is on big integer, not on a machine integer, so it is an expensive operation. – Siu Ching Pong -Asuka Kenji- Feb 09 '14 at 12:50
  • @AsukaKenji-SiuChingPong- integer multiplication takes about O(logN^2) hence this algorithm would O(logN^3) which still reasonably fast for your application. – Vikram Bhat Feb 09 '14 at 12:58
  • Could you please explain the initial value of `high`? In your original answer, it was `target`, and you edited it 2 times to become `pow(10,log(target)/2+1)`. Why is it? Besides, since `target` is a big integer (bigger than `long long double` could hold), I'm afraid you need something like `ipow()` and `ilog()` to do the job (and that deserves another question), since the C library cannot deal with it. – Siu Ching Pong -Asuka Kenji- Feb 09 '14 at 16:15
  • @AsukaKenji-SiuChingPong- You can surely use high = target but in this case you are needlessly calculating values which cannot ever be in sqrt range of target as sqrt(target) will be have half number of digits as the target so its more efficient to start with high = pow(10,log10(N)/2+1) which gives 2 times speed up over high = target – Vikram Bhat Feb 09 '14 at 16:23
  • Okay, please let me make my question clear... I mean, how did you reach the initial value `high = pow(10,log10(N)/2+1)`? That means: when N = `10`, high = 10^(1.5) = `31`; when N = `100`, high = 10^(2) = `100`; when N = `1000`, high = 10^(2.5) = `316`; when N = `10000`, high = 10^(3) = `1000`; etc. Why? – Siu Ching Pong -Asuka Kenji- Feb 09 '14 at 16:37
  • @AsukaKenji-SiuChingPong- all divisions are integer divisions hence for example target = 10^(2N) then high = 10^(N+1) and we know that sqrt(target) = 10^N hence by having this bound is tighter hence many multiplications are reduced. – Vikram Bhat Feb 09 '14 at 16:46
  • @AsukaKenji-SiuChingPong- check my edit to improve the speed of algorithm – Vikram Bhat Feb 10 '14 at 05:20
  • If you make the `acc = acc / 10;` into `acc /= 100;`, then `count` is not needed (and so do the `if (count % 2 == 0)`). Am I correct? – Siu Ching Pong -Asuka Kenji- Feb 10 '14 at 06:43