12

Lets N be a number (10<=N<=10^5).

I have to break it into 3 numbers (x,y,z) such that it validates the following conditions.

1. x<=y<=z 
2. x^2+y^2=z^2-1;
3. x+y+z<=N

I have to find how many combinations I can get from the given numbers in a method.

I have tried as follows but it's taking so much time for a higher number and resulting in a timeout..

int N= Int32.Parse(Console.ReadLine());
List<String> res = new List<string>();

//x<=y<=z
int mxSqrt = N - 2;
int a = 0, b = 0;
for (int z = 1; z <= mxSqrt; z++)
{
    a = z * z;
    for (int y = 1; y <= z; y++)
    {
        b = y * y;
        for (int x = 1; x <= y; x++)
        {
            int x1 = b + x * x;
            int y1 = a - 1;
            if (x1 == y1 && ((x + y + z) <= N))
            {
                res.Add(x + "," + y + "," + z);
            }
        }
    }
}
Console.WriteLine(res.Count());

My question:

My solution is taking time for a bigger number (I think it's the for loops), how can I improve it?

Is there any better approach for the same?

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
A_Sk
  • 4,532
  • 3
  • 27
  • 51

6 Answers6

6

Here's a method that enumerates the triples, rather than exhaustively testing for them, using number theory as described here: https://mathoverflow.net/questions/29644/enumerating-ways-to-decompose-an-integer-into-the-sum-of-two-squares

Since the math took me a while to comprehend and a while to implement (gathering some code that's credited above it), and since I don't feel much of an authority on the subject, I'll leave it for the reader to research. This is based on expressing numbers as Gaussian integer conjugates. (a + bi)*(a - bi) = a^2 + b^2. We first factor the number, z^2 - 1, into primes, decompose the primes into Gaussian conjugates and find different expressions that we expand and simplify to get a + bi, which can be then raised, a^2 + b^2.

A perk of reading about the Sum of Squares Function is discovering that we can rule out any candidate z^2 - 1 that contains a prime of form 4k + 3 with an odd power. Using that check alone, I was able to reduce Prune's loop on 10^5 from 214 seconds to 19 seconds (on repl.it) using the Rosetta prime factoring code below.

The implementation here is just a demonstration. It does not have handling or optimisation for limiting x and y. Rather, it just enumerates as it goes. Play with it here.

Python code:

# https://math.stackexchange.com/questions/5877/efficiently-finding-two-squares-which-sum-to-a-prime
def mods(a, n):
    if n <= 0:
        return "negative modulus"
    a = a % n
    if (2 * a > n):
        a -= n
    return a

def powmods(a, r, n):
    out = 1
    while r > 0:
        if (r % 2) == 1:
            r -= 1
            out = mods(out * a, n)
        r /= 2
        a = mods(a * a, n)
    return out

def quos(a, n):
    if n <= 0:
        return "negative modulus"
    return (a - mods(a, n))/n

def grem(w, z):
    # remainder in Gaussian integers when dividing w by z
    (w0, w1) = w
    (z0, z1) = z
    n = z0 * z0 + z1 * z1
    if n == 0:
        return "division by zero"
    u0 = quos(w0 * z0 + w1 * z1, n)
    u1 = quos(w1 * z0 - w0 * z1, n)
    return(w0 - z0 * u0 + z1 * u1,
           w1 - z0 * u1 - z1 * u0)

def ggcd(w, z):
    while z != (0,0):
        w, z = z, grem(w, z)
    return w

def root4(p):
    # 4th root of 1 modulo p
    if p <= 1:
        return "too small"
    if (p % 4) != 1:
        return "not congruent to 1"
    k = p/4
    j = 2
    while True:
        a = powmods(j, k, p)
        b = mods(a * a, p)
        if b == -1:
            return a
        if b != 1:
            return "not prime"
        j += 1

def sq2(p):
    if p % 4 != 1:
      return "not congruent to 1 modulo 4"
    a = root4(p)
    return ggcd((p,0),(a,1))

# https://rosettacode.org/wiki/Prime_decomposition#Python:_Using_floating_point
from math import floor, sqrt

def fac(n):
    step = lambda x: 1 + (x<<2) - ((x>>1)<<1)
    maxq = long(floor(sqrt(n)))
    d = 1
    q = n % 2 == 0 and 2 or 3 
    while q <= maxq and n % q != 0:
        q = step(d)
        d += 1
    return q <= maxq and [q] + fac(n//q) or [n]

# My code...
# An answer for  https://stackoverflow.com/questions/54110614/

from collections import Counter
from itertools import product
from sympy import I, expand, Add

def valid(ps):
  for (p, e) in ps.items():
    if (p % 4 == 3) and (e & 1):
      return False
  return True

def get_sq2(p, e):
  if p == 2:
    if e & 1:
      return [2**(e / 2), 2**(e / 2)]
    else:
      return [2**(e / 2), 0]
  elif p % 4 == 3:
    return [p, 0]
  else:
    a,b = sq2(p)
    return [abs(a), abs(b)]

def get_terms(cs, e):
  if e == 1:
    return [Add(cs[0], cs[1] * I)]
  res = [Add(cs[0], cs[1] * I)**e]
  for t in xrange(1, e / 2 + 1):
    res.append(
      Add(cs[0] + cs[1]*I)**(e-t) * Add(cs[0] - cs[1]*I)**t)
  return res

def get_lists(ps):
  items = ps.items()
  lists = []
  for (p, e) in items:
    if p == 2:
      a,b = get_sq2(2, e)
      lists.append([Add(a, b*I)])
    elif p % 4 == 3:
      a,b = get_sq2(p, e)
      lists.append([Add(a, b*I)**(e / 2)])
    else:
      lists.append(get_terms(get_sq2(p, e), e))
  return lists


def f(n):
  for z in xrange(2, n / 2):
    zz = (z + 1) * (z - 1)
    ps = Counter(fac(zz))
    is_valid = valid(ps)
    if is_valid:
      print "valid (does not contain a prime of form\n4k + 3 with an odd power)"
      print "z: %s, primes: %s" % (z, dict(ps))
      lists = get_lists(ps)
      cartesian = product(*lists)
      for element in cartesian:
        print "prime square decomposition: %s" % list(element)
        p = 1
        for item in element:
          p *= item
        print "complex conjugates: %s" % p
        vals = p.expand(complex=True, evaluate=True).as_coefficients_dict().values()
        x, y = vals[0], vals[1] if len(vals) > 1 else 0
        print "x, y, z: %s, %s, %s" % (x, y, z)
        print "x^2 + y^2, z^2-1: %s, %s" % (x**2 + y**2, z**2 - 1)
      print ''

if __name__ == "__main__":
  print f(100)

Output:

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 3, primes: {2: 3}
prime square decomposition: [2 + 2*I]
complex conjugates: 2 + 2*I
x, y, z: 2, 2, 3
x^2 + y^2, z^2-1: 8, 8

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 9, primes: {2: 4, 5: 1}
prime square decomposition: [4, 2 + I]
complex conjugates: 8 + 4*I
x, y, z: 8, 4, 9
x^2 + y^2, z^2-1: 80, 80

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 17, primes: {2: 5, 3: 2}
prime square decomposition: [4 + 4*I, 3]
complex conjugates: 12 + 12*I
x, y, z: 12, 12, 17
x^2 + y^2, z^2-1: 288, 288

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 19, primes: {2: 3, 3: 2, 5: 1}
prime square decomposition: [2 + 2*I, 3, 2 + I]
complex conjugates: (2 + I)*(6 + 6*I)
x, y, z: 6, 18, 19
x^2 + y^2, z^2-1: 360, 360

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 33, primes: {17: 1, 2: 6}
prime square decomposition: [4 + I, 8]
complex conjugates: 32 + 8*I
x, y, z: 32, 8, 33
x^2 + y^2, z^2-1: 1088, 1088

valid (does not contain a prime of form
4k + 3 with an odd power)
z: 35, primes: {17: 1, 2: 3, 3: 2}
prime square decomposition: [4 + I, 2 + 2*I, 3]
complex conjugates: 3*(2 + 2*I)*(4 + I)
x, y, z: 18, 30, 35
x^2 + y^2, z^2-1: 1224, 1224
גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
  • Although the number of combinations found for `N = 10` and `N = 100` match the other solutions posted here, when it comes to `N = 1000` and higher, the tally falls far short. Is this a shortcoming of all the other solutions, or this one? – cdlane Feb 28 '19 at 17:05
  • For `f(1000)`, the above produces the result `x, y, z: 112, 476, 489` which totals 1077. Similarly, `x, y, z: 242, 418, 483` which totals 1143. Am I misunderstanding how to run this? – cdlane Feb 28 '19 at 17:14
  • @cdlane thank you for your comments. `x, y, z` are the actual triples. `(112**2 + 476**2 = 239120; 489**2 = 239121)`; `(242**2 + 418**2 = 233288; 483**2 = 233289)`. – גלעד ברקן Feb 28 '19 at 17:36
  • I understood that, but rule 3 above is `x + y + z <= N` where `N` was 1000 in this case. – cdlane Feb 28 '19 at 17:38
  • @cdlane as stated in my answer, the "implementation here is just a demonstration. It does not have handling or optimisation for limiting x and y. Rather, it just enumerates as it goes." – גלעד ברקן Feb 28 '19 at 17:38
  • Then it should produce *more* results than the other answers, not *fewer* as it currently does? – cdlane Feb 28 '19 at 17:39
  • @cdlane in that case, it may be missing some aspect of the enumeration. My guess would be to start with examining the code and theory that expresses the prime as two squares. It only returns one tuple. – גלעד ברקן Feb 28 '19 at 17:41
  • @cdlane If there were more ways to express the prime, the other code should capture all the conjugates, although I wouldn't rule out something missing there as well. – גלעד ברקן Feb 28 '19 at 17:42
  • @cdlane I didn't mean to fully answer the OPs requirement. My answer is more an opportunity to build on the ideas behind it. According to the math.stackexchange answer, following that theory should enumerate all possible triples. – גלעד ברקן Feb 28 '19 at 17:43
  • Understood, and I appreciate it for what it's trying to do as well as its potential efficiency if completed. I saw that it received the OP's posted bounty, and I was trying to understand why when it didn't meet either criteria of the bounty. – cdlane Feb 28 '19 at 17:49
  • @cdlane I think the answer got half the bounty automatically for upvotes. I don't believe the OP awarded it, although I may be wrong. – גלעד ברקן Feb 28 '19 at 17:50
3

Here is a simple improvement in Python (converting to the faster equivalent in C-based code is left as an exercise for the reader). To get accurate timing for the computation, I removed printing the solutions themselves (after validating them in a previous run).

  • Use an outer loop for one free variable (I chose z), constrained only by its relation to N.
  • Use an inner loop (I chose y) constrained by the outer loop index.
  • The third variable is directly computed per requirement 2.

Timing results:

-------------------- 10 
 1 solutions found in 2.3365020751953125e-05  sec.
-------------------- 100 
 6 solutions found in 0.00040078163146972656  sec.
-------------------- 1000 
 55 solutions found in 0.030081748962402344  sec.
-------------------- 10000 
 543 solutions found in 2.2078349590301514  sec.
-------------------- 100000 
 5512 solutions found in 214.93411707878113  sec.

That's 3:35 for the large case, plus your time to collate and/or print the results.

If you need faster code (this is still pretty brute-force), look into Diophantine equations and parameterizations to generate (y, x) pairs, given the target value of z^2 - 1.

import math
import time

def break3(N):
    """
    10 <= N <= 10^5
    return x, y, z triples such that:
        x <= y <= z
        x^2 + y^2 = z^2 - 1        
        x + y + z <= N
    """

    """
    Observations:
    z <= x + y
    z < N/2
    """

    count = 0
    z_limit = N // 2
    for z in range(3, z_limit):

        # Since y >= x, there's a lower bound on y
        target = z*z - 1
        ymin = int(math.sqrt(target/2))
        for y in range(ymin, z):
            # Given y and z, compute x.
            # That's a solution iff x is integer.
            x_target = target - y*y
            x = int(math.sqrt(x_target))
            if x*x == x_target and x+y+z <= N:
                # print("solution", x, y, z)
                count += 1

    return count


test = [10, 100, 1000, 10**4, 10**5]
border = "-"*20

for case in test: 
    print(border, case)
    start = time.time()
    print(break3(case), "solutions found in", time.time() - start, "sec.")
Prune
  • 76,765
  • 14
  • 60
  • 81
  • Yes, I see the invalid outputs! Of the 63 solutions identified, only 55 are within the sum. The trivial way is a single-line check, now in the answer. When I get half an hour, I'll kill the root problem (upper limit of the inner loop) – Prune Feb 20 '19 at 19:29
  • I'm going to leave the solution like this. Yes, we can restrict the upper limit appropriately, but the program is more readable and maintainable in its current form. – Prune Feb 20 '19 at 19:41
  • Not sure if I'm missing something but aren't there 2 solutions for `N = 10` ? (x, y, z) = (0, 0, 1) or (2, 2, 3). Unless 1 is assumed to be outside of solutions range (OP did start loops at 1 so maybe...) – Mat Feb 20 '19 at 20:42
  • @Mat Excluding the degenerate solutions was part of the original problem posting. – Prune Feb 20 '19 at 21:08
  • I was able to reduce your loop on 10^5 from 214 seconds to 19 seconds (on repl.it, see my answer :). – גלעד ברקן Feb 23 '19 at 06:52
  • I don't need to run it; I've used a sibling of that algorithm on Project Euler. I upvoted; good work! – Prune Feb 25 '19 at 17:01
3

No time to properly test it, but seemed to yield the same results as your code (at 100 -> 6 results and at 1000 -> 55 results).

With N=1000 a time of 2ms vs your 144ms also without List

and N=10000 a time of 28ms

var N = 1000;
var c = 0;

for (int x = 2; x < N; x+=2)
{
    for (int y = x; y < (N - x); y+=2)
    {
        long z2 = x * x + y * y + 1;
        int z = (int) Math.Sqrt(z2);
        if (x + y + z > N)
            break;
        if (z * z == z2)
            c++;
    }
}

Console.WriteLine(c);
NotFound
  • 5,005
  • 2
  • 13
  • 33
  • @Dukeling ye I was thinking the same, but went with it because Google gave me an improper result. I'll update the awnser – NotFound Feb 21 '19 at 13:33
  • Would you mind if I update my answer with your latest optimization and give you credit for it ? I get 9 seconds on your solution with N=100000 and 2.1 seconds if Parallel is used, though I get 1.3 seconds on mine. I'm not sure if your bounds are more efficient than mine but I think your solution might be the best for like the lower 50% of N's range. – Mat Feb 21 '19 at 15:35
  • @Mat sure. I also changed the calculation to check for a perfect square root based on @Dukeling feedback. Some very large numbers you seemingly get improper results back (e.g. `3999680306388005621` [source](https://stackoverflow.com/questions/4885925/shortest-way-to-check-perfect-square)). – NotFound Feb 22 '19 at 08:19
  • Another important detail you should also add to yours is the use of `long`. You'll get wrong results for higher values of `N` without having an error, due to the `int` overflowing back to negatives. – Mat Feb 22 '19 at 19:13
3

The bounds of x and y are an important part of the problem. I personally went with this Wolfram Alpha query and checked the exact forms of the variables.

Thanks to @Bleep-Bloop and comments, a very elegant bound optimization was found, which is x < n and x <= y < n - x. The results are the same and the times are nearly identical.

Also, since the only possible values for x and y are positive even integers, we can reduce the amount of loop iterations by half.

To optimize even further, since we compute the upper bound of x, we build a list of all possible values for x and make the computation parallel. That saves a massive amount of time on higher values of N but it's a bit slower for smaller values because of the overhead of the parallelization.

Here's the final code:

Non-parallel version, with int values:

List<string> res = new List<string>();
int n2 = n * n;

double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1));

for (int x = 2; x < maxX; x += 2)
{
    int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));

    for (int y = x; y <= maxY; y += 2)
    {
        int z2 = x * x + y * y + 1;
        int z = (int)Math.Sqrt(z2);

        if (z * z == z2 && x + y + z <= n)
            res.Add(x + "," + y + "," + z);
    }
}

Parallel version, with long values:

using System.Linq;

...

// Use ConcurrentBag for thread safety
ConcurrentBag<string> res = new ConcurrentBag<string>();
long n2 = n * n;

double maxX = 0.5 * (2.0 * n - Math.Sqrt(2) * Math.Sqrt(n2 + 1L));

// Build list to parallelize
int nbX = Convert.ToInt32(maxX);
List<int> xList = new List<int>();
for (int x = 2; x < maxX; x += 2)
    xList.Add(x);

Parallel.ForEach(xList, x =>
{
    int maxY = (int)Math.Floor((n2 - 2.0 * n * x - 1.0) / (2.0 * n - 2.0 * x));

    for (long y = x; y <= maxY; y += 2)
    {
        long z2 = x * x + y * y + 1L;
        long z = (long)Math.Sqrt(z2);

        if (z * z == z2 && x + y + z <= n)
            res.Add(x + "," + y + "," + z);
    }
});

When ran individually on a i5-8400 CPU, I get these results:

N: 10; Solutions: 1; Time elapsed: 0.03 ms (Not parallel, int)

N: 100; Solutions: 6; Time elapsed: 0.05 ms (Not parallel, int)

N: 1000; Solutions: 55; Time elapsed: 0.3 ms (Not parallel, int)

N: 10000; Solutions: 543; Time elapsed: 13.1 ms (Not parallel, int)

N: 100000; Solutions: 5512; Time elapsed: 849.4 ms (Parallel, long)


You must use long when N is greater than 36340, because when it's squared, it overflows an int's max value. Finally, the parallel version starts to get better than the simple one when N is around 23000, with ints.

Mat
  • 1,440
  • 1
  • 18
  • 38
  • I'm not sure wolphram alpha expansion and factoring is 100% reliable. I once encountered a mistake. – גלעד ברקן Feb 22 '19 at 19:37
  • @גלעדברקן It's not always 100% correct, indeed. But for this problem, I've been having the correct results and lower times, so I assumed it was ok. Feel free to let me (us) know if you find better bounds and/or solution ! – Mat Feb 22 '19 at 19:42
2
#include<iostream>
#include<math.h>
int main()
{
    int N = 10000;
    int c = 0;
    for (int x = 2; x < N; x+=2)
    {
        for (int y = x; y < (N - x); y+=2)
        {
            auto z = sqrt(x * x + y * y + 1);
            if(x+y+z>N){
                break;
            }
            if (z - (int) z == 0)
            {
                c++;
            }
        }
    }
    std::cout<<c;
}

This is my solution. On testing the previous solutions for this problem I found that x,y are always even and z is odd. I dont know the mathematical nature behind this, I am currently trying to figure that out.

Justice_Lords
  • 949
  • 5
  • 14
2

I want to get it done in C# and it should be covering all the test cases based on condition provided in the question.

The basic code, converted to long to process the N <= 100000 upper limit, with every optimizaion thrown in I could. I used alternate forms from @Mat's (+1) Wolfram Alpha query to precompute as much as possible. I also did a minimal perfect square test to avoid millions of sqrt() calls at the upper limit:

public static void Main()
{
    int c = 0;

    long N = long.Parse(Console.ReadLine());
    long N_squared = N * N;

    double half_N_squared = N_squared / 2.0 - 0.5;
    double x_limit = N - Math.Sqrt(2) / 2.0 * Math.Sqrt(N_squared + 1);

    for (long x = 2; x < x_limit; x += 2)
    {
        long x_squared = x * x + 1;

        double y_limit = (half_N_squared - N * x) / (N - x);

        for (long y = x; y < y_limit; y += 2)
        {
            long z_squared = x_squared + y * y;
            int digit = (int) z_squared % 10;

            if (digit == 3 || digit == 7)
            {
                continue;  // minimalist non-perfect square elimination
            }

            long z = (long) Math.Sqrt(z_squared);

            if (z * z == z_squared)
            {
                c++;
            }
        }
    }

    Console.WriteLine(c);
}

I followed the trend and left out "the degenerate solution" as implied by the OP's code but not explicitly stated.

cdlane
  • 40,441
  • 5
  • 32
  • 81