2

Given a number M and a list A which contains N elements (A1, A2, ...), find all numbers k with 1 <= k <= M such that gcd(Ai, k) = 1 for every Ai.

Here's my code. Its only problem is that it uses nested loops, which makes it slow when the inputs are big. How can I fix it so that it requires less time?

N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
from math import gcd
cnt = 0
print(N)
for k in range(1, M+1):
    for i in range(N):
        if gcd(k, A[i]) == 1:
            cnt += 1
            if cnt == N:
                print(k)
    cnt = 0

Input example (the first line contains N and M, the second the list A1, A2, ...):

3 12
6 1 5
Codeer
  • "my app still crash " - share the respective error you obtain and what's "big" input for that? – RomanPerekhrest Feb 05 '23 at 14:10
  • Please check out my answer in order to reconsider which of the answers best addresses the speed issue, or to point out why the timing tests don't show the speed advantage of the numpy approach used in the accepted answer. – Claudio Feb 07 '23 at 23:03
  • What are the limits for M and N? Is the problem online somewhere for testing? – Kelly Bundy Feb 08 '23 at 10:27

4 Answers

4

Just another approach, using sympy.ntheory's factorint and Python sets. Speed-wise it has no advantage on my machine over the math.lcm()- or math.gcd()-based solutions when applied to small lists and numbers, but it excels on a very large randomized list:

M    = 12
lstA = (6, 1, 5)
from sympy.ntheory import factorint
lstAfactors = []
for a in lstA:
    lstAfactors += factorint(a)  # factorint() returns {prime: exponent}; += collects the prime keys
setA = set(lstAfactors)          # all prime factors occurring anywhere in lstA
for k in range(1, M+1):
    if not (set(factorint(k)) & setA):  # k shares no prime factor with any list element
        print(k)

The code above implements the idea described in Yatisi's answer and coded by Tom Karzes using math.gcd(), but uses sympy.ntheory's factorint() and set() instead of math.gcd(). In terms of speed, the factorint() solution seems to be the fastest on the data tested below:

# ======================================================================
from time   import perf_counter as T
from math   import gcd, lcm
from sympy  import factorint
from random import choice
#M    = 3000
#lstA = 100 * [6, 12, 18, 121, 256, 1024, 361, 2123, 39]
M    = 8000
lstA = [ choice(range(1,8000)) for _ in range(8000) ]

# ----------------------------------------------------------------------
lstResults  = []
lstAfactors = []
sT=T()
for a in lstA:
    lstAfactors += factorint(a)
setA = set(lstAfactors)
for k in range(1, M+1):
    if not (set(factorint(k)) & setA):
        lstResults += [k]
print("factorint:", T()-sT)
#print(lstResults)
print("---")

# ----------------------------------------------------------------------
lstResults = []
sT=T()
#l = 1
#for a in lstA:
#    l = (l*a)//gcd(l, a) # can be replaced by: 
l = lcm(*lstA) # least common multiple divisible by all lstA items
# ^-- which runs MAYBE a bit faster than the loop with gcd()
for k in range(1, M+1):
    if gcd(l, k) == 1:
        lstResults += [k]
print("lcm()    :", T()-sT)
#print(lstResults)
print("---")
# ----------------------------------------------------------------------
lstResults = []
sT=T()
l = 1
for a in lstA:
    l = (l*a)//gcd(l, a) # can be replaced by: 
#l = lcm(*lstA) # least common multiple divisible by all lstA items
# ^-- which runs MAYBE a bit faster than the loop with gcd()
for k in range(1, M+1):
    if gcd(l, k) == 1:
        lstResults += [k]
print("gcd()    :", T()-sT)
#print(lstResults)
print("---")
# ----------------------------------------------------------------------
import numpy as np
A = np.array(lstA)
def find_gcd_np(M, A, to_gcd=1):
    vals = np.arange(1, M + 1)
    return vals[np.all(np.gcd(vals, np.array(A)[:, None]) == to_gcd, axis=0)]
sT=T()
lstResults = find_gcd_np(M, A, 1).tolist()
print("numpy    :", T()-sT)
#print(lstResults)
print("---")

printing

factorint: 0.09754624799825251
---
lcm()    : 0.10102138598449528
---
gcd()    : 0.10236155497841537
---
numpy    : 6.923375226906501
---

The timing results change drastically for the other data variant in the code provided above (the commented-out M = 3000 case with the small repeated list), printing:

factorint: 0.021642255946062505
---
lcm()    : 0.0010238440008834004
---
gcd()    : 0.0013772319070994854
---
numpy    : 0.19953695288859308
---

where the factorint-based approach is about 20x and the numpy-based approach about 200x slower than the gcd/lcm-based one.

Run the timing test yourself online (link 1 below). It won't run the large-data case, but it can at least demonstrate that the numpy approach is about 100x slower than the gcd one:

factorint: 0.03271647123619914
---
lcm()    : 0.003286922350525856
---
gcd()    : 0.0029655308462679386
---
numpy    : 0.41759901121258736

1 https://ato.pxeger.com/run?1=3VXBitswED0W9BXDGoq16-zaSbtsAzmk0GN6KLmUkAbFkdcismQkZbc-9Et62Uv7Uf2ajiwnNt1Du7ClUIORpXmaefNmLH39Xjeu1Orh4dvBFaObHy--RDB7locURlfgRMUBQFS1Ng5qbopNrg_KcQPMwjKAKubKHnSb7xKQeRVstqnq5mQrWO60EcoFo2Fqh0NnzEstck6iBfqCGUzSNCWRtG6OkyxN4RxW1wlkY3xv_JglMH7tV9LxqwQm136ejSf4-WZNhk46H6suQoxhb3mcJTdpSikU2sAGhIKw7BdhTUgEo2d5BjJcKldybZrHaiDDD9wepLOe9GrdUg5mGxbscraMKfFkmSfrAVOCOQ6RF7PeZ8wosbxNHId4AAte9n3KKNziIqOtOxAFKO0g9pt6Z3sU6qV3NO9gEEIfWWPk1X5N6hZ8dto3PUsAaY_skpIoGPtN9AhHlc7oMyo-4DXULpK-kUj0i4ZRmwqaYnnO6NUV9m8sE2AUIsiZgi0Hw2vJcr6DbTMF4rHY3_G53-9RkjOL7aurSiuoMK6oJYeduBNWbPFr2wCTsg0HwvHKYsxPoxHclyIvwRyUhcX849t3yGorfFtY_3-5EoNjw4DUuoZ7gf-Yp_bb6nX89xQPAsj-oFo-F-oRT6nW3y5WqNXjdn9SpaL_rVSt139Wqu7YUgd_pOPxr2rijxdVXzJjWNOeMZTseAGFULsNkt2oOl4kME_AaT-fHZO_Y9Iet56kgQvIaGs23B2MalErj5EyxsFn75eSPuScrqYJvNeKr1sRQxjsic_CzlLat9Owyx6zy-il01JYF5-0C1k-UepwC3eX8fFS_gk)

Claudio
3

Here's a fast version that eliminates the nested loops:

N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
from math import gcd
print(N)

l = 1
for v in A:
    l = l*v//gcd(l, v)   # running LCM of all values in A

for k in range(1, M+1):
    if gcd(l, k) == 1:   # coprime to the LCM => coprime to every value in A
        print(k)

It works by first taking the LCM, l, of the values in A. It then suffices to check if the GCD of k and l is 1, which means there are no common factors with any of the values in A.

Note: If you're using a newer version of Python than I am (3.9 or later), you can import lcm from math and replace l = l*v//gcd(l, v) with l = lcm(l, v).

Or, as Kelly Bundy pointed out, lcm accepts an arbitrary number of arguments, so the first loop can be replaced with l = lcm(*A) if you're using 3.9 or later.
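For reference, a minimal sketch of the 3.9+ variant in one piece (same I/O format as the question; this is just the answer's approach condensed, not a separate method):

from math import gcd, lcm

N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
print(N)

l = lcm(*A)  # least common multiple of all values in A
for k in range(1, M+1):
    if gcd(l, k) == 1:  # coprime to the LCM means coprime to every value in A
        print(k)

For the sample input (3 12 / 6 1 5) this prints 3 followed by 1, 7 and 11.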

Tom Karzes
2

This is probably more of a math question than a programming question; however, here is my take. Depending on M and A, it might be better to:

  1. Find the prime divisors of the Ai (have a look at this) and put them in a set.
  2. Either remove (sieve) all multiples of these primes from list(range(1,M+1)), which you can do (more) efficiently with smart ordering, or find all primes smaller than or equal to M (which could even be pre-computed) that are not divisors of any Ai and compute all their multiples up to M.

Explanation: gcd(Ai, k) = 1 if and only if Ai and k have no common divisors greater than 1, which is exactly when they have no common prime divisors. Thus, we can first find all prime divisors of the Ai and then make sure our k don't have any of them as divisors.
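A minimal sketch of this sieve idea (the helper names and the boolean-array sieve are my own illustration, not code from this answer):

def prime_divisors(n):
    # prime divisors of n by simple trial division
    primes = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes

def coprime_to_all(M, A):
    # step 1: collect the prime divisors of all Ai in one set
    bad_primes = set()
    for a in A:
        bad_primes |= prime_divisors(a)
    # step 2: sieve out every multiple of those primes from 1..M
    allowed = [True] * (M + 1)
    for p in bad_primes:
        for m in range(p, M + 1, p):
            allowed[m] = False
    return [k for k in range(1, M + 1) if allowed[k]]

print(coprime_to_all(12, [6, 1, 5]))  # [1, 7, 11]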

Yatisi
1

Using numpy with vectorised operations is a good alternative when your input range M goes up to the hundreds or higher while A stays small (about the size of your current A):

import numpy as np

def find_gcd_np(M, A, to_gcd=1):
    vals = np.arange(1, M + 1)
    # keep the vals whose gcd with every element of A equals to_gcd
    return vals[np.all(np.gcd(vals, np.array(A)[:, None]) == to_gcd, axis=0)]

Usage:

print(find_gcd_np(100, [6, 1, 5], 1))
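For comparison with the other answers, a sketch of how this could be wired into the question's I/O format (the wiring around find_gcd_np is my own addition, not part of this answer):

import numpy as np

def find_gcd_np(M, A, to_gcd=1):
    vals = np.arange(1, M + 1)
    return vals[np.all(np.gcd(vals, np.array(A)[:, None]) == to_gcd, axis=0)]

N, M = [int(v) for v in input().split()]
A = [int(v) for v in input().split()]
print(N)
print(*find_gcd_np(M, A), sep="\n")  # one k per line, like the question's code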
RomanPerekhrest
  • On my machine the numpy version for M=8000 and A=[ random.choice(range(1,8000)) for _ in range(8000)] runs about **70x slower** (7 s vs. 0.1 s) than the other solutions. Have you tested it yourself, or are you guessing that *"Using numpy ecosystem with vectorised operations will give a speed advantage"*? – Claudio Feb 07 '23 at 21:41
  • @Claudio, 1) "than the other solutions" - I don't believe that `factorint` (from `sympy`) runs faster - it is the slowest. 2) "Have you tested" - tested many times. – RomanPerekhrest Feb 07 '23 at 21:45
  • Would you be so kind as to provide the code of your tests, so that I can reproduce what you are speaking about? Maybe it somehow depends on the M and A used? Does testing with the M and A values I have used give numpy a speed advantage? – Claudio Feb 07 '23 at 21:46
  • @Claudio, my PyCharm CE sadly reported me only "last 12 hours" of local history for the file I tested functions. Then, I was testing with `timeit` and perfplot (https://github.com/nschloe/perfplot) – RomanPerekhrest Feb 07 '23 at 22:12
  • See my updated answer for timing results confirming what I stated in my first comment. To my own surprise, the factorint-based approach seems to be the fastest one for the M, lstA values used. – Claudio Feb 07 '23 at 22:17
  • @Claudio, the crucial words in your comment are "... for the used M, lstA values". Let's say you find some limits ratio. For the limits you've chosen `M = 8000` and `lst as range(8000)` - `factorint` goes faster. But mind you that I mentioned in my answer's description `M` range. And if keeping `lstA` up to `100` and increasing `M` up to tens of thousands - numpy would go faster. For ex. for `M = 20_000` and `lstA = [ choice(range(1,100)) for _ in range(100) ]` I get timings: `find_gcd_sympy 8.160126730999764`, `find_gcd_np 6.9157218289983575` (with `timeit`) – RomanPerekhrest Feb 07 '23 at 23:06
  • I can confirm that numpy beats sympy on this example, being 1.3x faster, but ... the gcd-based solution still runs 17x faster than numpy on the given data example. I suppose numpy can't beat the gcd-based approach - do you have a data example for which this is not the case? – Claudio Feb 07 '23 at 23:26
  • @Claudio, perhaps, `M = 500`, `A = [6, 1, 5]` (from the OPs case) – RomanPerekhrest Feb 07 '23 at 23:30
  • `M = 500, A = [6, 1, 5]` gives `gcd() : 0.000127, numpy : 0.000154` . So gcd/lcm wins also on this example against numpy. Have you run my timing test? You say "perhaps". It seems your claims are not backed up with evidence. numpy should actually excel at large A sizes, not M-values ... but somehow it doesn't even if it should. Maybe numpy gcd is slow? – Claudio Feb 07 '23 at 23:37
  • @Claudio, I've just tested on `M=500`, `lstA = [6,1,5]`. The timings: `find_gcd_sympy 1.5204836680022709 find_gcd_lcm 0.06267997100076173 find_gcd_np 0.041397550001420313`. And I'm on numpy `1.24.0`. That's enough for testing for me today – RomanPerekhrest Feb 07 '23 at 23:47
  • And I think all 3 approaches should stay in the collection of choices. – RomanPerekhrest Feb 07 '23 at 23:56
  • numpy 1.24.1 here, Python3.9, Linux Mint 18.4 ( timings 2x slower than on your machine if yours are in milliseconds). – Claudio Feb 07 '23 at 23:56
  • It seems that generally math.gcd is the way to go. With such small timings ... I have just got in the tenth attempt a timing where numpy wins, but ... timing on such small data sizes ... does it make any sense? – Claudio Feb 08 '23 at 00:01
  • 100x and more slower on large data ... I can't see what the reason for this could be. The claim about the speed advantage of numpy is really hard to back up with evidence. Maybe consider expressing it less generally to make it true? – Claudio Feb 08 '23 at 00:06
  • 1
    @Claudio, ok, you've made more measurements to find a "bottleneck" limits on `A` input. See my update on description. Generally, for most cases `find_gcd_lcm` is preferred. – RomanPerekhrest Feb 08 '23 at 09:21