25

This is quite an odd problem I know, but I'm trying to get a copy of the current largest prime number in a file. Getting the number in integer form is fairly easy. I just run this.

prime = 2**74207281 - 1

It takes about half a second and it works just fine. Operations are fairly quick as well. Dividing it by 10 (without decimals) to shift the digits is quick. However, str(prime) is taking a very long time. I reimplemented str like this, and found it was processing about a hundred or so digits per second.

strprime = ""
while prime > 0:
    strprime += str(prime%10)
    prime //= 10

Is there a way to do this more efficiently? I'm doing this in Python. Should I even try this with Python, or is there a better tool for this?

Daffy
  • 841
  • 9
  • 23
  • Well, at 100 digits per second you should be finished in about 6 hours, so this solution seems feasible. Perhaps divide by 1000000 at a time and get 6 digits at once? – HugoRune Jan 21 '16 at 23:26
  • 2
    This solution will give you the digits in reverse order. – Frerich Raabe Jan 21 '16 at 23:32
  • @FrerichRaabe Thanks for letting me know, though I think it would be fairly trivial to reverse it again. – Daffy Jan 21 '16 at 23:48
  • @HugoRune Thanks! I'll try that – Daffy Jan 21 '16 at 23:48
  • 4
    I doubt that a naive Python implementation is going to be any faster than the `str` implemented on the C side (which is probably also pretty optimized). – Matteo Italia Jan 21 '16 at 23:49
  • If you have multiple cores, you could distribute the conversion. Efficiency would depend on knowing the representation python uses for big integers. – Jason Jan 21 '16 at 23:58
  • this number is so huge that probably I would write it in a file 100 or so digits at the time directly and worry about reverse the lines later, is 3 phone book long, 3!!! – Copperfield Jan 22 '16 at 01:02
  • 2
    @HugoRune A quick speed test shows that dividing by a huge number is faster than dividing by 10 by an order of magnitude. (Say, 10000000000000000000000000000000000000000000000000000000.) – user3757614 Jan 22 '16 at 01:22
  • 1
    If you printed it in base 16 it would not be such a hassle. – JDługosz Jan 22 '16 at 08:39
  • Repeated division by a fixed size power-of-ten has runtime quadratic in the length of the number you want to convert, even if you optimize string concatenation. – CodesInChaos Jan 22 '16 at 09:24
  • 2
    If you can use something other than Python, in the PARI/GP calculator (I use new 64-bit Windows version) you can say `write("primedigits.txt", 1<<74207281 - 1)` which will write everything to a file in 7 seconds (on my machine). Not sure how much time is spent on I/O. You may need to do `allocatemem(10^9)` (on a line by itself) before you calculate the number, if your default stack size is not huge enough. Using the notation `2^74207281 - 1` instead also gives a fast calculation. – Jeppe Stig Nielsen Jan 22 '16 at 12:15
  • `2**74207281` is about 22 million decimal digits. Doing this in portions of 10 each is indeed pretty slow. I assume that PARI/GP first divides by a power of ten of 11million digits, and then recursively breaks that down into half size portions until it becomes manageable and then assembles the string from these smaller portions. I guess that that is much faster. – Rudy Velthuis Jan 22 '16 at 12:43
  • Related: [Writing huge strings in python](http://stackoverflow.com/q/28420541/4279) – jfs Jan 22 '16 at 18:53
  • 1
    For a historical (circa Python 2.3) perspective on this question, you may want read the thread `How to get decimal form of largest known prime` at https://mail.python.org/pipermail/python-list/2004-June/239863.html – casevh Jan 22 '16 at 19:34
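HugoRune's chunking suggestion above can be sketched roughly like this (the 6-digit chunk size is arbitrary; the loop is still quadratic overall, just with a smaller constant per iteration):

```python
def chunked_digits(n, chunk=6):
    """Convert a non-negative int to a decimal string,
    extracting `chunk` digits per division instead of one."""
    if n == 0:
        return "0"
    base = 10 ** chunk
    parts = []
    while n > 0:
        n, rem = divmod(n, base)
        # Interior chunks must be zero-padded; the leading chunk must not be.
        parts.append(str(rem).zfill(chunk) if n > 0 else str(rem))
    parts.reverse()
    return "".join(parts)
```

Building a list of pieces and joining them at the end also avoids the repeated string concatenation in the original loop.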

4 Answers

16

Repeated string concatenation is notoriously inefficient since Python strings are immutable. I would go for

strprime = str(prime)

In my benchmarks, this is consistently the fastest solution. Here's my little benchmark program:

import decimal

def f1(x):
    ''' Definition by OP '''
    strprime = ""
    while x > 0:
        strprime += str(x%10)
        x //= 10
    return strprime

def digits(x):
    while x > 0:
        yield x % 10
        x //= 10

def f2(x):
    ''' Using string.join() to avoid repeated string concatenation '''
    return "".join((chr(48 + d) for d in digits(x)))

def f3(x):
    ''' Plain str() '''
    return str(x)

def f4(x):
    ''' Using Decimal class'''
    return decimal.Decimal(x).to_eng_string()

x = 2**100

if __name__ == '__main__':
    import timeit
    for i in range(1,5):
        funcName = "f" + str(i)
        print(funcName+ ": " + str(timeit.timeit(funcName + "(x)", setup="from __main__ import " + funcName + ", x")))

For me, this prints (using Python 2.7.10):

f1: 15.3430171013
f2: 20.8928260803
f3: 0.310356140137
f4: 2.80087995529
Frerich Raabe
  • 90,689
  • 19
  • 115
  • 207
  • Would appending lists be more efficient? – Daffy Jan 21 '16 at 23:56
  • Thanks for the benchmarks. I didn't know it was that much different. The reason I'm trying to avoid str() is that it doesn't give any progress. I guess I'll have to bite the bullet and go with str() then. – Daffy Jan 22 '16 at 00:22
  • @user1193112 PyPy gives better results using the above benchmark (`f1: 4.15663290024`, `f2: 7.74465799332`, `f3: 0.276544809341`, `f4: 0.298784971237`), so it may be worth trying. FYI, the number is roughly 22MiB in text though. – Jason Jan 22 '16 at 00:29
  • @Jason I forgot PyPy existed! Thanks! – Daffy Jan 22 '16 at 00:31
  • 5
    +1. But the slowness has nothing to do with the immutable nature of strings. (Recent versions of Python optimize repeated appends to a string. Old versions made one string object per each append, taking O(n**2) time!) The basic problem is that Python code runs about 50x slower than C code, so a loop coded in Python will always be slower than using a C built-in from Python. Using `str()` puts all the work on the built-in C code for Python and should always be the fastest way to do it in Python. – steveha Jan 22 '16 at 01:38
  • If you want progress, perhaps you can break your string into 1000-digit chunks? – Bill Michell Jan 22 '16 at 09:29
  • @steveha: a linear algorithm in pure Python may be faster than a quadratic algorithm in C (`n` is millions here — the asymptotic behavior is more important than a constant factor) – jfs Jan 22 '16 at 19:04
  • @J.F. Sebastian if Python ever builds in a quadratic algorithm that can be beaten by a native Python loop, I hope you will file a bug report. I sure will. – steveha Jan 23 '16 at 20:04
  • @steveha: Performance is not the only concern. Actually, Python already uses suboptimal for large `n` data structures that are more practical for smaller `n`. See [the quote from Python's Design Philosophy](http://stackoverflow.com/q/471544/4279). – jfs Jan 23 '16 at 20:54
14

Python's integer-to-string conversion uses a simplistic algorithm with a running time of O(n**2): as the length of the number doubles, the conversion time quadruples.

Some simple tests on my computer show the increase in running time:

$ time py35 -c "n=str(2**1000000)"
user    0m1.808s
$ time py35 -c "n=str(2**2000000)"
user    0m7.128s
$ time py35 -c "n=str(2**4000000)"
user    0m28.444s
$ time py35 -c "n=str(2**8000000)"
user    1m54.164s

Since the actual exponent is about 10 times larger than my last test value, it should take about 100 times longer. Or just over 3 hours.

Can it be done faster? Yes. There are several methods that are faster.

Method 1

It is faster to split the very large number, by dividing by a suitable power of 10, into two roughly equal-sized smaller numbers. The process is repeated until the pieces are relatively small. Then str() is applied to each piece, the lower pieces are zero-padded on the left to the full width of their power of 10, and the strings are joined to form the final result. This method is used by the mpmath library, and its documentation implies it should be about 3x faster.
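A rough sketch of this divide-and-conquer idea (not mpmath's actual implementation; the base-case threshold and the digit-count estimate are arbitrary choices here, and a real version would cache the powers of 10 rather than recompute them):

```python
import math

def _to_dec(n, width):
    # Render n (0 <= n < 10**width) as exactly `width` digits, zero-padded.
    if width <= 9:
        return str(n).zfill(width)
    half = width // 2
    hi, lo = divmod(n, 10 ** half)   # split into two half-width pieces
    return _to_dec(hi, width - half) + _to_dec(lo, half)

def fast_str(n):
    if n < 10 ** 9:
        return str(n)
    # Slightly overestimate the digit count, then strip the leading zeros.
    width = int(n.bit_length() * math.log10(2)) + 2
    return _to_dec(n, width).lstrip("0")
```

The work at each level of the recursion is dominated by the big divmod, which is where a fast multiplication/division library pays off.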

Method 2

Python's integers are stored in binary format. Binary is great for calculations but binary-to-decimal conversion is the bottleneck. It is possible to define your own integer type that stores the value in blocks of 100 (or some similar value) decimal digits. Operations (exponentiation, multiplication, division) will be slower but conversion to a string will be very fast.
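A minimal sketch of the storage idea (the limb size and helper names are illustrative; a real class would implement the arithmetic directly on this representation so the initial division below never happens at scale):

```python
BASE = 10 ** 9   # each limb holds 9 decimal digits

def int_to_limbs(n):
    """Split a non-negative int into little-endian base-10**9 limbs."""
    limbs = []
    while n:
        n, rem = divmod(n, BASE)
        limbs.append(rem)
    return limbs or [0]

def limbs_to_str(limbs):
    """String conversion is now linear: zero-pad and join the limbs."""
    pieces = [str(limbs[-1])]                             # no padding up front
    pieces += [str(d).zfill(9) for d in reversed(limbs[:-1])]
    return "".join(pieces)
```

Once the value lives in decimal limbs, producing the string is just formatting, which is why this representation makes the conversion step cheap.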

Many years ago, I implemented such a class and used efficient algorithms for multiplication and division. The code is no longer available on the Internet but I did find a backup copy that I tested. The running time was reduced to ~14 seconds.

Update

I updated the DecInt code referenced above and it is now available at https://github.com/casevh/DecInt.

If Python's native integer type is used, the total running time is less than 14 seconds on my computer. If gmpy2's integer type is used instead, the running time is ~3.5 seconds.

$ py35 DecInt.py
Calculating 2^74207281
Exponentiation time: 3.236
Conversion to decimal format: 0.304
Total elapsed time: 3.540
Length of result: 22338618 digits

Method 3

I maintain the gmpy2 library, which provides easy access to the GMP library for fast integer arithmetic. GMP implements Method 1 in highly optimized C and assembly code and calculates the prime number and its string representation in ~5 seconds.

Method 4

The decimal module in Python stores values as decimal digits. Recent versions of Python 3 include a C implementation of the decimal library that is much faster than the pure-Python implementation included with Python 2. The C implementation runs in just over 3 seconds on my computer.

from decimal import *
getcontext().prec = 23000000
getcontext().Emin = -999999999
getcontext().Emax = 999999999
x=Decimal(2)**74207281 - 1
s=str(x)
casevh
  • 11,093
  • 1
  • 24
  • 35
9

Took about 32 seconds to output the file using WinGhci (Haskell language):

import System.IO

main = writeFile "prime.txt" (show (2^74207281 - 1))

The file was 21 megabytes; the last four digits, 6351.

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
4

There is gmp, the GNU Multiple Precision Arithmetic Library. It is specifically designed for handling huge numbers fast.
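From Python, GMP is typically reached through a binding such as gmpy2 (an assumption here; the fallback to the built-in int just keeps the sketch runnable without it, and a smaller Mersenne prime stands in for the 22-million-digit one):

```python
try:
    from gmpy2 import mpz          # GMP-backed integers, if available
except ImportError:
    mpz = int                      # fall back to Python's built-in int

n = mpz(2) ** 607 - 1              # a small Mersenne prime as a stand-in
s = str(n)                         # with gmpy2, this uses GMP's fast conversion
print(len(s))
```

With gmpy2 installed, the same str() call on the full 2**74207281 - 1 goes through GMP's optimized conversion instead of CPython's quadratic one.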

Dirk Herrmann
  • 5,550
  • 1
  • 21
  • 47
  • Does it convert from huge numbers to strings just as fast? The mathematics involved isn't difficult, it's the conversion from integer to string that's troubling me. – Daffy Jan 21 '16 at 23:57
  • I don't have any benchmarks on that - there are, however, functions for conversion: https://gmplib.org/manual/I_002fO-of-Integers.html#I_002fO-of-Integers – Dirk Herrmann Jan 22 '16 at 00:00
  • There's even a web page showing how someone compared and coupled it with python: http://jasonstitt.com/c-extension-n-choose-k – Dirk Herrmann Jan 22 '16 at 00:05
  • 4
    I just happened to have computed that prime with GMP. It took 5.5 seconds on my 2 GHz i7 to compute the prime, convert it to decimal, and write the decimal digits to a file. – Mark Adler Jan 22 '16 at 05:26