7

I wrote a script in Python, and it surprised me. It takes five 20-digit numbers, multiplies them together, and raises the product to the power of 3000. The timeit module is used to measure how long the calculation takes. When I run this script, it reports that the calculation took 3*10^-7 seconds. It then generates a file, output.txt, but the script doesn't finish until about 15 seconds later.

import timeit
outputFile = open("output.txt", "w")
start = timeit.default_timer()
x = (87459837581209463928*23745987364728194857*27385647593847564738*10293769154925693856*12345678901234567891)**3000
stop = timeit.default_timer()
time = stop-start
print "Time taken for the calculation was {} seconds".format(time)
outputFile.writelines(str(x))
outputFile.close()
y = raw_input("Press enter to exit.")

Does this mean that it actually takes longer to print a 280 kB file than to perform the calculation? (I find that unlikely.)

If not, does Python only execute the calculation when the variable x is used? Will it re-execute the calculation every time the variable is accessed, or will it store the actual value in the variable?

I have just written another script, which confirms that it takes Python 0.03 seconds to write the result to a .txt file. So why does Python perform the calculation later?

jfs
Aayush Mahajan
  • I/O operations (like printing) are generally slower than most calculations. – prq Mar 30 '14 at 17:51
  • Especially when what you're printing is a number about 300,000 digits long. – BrenBarn Mar 30 '14 at 17:53
  • 280kb? Pfft, that's nothing, right? Well, it is. Quite a lot, actually. – anon582847382 Mar 30 '14 at 18:01
  • http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html – Thomas Mar 30 '14 at 18:02
  • You can try the same thing by printing to the IDLE console. It will become very apparent that the problem is the printing and not the calculation if you comment out the print line and compare the speeds. It's a pretty well-known problem in the Python world. It can also slow down the rest of your computer's tasks if the IDLE console accumulates a lot of text. – chase Mar 30 '14 at 18:05
  • You have a user prompt at the end of the script. Are you sure you're timing this correctly? – Alastair McCormack Mar 30 '14 at 18:07

3 Answers

12

It's not the calculation that's the problem, nor is it writing to file: the vast bulk of the time is consumed by converting the result from its internal binary representation to a base-10 representation. That takes time quadratic in the number of bits, and you have a lot of bits here.

If you replace your output line with:

outputFile.writelines(hex(x))

you'll see that it runs very much faster. Converting to a hex representation only takes time linear in the number of bits.
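If you want to measure the difference yourself, here is a rough sketch (Python 3 syntax; the number used is an arbitrary stand-in, and exact timings will vary by machine and Python version):

```python
import sys
import timeit

# Recent CPython versions (3.11+) cap int-to-str conversion length by
# default, so raise the cap before converting huge integers.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(2_000_000)

x = 12345678901234567891 ** 3000  # a number with tens of thousands of digits

t_str = timeit.timeit(lambda: str(x), number=5)  # base-10: expensive
t_hex = timeit.timeit(lambda: hex(x), number=5)  # base-16: cheap
print("str: {:.4f}s  hex: {:.4f}s".format(t_str, t_hex))
```

The hex conversion comes out far faster, because 16 is a power of 2 and so matches the internal binary representation, while 10 does not.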

If you really need to output giant integers in base 10 representation, look into using the decimal module instead. That does computations internally in a representation related to base 10, and then conversion to a decimal string takes time linear in the number of decimal digits. You'll need to set the decimal context's precision to a "big enough" value in advance, though, to avoid losing lower-order digits to rounding.
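A minimal sketch of the decimal approach (Python 3 syntax; the precision value here is my own guess, sized generously above the roughly 290,000 digits of this particular result):

```python
from decimal import Decimal, getcontext

# Precision must exceed the digit count of the result, or the low-order
# digits are rounded away. This result has roughly 290,000 digits.
getcontext().prec = 300_000

base = (Decimal(87459837581209463928) * Decimal(23745987364728194857)
        * Decimal(27385647593847564738) * Decimal(10293769154925693856)
        * Decimal(12345678901234567891))
x = base ** 3000

s = str(x)     # conversion is linear in the number of decimal digits
print(len(s))  # roughly 290,000
```

Because every intermediate result fits within the context precision, the power is computed exactly, and converting it to a decimal string is cheap.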

Tim Peters
  • +1 for mentioning that it might be `O(ndigits**2)`. Could you elaborate why [`long_to_decimal_string()`](http://hg.python.org/cpython/file/3.4/Objects/longobject.c#l1579) is quadratic? (I see the corresponding comment in `PyLong_FromString()` but it is the opposite direction). – jfs Mar 30 '14 at 18:46
  • Because Python longs are stored internally in (essentially) base 2**15, and CPython does not make heroic efforts to use sub-quadratic base conversion algorithms between power-of-2 and power-of-10 bases. It uses the more-or-less obvious (and quadratic time) algorithm instead ("one output digit at a time"). – Tim Peters Mar 30 '14 at 19:01
  • related: google search shows that [a sub-quadratic division-free algorithm is possible](http://hal.inria.fr/docs/00/86/42/93/PDF/get_str.pdf) – jfs Mar 30 '14 at 19:27
  • About using decimal: make sure you're either using Python >= 3.3, or using the cdecimal package from PyPI. Otherwise, computations are still done in binary, and the binary <-> decimal conversions dominate the operation time for large numbers. (So in Python 2.7, a Decimal addition takes time quadratic in the number of digits!) – Mark Dickinson Apr 07 '14 at 09:47
2

In addition to the other answers, use outputFile.write(str(x)) instead of writelines. writelines is meant to be used with a sequence of strings; in your case, it iterates the string and writes each character individually. In a simple test, writelines was 3.7 times slower:

>>> timeit("f.writelines(str(s))", setup="f=open('tmp.txt','w');s=range(1000)", number=10000)
4.935087700632465
>>> timeit("f.write(str(s))", setup="f=open('tmp.txt','w');s=range(1000)", number=10000)
1.3468097837871085
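To make the character-by-character behavior visible, here is an illustrative sketch (the `CountingWriter` class is hypothetical; its `writelines` just mirrors what file objects' `writelines` is documented to do):

```python
class CountingWriter:
    """Hypothetical file-like object that records every write() call."""
    def __init__(self):
        self.chunks = []

    def write(self, s):
        self.chunks.append(s)

    def writelines(self, lines):
        # Essentially what a real file's writelines() does:
        # one write() per element of the iterable.
        for line in lines:
            self.write(line)

w = CountingWriter()
w.writelines("12345")  # a str iterates character by character
print(len(w.chunks))   # 5 separate write() calls

w = CountingWriter()
w.write("12345")       # a single call
print(len(w.chunks))   # 1 write() call
```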
tdelaney
  • +1. On my machine it takes 0.0184 μs to compute the number, 3500000 μs to convert it to string (`str(x)`), 114 μs to write it to a file (`.write()`), 10200 μs with `.writelines()`. In other words, `str(x)` dominates everything else – jfs Mar 31 '14 at 17:46
  • correction: it takes 50000 μs to compute the number (It seems that Python compiler computes some literals i.e., the previous result is incorrect if any of the numbers is a variable) – jfs Mar 31 '14 at 19:41
  • @J.F.Sebastian, that's what the "in addition to..." part was all about. It's not the primary slowdown, but it should be fixed as well. – tdelaney Mar 31 '14 at 21:06
  • I know that, that is why `+1` above. – jfs Apr 01 '14 at 04:13
1

It is the conversion to string that takes so long:

In [68]: %time x = (87459837581209463928*23745987364728194857*27385647593847564738*10293769154925693856*12345678901234567891)**3000
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s

In [69]: %time xs = str(x)
CPU times: user 1.98 s, sys: 0.00 s, total: 1.98 s
Wall time: 1.98 s

In [71]: %time print xs
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.04 s

But that should not be surprising for a number that has hundreds of thousands of digits.

EDIT

Contrary to other answers, writing to file does not take that much time:

In [72]: %time with open('tmp.file', 'w') as f: f.write(xs)
CPU times: user 0.00 s, sys: 0.01 s, total: 0.01 s
Wall time: 0.00 s
m.wasowski