
I want to make a small optimization to code that runs trillions of times. Which way of calculating the inverse of a float64 is faster?

So far, the ways I know of calculating the inverse are the following.

k = 3.788
result1 = k**-1
result2 = 1/k

I'd expect this question to have been answered already, but searching Google for "what's faster, 1 divide or power of -1" (and similar wordings) returned 0 results. No exaggeration.

asked by FemtoComm (edited by Charles Duffy)
  • Did you try benchmarking it? – UnholySheep Feb 18 '19 at 22:49
  • If you're planning to do anything trillions of times, working on single floats one by one in Python is not going to cut it no matter how you micro-optimize individual operations. At the very least, you're going to have to use NumPy. – user2357112 Feb 18 '19 at 22:50
  • Even if you optimize a single operation, loop overhead in Python is huge - you may get thousands of times' worth of speedup by using native code to do the important loops - NumPy is a good option. Or Cython, if you really want to get as optimal as possible. – Random Davis Feb 18 '19 at 22:52
  • Even just the Python opcode dispatch is going to be several orders of magnitude larger than whatever speedup you may obtain by changing the way you perform your floating-point operation. This kind of optimization is completely pointless in such a language. _Edit_: oh, I've been beaten to it by @RandomDavis – Matteo Italia Feb 18 '19 at 22:54
  • Or numba. If this is in a regular loop, NumPy isn't going to help. In fact, it will probably be slower. So numba or Cython in that case. – roganjosh Feb 18 '19 at 22:55
  • Also, you don't obtain any results for your Google query because the `-` of `-1` is interpreted by Google as "does not contain 1", so you are performing a query that asks for documents containing `1` and not containing it, hence the zero results. You have to add double quotes around the `-1` to make Google interpret it literally. – Matteo Italia Feb 18 '19 at 22:56
  • Possible duplicate of [Is there any simple way to benchmark python script?](https://stackoverflow.com/questions/1593019/is-there-any-simple-way-to-benchmark-python-script) – Paul Hankin Feb 18 '19 at 22:56
  • Thank you for your suggestions. I see I may have come off as lazy for not searching how to benchmark. I honestly thought it'd take more time to implement it than to do a cursory search (and that should have been easy if I'd known of Google's `-` operator). Ended up getting much more knowledge than I bargained for. – FemtoComm Feb 19 '19 at 22:03
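As the comments suggest, this is straightforward to measure directly with the standard-library `timeit` module. A minimal sketch (the iteration count is arbitrary; absolute times will vary by machine):

```python
import timeit

k = 3.788
n = 10_000_000

# Time each spelling of the inverse; timeit returns total seconds for n runs.
t_pow  = timeit.timeit("k**-1",  globals={"k": k}, number=n)
t_div  = timeit.timeit("1/k",    globals={"k": k}, number=n)
t_fdiv = timeit.timeit("1.0/k",  globals={"k": k}, number=n)

print(f"k**-1 : {t_pow / n * 1e9:.1f} ns per loop")
print(f"1/k   : {t_div / n * 1e9:.1f} ns per loop")
print(f"1.0/k : {t_fdiv / n * 1e9:.1f} ns per loop")
```

All three expressions compute the same value; only the time taken differs.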

1 Answer


In a typical situation, the best way to compute the inverse is

y = 1.0 / k

This avoids any power computation (which is slower than a division) and any type conversion. Indeed, using IPython on my machine I get

In [1]: k = 3.788

In [2]: %%timeit
   ...: result1 = k**-1
57.8 ns ± 0.295 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [3]: %%timeit
   ...: result2 = 1/k
32 ns ± 0.197 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [4]: %%timeit
   ...: result3 = 1.0/k
24.1 ns ± 0.177 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
answered by Marco Lombardi
  • Thank you for your answer! I'm also curious as to which one would be faster: `k**-1` or `k**-1.0`. I'd guess it depends on whether Python has two algorithms for integer exponents or not... – FemtoComm Feb 18 '19 at 23:06
  • Just for the fun of it, I ran the same benchmark in [Julia](https://julialang.org/); there (as of interpreter 1.0.1), `k ^ -1` is much faster (0.039 ns median) than `1 / k` (1.899 ns median). Of course, both those numbers blow the doors off the Python values above... perhaps the OP might think about Julia for their project, if they're running trillions of values. :) – Charles Duffy Feb 18 '19 at 23:12
  • @CharlesDuffy, you would need to run the same benchmark in `ipython` to get `python` timings that are at all comparable -- your machine is not @Marco's machine. – jyalim Feb 18 '19 at 23:18
  • @HAL9001, on the same laptop where I collected the Julia numbers above, IPython 6.5.0 on Python 3.6 clocks 99.9 ns for the first, 60 ns for the second, 43 ns for the third. So yes, my machine is not Marco's machine; it's *slower*, making the difference between Python and Julia even larger than it initially appeared. – Charles Duffy Feb 18 '19 at 23:22
  • @CharlesDuffy: I do not have Julia installed, but I would be curious to know what your benchmarks are if you do the same calculations with a very large array (using NumPy in Python, of course). – Marco Lombardi Feb 18 '19 at 23:29
  • @CharlesDuffy: I just tried setting `k = numpy.random.normal(size=1000000)` and repeated the tests above. Interestingly, all timings for me are now approximately equivalent, regardless of the method used to compute the inverse (500 µs, that is 0.5 ns per element). Presumably Julia is going to be faster than that. – Marco Lombardi Feb 18 '19 at 23:41
  • @CharlesDuffy: Thank you! Given the times you got earlier, I guess Julia and Python + NumPy now perform approximately the same. – Marco Lombardi Feb 18 '19 at 23:54
  • *nod* -- with NumPy's optimized C implementations and Julia's optimizing compiler, it makes sense that they'd both be close to underlying hardware performance. – Charles Duffy Feb 18 '19 at 23:56
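The array-based comparison discussed in the comments above can be sketched as follows (assuming NumPy is installed; the array size mirrors Marco's test, and timings will vary by machine). Once the work is vectorized, the per-element cost is dominated by the hardware divide rather than Python's interpreter overhead, which is why the three spellings converge:

```python
import timeit
import numpy as np

# One million float64 values, as in the comment above.
k = np.random.normal(size=1_000_000)

# Each expression runs as a single vectorized NumPy loop over the array.
# Note the float exponent -1.0: raising an *integer* NumPy array to a
# negative integer power would raise an error, so -1.0 is the safe spelling.
for expr in ("k**-1.0", "1/k", "1.0/k"):
    t = timeit.timeit(expr, globals={"k": k}, number=100)
    print(f"{expr:8s}: {t / 100 * 1e6:.1f} µs per call")
```

On the machines quoted above this works out to roughly 0.5 ns per element for every variant, i.e. the choice of expression no longer matters once NumPy handles the loop.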