5

I am looking for the more efficient and shortest way of performing the square root of a sum of squares of two or more numbers. I am actually using numpy and this code:

np.sqrt(i**2+j**2)

That seems five time faster than:

np.sqrt(sum(np.square([i,j])))

(i and j are to numbers!)

I was wondering if there was already a built-in function more efficient to perform this very common task with even less code.

G M
  • 20,759
  • 10
  • 81
  • 84
  • 2
    I would go with `numpy` all the way. Something like `np.sqrt(np.sum(a*a))`, where `a` is your array of numbers. – Thomas Kühn Jun 28 '18 at 07:48
  • 7
    Likely [`numpy.linalg.norm`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.norm.html) is the most efficient implementation. See also [this answer which looks in detail at the performance](https://stackoverflow.com/a/47542304/7469434). – IonicSolutions Jun 28 '18 at 07:51
  • If you are searching for the shortest methode use np.linalg.norm. For best performance you can make use of Cython,Numba or numexpr. eg. https://stackoverflow.com/a/49868544/4045774 On larger arrays, this problem can also be parallelized easily. – max9111 Jun 28 '18 at 09:45
  • Are you using this operation iterating though a number of points? – Divakar Jun 28 '18 at 10:24
  • 1
    @IonicSolutions it seems to the simple case here it's faster `(i*i + j*j)**0.5` – G M Jun 28 '18 at 12:44
  • For single numbers, that's likely true, but you should really take a look at Felix's answer below. – IonicSolutions Jun 28 '18 at 16:17

5 Answers5

5

For the case of i != j it is not possible to do this with np.linalg.norm, thus I recommend the following:

(i*i + j*j)**0.5

If i and j are single floats, this is about 5 times faster than np.sqrt(i**2+j**2). If i and j are numpy arrays, this is about 20% faster (due to replacing the square with i*i and j*j. If you do not replace the squares, the performance is equal to np.sqrt(i**2+j**2).
Some timings using single floats:

i = 23.7
j = 7.5e7
%timeit np.sqrt(i**2 + j**2)
# 1.63 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit (i*i + j*j)**0.5
# 336 ns ± 7.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit math.sqrt(i*i + j*j)
# 321 ns ± 8.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

math.sqrt is slightly faster than (i*i + j*j)**0.5, but this comes at the cost of losing flexibility: (i*i + j*j)**0.5 will work on single floats AND arrays, whereas math.sqrt will only work on scalars.

And some timings for medium-sized arrays:

i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.45 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (i*i + j*j)**0.5
# 1.21 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
JE_Muc
  • 5,403
  • 2
  • 26
  • 41
  • I will accept it as when there will be no other answers to evaluate no pressure, please! – G M Jun 28 '18 at 10:22
  • Sure, just wanted to know, if there is anything else I could try to implement in my answer. – JE_Muc Jun 28 '18 at 11:12
  • In my own testing, `math.sqrt()` is faster than `np.sqrt()`, `x**0.5`, `pow(x,0.5)`, and `math.pow(x,0.5)`. It becomes even faster if you import it as `from math import sqrt` and call `sqrt()` directly. – Dillon Davis Jun 28 '18 at 21:29
  • @DillonDavis But `math.sqrt()` can't take arrays as inputs, thus you are losing alot of flexibility with it. For having a complete answer, I added a case with `math.sqrt` to my answer. But importing it with `from math import sqrt` should not have an impact on the performance! There must have been a irregularity in your timings. – JE_Muc Jun 28 '18 at 21:48
  • 1
    GM said in an earlier comment i and j were integers, not arrays. That's why I've done my timing with just ints in a for loop. In regards to the import discrepancy, its because math.sqrt has to perform two lookups (math, and sqrt), whereas the other only has one. See [this answer](https://stackoverflow.com/a/3592137/6221024) for details. – Dillon Davis Jun 28 '18 at 21:58
  • @DillonDavis Yeah, true. But since this is python, I'd recommend sticking with the higher flexibility instead of a tiny performance gain. Oh and thanks for that link. That was new to me. Sorry for the false information in my last comment. – JE_Muc Jun 28 '18 at 22:19
  • However it is possible to do it with `np.linalg.norm` – G M Jun 29 '18 at 10:56
3

Instead of optimizing this fairly simple function call, you could try to rewrite your program such that i and j are arrays instead of single numbers (assuming that you need to call the function on a lot of different inputs). See this small benchmark:

import numpy as np
i = np.arange(10000)
j = np.arange(10000)

%%timeit 
np.sqrt(i**2+j**2)
# 74.1 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for idx in range(len(i)):
    np.sqrt(i[idx]**2+j[idx]**2)
# 25.2 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As you can see, the first variant (using arrays of numbers as input) is ~300x faster than the second one using a python for loop. The reason for this is that in the first example, all computation is performed by numpy (which is implemented in c internally and therefore really fast), whereas in the second example, numpy code and regular python code (the for loop) interleave, making the execution much slower.

If you really want to improve the performance of your program, I'd suggest rewriting it so that you can perform your function once on two numpy arrays instead of calling it for each pair of numbers.

Felix
  • 6,131
  • 4
  • 24
  • 44
2

I understand you need speed, but I would like to point to some faults of writing own sqroot calculator

Speed comparison

%%timeit
math.hypot(i, j)
# 85.2 ns ± 1.03 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%%timeit
np.hypot(i, j)
# 1.29 µs ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
np.sqrt(i**2+j**2)
# 1.3 µs ± 9.87 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%timeit
(i*i + j*j)**0.5
# 94 ns ± 1.61 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Speed wise both numpy are same, but hypot is very safe. As matter of fact (i*i + j*j)**0.5 overflows. hypot is efficient is sense of accuracy :p

Also math.hypot is also very safe and fast also and can handle 3d sqrt of sum of sqrs and faster than (i*i + j*j)**0.5

Underflow

i, j = 1e-200, 1e-200
np.sqrt(i**2+j**2)
# 0.0

Overflow

i, j = 1e+200, 1e+200
np.sqrt(i**2+j**2)
# inf

No Underflow

i, j = 1e-200, 1e-200
np.hypot(i, j)
# 1.414213562373095e-200

No Overflow

i, j = 1e+200, 1e+200
np.hypot(i, j)
# 1.414213562373095e+200
eroot163pi
  • 1,791
  • 1
  • 11
  • 23
1

In this case numexpr module may be faster. This module avoids intermediate buffering and thus is faster for certain operations:

i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.34 ms

import numexpr as ne
%timeit ne.evaluate('sqrt(i**2+j**2)')
#370 us
Brenlla
  • 1,471
  • 1
  • 11
  • 23
  • Thanks but it's slower compared to `(i*i + j*j)**0.5` – G M Jun 28 '18 at 12:38
  • That's going to depend on the size of your array. For this operation, `numexpr` becomes faster with arrays larger than 10,000-100,000 – Brenlla Jun 28 '18 at 12:46
  • Thanks for the clarification but in my question i and j are two numbers – G M Jun 28 '18 at 12:54
  • If using other modules than `numpy`, I highly recommend `numba`. In most cases this will be far superior to `numexpr`. – JE_Muc Jun 28 '18 at 21:51
1

I did some comparisons based on the answers it seems that the faster way is to use math module and then math.hypot(i + j) but probably the best compromise is to use (i*i + j*j)**0.5 without importing any module even though not so explicit.

enter image description here

Code

from timeit import timeit
import matplotlib.pyplot as plt


tests = [
"np.sqrt(i**2+j**2)",
"np.sqrt(sum(np.square([i,j])))",
"(i*i + j*j)**0.5",
"math.sqrt(i*i + j*j)",
"math.hypot(i,j)",
"np.linalg.norm([i,j])",
"ne.evaluate('sqrt(i**2+j**2)')",
"np.hypot(i,j)"]

results = []
lengths = []
for test in tests:
    results.append(timeit(test,setup='i = 7; j = 4;\
                          import numpy  as np; \
                          import math; \
                          import numexpr as ne', number=1000000))
    lengths.append(len(test))

indx = range(len(results))
plt.bar(indx,results)
plt.xticks(indx,tests,rotation=90)
plt.yscale('log')
plt.ylabel('Time (us)')
G M
  • 20,759
  • 10
  • 81
  • 84
  • Just curious, if `i` and `j` are just numbers, why do you care about the performance of one operation that takes about 1 us? – Brenlla Jun 29 '18 at 11:31
  • @Brenlla your point it's right but I just like to do the optimal procedure as I wrote `(i*i + j*j)**0.5` even though is not faster is the more suited for the task. – G M Jun 29 '18 at 12:27