How do you implement a some_math like this efficiently?
You can use mapv_into():
use ndarray as nd;
use ndarray::Array2;
fn some_math(matrix: Array2<f64>) -> Array2<f64> {
    // np.sqrt(np.exp(matrix)) would translate literally to
    // matrix.mapv_into(f64::exp).mapv_into(f64::sqrt),
    // but this version iterates over the matrix just once.
    matrix.mapv_into(|v| v.exp().sqrt())
}
fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{:?}", result);
}
Playground
That should give you performance comparable to that of numpy, but you should measure to be sure.
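If you want a more rigorous measurement than a one-off timing, a minimal sketch using the criterion crate could look like the following; criterion isn't part of the setup above, and the bench-file layout, matrix size, and crate versions here are assumptions:
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ndarray::Array2;
fn some_math(matrix: Array2<f64>) -> Array2<f64> {
    matrix.mapv_into(|v| v.exp().sqrt())
}
fn bench_some_math(c: &mut Criterion) {
    // A smaller matrix keeps individual iterations short enough for repeated sampling.
    let matrix = Array2::<f64>::ones((1_000, 1_000));
    c.bench_function("some_math 1000x1000", |b| {
        // Clone per iteration because some_math takes ownership of its input.
        b.iter(|| some_math(black_box(matrix.clone())))
    });
}
criterion_group!(benches, bench_some_math);
criterion_main!(benches);
Run it with cargo bench (the file would live under benches/). Note that the per-iteration clone() is included in the measured time, so treat the absolute numbers with some skepticism.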
To use multiple cores, which makes sense for large arrays, you'd enable the rayon feature of the crate and use par_mapv_inplace():
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
    matrix.par_mapv_inplace(|v| v.exp().sqrt());
    matrix
}
(Doesn't compile on the Playground because the Playground's ndarray doesn't include the rayon feature.)
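For completeness, here is a minimal end-to-end sketch of the parallel version that you could run locally; I'm assuming ndarray 0.15 in Cargo.toml here, so adjust the version to whatever you're actually using:
// Cargo.toml (assumed): ndarray = { version = "0.15", features = ["rayon"] }
use ndarray as nd;
use ndarray::Array2;
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
    // exp followed by sqrt, applied element-wise, in place and across threads
    matrix.par_mapv_inplace(|v| v.exp().sqrt());
    matrix
}
fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{:?}", result);
}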
Note that in the above examples you can replace v.exp().sqrt() with f64::sqrt(f64::exp(v)) if that feels more natural.
EDIT: I was curious about timings, so I decided to do a trivial (and unscientific) benchmark: creating a random 10_000x10_000 array and comparing np.sqrt(np.exp(array)) with the Rust equivalent.
Python code used for benchmarking:
import numpy as np
import time
matrix = np.random.rand(10000, 10000)
t0 = time.time()
np.sqrt(np.exp(matrix))
t1 = time.time()
print(t1 - t0)
Rust code:
use std::time::Instant;
use ndarray::Array2;
use ndarray_rand::{RandomExt, rand_distr::Uniform};
fn main() {
    // Random 10_000x10_000 matrix; Array2::random comes from the ndarray-rand crate.
    let matrix: Array2<f64> = Array2::random((10000, 10000), Uniform::new(0., 1.));
    let t0 = Instant::now();
    let _result = matrix.mapv_into(|v| v.exp().sqrt());
    let elapsed = t0.elapsed();
    println!("{}", elapsed.as_secs_f64());
}
In my experiment on my ancient desktop system, Python takes 3.7 s to calculate, whereas Rust takes 2.5 s. Replacing mapv_into() with par_mapv_inplace() makes Rust drastically faster, now clocking in at 0.5 s, 7.4x faster than the equivalent Python.
It makes sense that the single-threaded Rust version is faster, since it iterates over the entire array only once, whereas Python does it twice. If we remove the sqrt() operation, Python clocks in at 2.8 s, while Rust is still slightly faster at 2.4 s (and still 0.5 s parallel). I'm not sure whether it's possible to optimize the Python version further without using something like numba. The ability to tweak the code without paying a performance penalty for doing low-level calculations by hand is exactly the benefit of a compiled language like Rust.
The multi-threaded version is something that I don't know how to replicate in Python, but someone who knows numba could do it and compare.