How do you implement a some_math like this efficiently?
You can use mapv_into():
use ndarray as nd;
use ndarray::Array2;
fn some_math(matrix: Array2<f64>) -> Array2<f64> {
    // np.sqrt(np.exp(matrix)) would translate literally to
    // matrix.mapv_into(f64::exp).mapv_into(f64::sqrt),
    // but this version iterates over the matrix just once.
    matrix.mapv_into(|v| v.exp().sqrt())
}
fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{:?}", result);
}
Playground
That should give you performance comparable to that of numpy, but you should measure to be sure.
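If you want a more rigorous measurement than a one-off timing, a minimal sketch using the criterion crate could look like the following; criterion isn't part of the setup above, and the bench-file layout, matrix size, and crate versions here are assumptions:
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ndarray::Array2;
fn some_math(matrix: Array2<f64>) -> Array2<f64> {
    matrix.mapv_into(|v| v.exp().sqrt())
}
fn bench_some_math(c: &mut Criterion) {
    // A smaller matrix keeps individual iterations short enough for repeated sampling.
    let matrix = Array2::<f64>::ones((1_000, 1_000));
    c.bench_function("some_math 1000x1000", |b| {
        // Clone per iteration because some_math takes ownership of its input.
        b.iter(|| some_math(black_box(matrix.clone())))
    });
}
criterion_group!(benches, bench_some_math);
criterion_main!(benches);
Run it with cargo bench (the file would live under benches/). Note that the per-iteration clone() is included in the measured time, so treat the absolute numbers with some skepticism.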
To use multiple cores, which makes sense for large arrays, you'd enable the rayon feature of the crate and use par_mapv_inplace():
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
    matrix.par_mapv_inplace(|v| v.exp().sqrt());
    matrix
}
(Doesn't compile on the Playground because the Playground's ndarray doesn't include the rayon feature.)
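For completeness, here is a minimal end-to-end sketch of the parallel version that you could run locally; I'm assuming ndarray 0.15 in Cargo.toml here, so adjust the version to whatever you're actually using:
// Cargo.toml (assumed): ndarray = { version = "0.15", features = ["rayon"] }
use ndarray as nd;
use ndarray::Array2;
fn some_math(mut matrix: Array2<f64>) -> Array2<f64> {
    // exp followed by sqrt, applied element-wise, in place and across threads
    matrix.par_mapv_inplace(|v| v.exp().sqrt());
    matrix
}
fn main() {
    let matrix = nd::array![[1., 2., 3.], [9., 8., 7.]];
    let result = some_math(matrix);
    println!("{:?}", result);
}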
Note that in the above examples you can replace v.exp().sqrt() with f64::sqrt(f64::exp(v)) if that feels more natural.
EDIT: I was curious about timings, so I decided to do a trivial (and unscientific) benchmark: creating a random 10_000x10_000 array and comparing np.sqrt(np.exp(array)) with the Rust equivalent.
Python code used for benchmarking:
import numpy as np
import time
matrix = np.random.rand(10000, 10000)
t0 = time.time()
np.sqrt(np.exp(matrix))
t1 = time.time()
print(t1 - t0)
Rust code:
use std::time::Instant;
use ndarray::Array2;
use ndarray_rand::{RandomExt, rand_distr::Uniform};
fn main() {
    // Random 10_000x10_000 matrix; Array2::random comes from the ndarray-rand crate.
    let matrix: Array2<f64> = Array2::random((10000, 10000), Uniform::new(0., 1.));
    let t0 = Instant::now();
    let _result = matrix.mapv_into(|v| v.exp().sqrt());
    let elapsed = t0.elapsed();
    println!("{}", elapsed.as_secs_f64());
}
In my experiment on my ancient desktop system, Python takes 3.7 s to calculate, whereas Rust takes 2.5 s. Replacing mapv_into() with par_mapv_inplace() makes Rust drastically faster, now clocking in at 0.5 s, 7.4x faster than the equivalent Python.
It makes sense that the single-threaded Rust version is faster, since it iterates over the entire array only once, whereas Python does it twice. If we remove the sqrt() operation, Python clocks in at 2.8 s, while Rust is still slightly faster at 2.4 s (and still 0.5 s parallel). I'm not sure whether it's possible to optimize the Python version further without using something like numba. The ability to tweak the code without paying a performance penalty for doing low-level calculations by hand is exactly the benefit of a compiled language like Rust.
The multi-threaded version is something that I don't know how to replicate in Python, but someone who knows numba could do it and compare.