I am writing an implementation of a neural network in Rust and am trying to calculate the dot product of two vectors. I have the following code:

fn dot_product(a: Vec<f64>, b: Vec<f64>) -> f64 {
    // Calculate the dot product of two vectors.
    assert_eq!(a.len(), b.len());
    let mut product: f64 = 0.0;
    for i in 0..a.len() {
        product += a[i] * b[i];
    }
    product
}

This takes two vectors, a and b (of the same length), multiplies them element-wise, and sums the products: value 1 of a times value 1 of b, plus value 2 of a times value 2 of b, and so on. For example, the dot product of [1.0, 2.0, 3.0] and [4.0, 5.0, 6.0] is 1*4 + 2*5 + 3*6 = 32.

Is there a more efficient way of doing this, and if so, how?

– Teymour
  • Looks fine to me. I'd use it until you're confident it's a bottleneck. If you're already sure you need the fastest possible, maybe look into [SIMD](https://rust-lang-nursery.github.io/edition-guide/rust-2018/simd-for-faster-computing.html)? – anderspitman Jan 03 '19 at 19:46
  • You can do it in one line with iterators, like `a.into_iter().zip(b).map(|(a, b)| a*b).sum()`. But I'd expect that to be comparably fast, not significantly faster (or slower). – trent Jan 03 '19 at 20:41
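
For reference, here is a sketch of trent's iterator one-liner from the comment above, written out as a full function (the name is my own; it borrows slices rather than consuming the Vecs):

fn dot_product_iter(a: &[f64], b: &[f64]) -> f64 {
    // Pair corresponding elements, multiply each pair, and sum the products.
    // `zip` stops at the shorter input, so equal lengths are assumed here.
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}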

3 Answers

This isn't meant as a comprehensive general answer, but I wanted to share a little code.

Your implementation looks pretty much like what I would do unless I knew it was the bottleneck in my application. Then I'd look into more esoteric approaches (maybe SIMD).

That said, you might consider changing your function to take slices (`&[f64]`) instead of owned `Vec`s. That way you can pass either `Vec`s or arrays:

fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    // Calculate the dot product of two vectors. 
    assert_eq!(a.len(), b.len()); 
    let mut product = 0.0;
    for i in 0..a.len() {
        product += a[i] * b[i];
    }
    product
}

fn main() {
    println!("{}", dot_product(&[1.0,2.0], &[3.0,4.0]));
    println!("{}", dot_product(&vec![1.0,2.0], &vec![3.0,4.0]));
}

– anderspitman

I used rayon and packed_simd to calculate the dot product and found an approach that is faster than Intel MKL:

extern crate packed_simd;
extern crate rayon;
extern crate time;

use packed_simd::f64x8;
use rayon::prelude::*;
use std::vec::Vec;

fn main() {
    // n is a multiple of 8, so every chunk below fills a full f64x8;
    // from_slice_unaligned panics on chunks shorter than 8 elements.
    let n = 100000000;
    let x: Vec<f64> = vec![0.2; n];
    let y: Vec<f64> = vec![0.1; n];

    let res: f64 = x
        .par_chunks(8)                    // rayon: process the chunks in parallel
        .map(f64x8::from_slice_unaligned) // load each 8-element chunk into a SIMD vector
        .zip(y.par_chunks(8).map(f64x8::from_slice_unaligned))
        .map(|(a, b)| a * b)              // lane-wise multiply
        .sum::<f64x8>()                   // reduce all chunks to a single f64x8
        .sum();                           // horizontal sum of the 8 lanes
    println!("res: {}", res);
}

This code is on my GitHub. I hope this helps.
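
One caveat worth adding (my note, not part of the original answer): because f64x8::from_slice_unaligned panics on a chunk shorter than 8 elements, the code above relies on the length being a multiple of 8. Here is a sketch of handling an arbitrary length, using the same f64x8 import, by processing the leftover tail with scalar code:

fn simd_dot(x: &[f64], y: &[f64]) -> f64 {
    assert_eq!(x.len(), y.len());
    let split = x.len() / 8 * 8; // largest multiple of 8 that fits

    // SIMD over the full 8-lane chunks.
    let vector_part: f64 = x[..split]
        .chunks_exact(8)
        .map(f64x8::from_slice_unaligned)
        .zip(y[..split].chunks_exact(8).map(f64x8::from_slice_unaligned))
        .map(|(a, b)| a * b)
        .sum::<f64x8>()
        .sum();

    // Scalar fallback for the remaining 0 to 7 elements.
    let tail: f64 = x[split..]
        .iter()
        .zip(&y[split..])
        .map(|(a, b)| a * b)
        .sum();

    vector_part + tail
}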

– Ching-Chuan Chen
  • There's no need for `use std::vec::Vec;`; it's part of the prelude. Why is the time crate here? Can you compare this to the performance of the OP's code and the other answer? Why the choice of `unaligned`? – Shepmaster May 20 '19 at 16:29
  • I looked at this benchmark in more depth and I don't think it is representative or should be used. – Alex Moore-Niemi Jun 29 '22 at 16:15

Having looked at @Ching-Chuan Chen's response above in more depth in a fork of his code, I think the answer to this question should be: use ndarray and blas. It's 10x faster than the naive Rust implementation.

There isn't much to see around the `dot` call itself, because the heavy lifting is done by enabling the `blas` feature for ndarray.

use ndarray::Array1;
use ndarray_rand::{rand_distr::Uniform, RandomExt};

fn main() {
    let d = 256; // vector length (illustrative value)
    let n = 100000; // repetitions (illustrative value)
    let x = Array1::random(d, Uniform::<f32>::new(0., 1.));
    let y = Array1::random(d, Uniform::<f32>::new(0., 1.));
    for _i in 0..n {
        let _res: f32 = x.dot(&y);
    }
}
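
For context (my addition, not from the original answer; version numbers and the choice of OpenBLAS backend are illustrative), wiring up the `blas` feature typically looks something like this in Cargo.toml:

[dependencies]
ndarray = { version = "0.15", features = ["blas"] }
ndarray-rand = "0.14"
blas-src = { version = "0.8", features = ["openblas"] }
openblas-src = { version = "0.10", features = ["cblas", "system"] }

The backend crate also needs to be referenced somewhere in the code (e.g. `use blas_src;`) so it actually gets linked in.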

A couple of things to note:

  1. Many dot products over smaller vectors are more representative than one dot product over a giant vector, since the sum should actually be counted per dot.
  2. It's going to be extremely hard to beat the decades of linear algebra research packed into BLAS.

– Alex Moore-Niemi