
I am measuring the performance of Rust when two 64-bit numbers are multiplied versus when two 32-bit numbers are multiplied. Recall that the result of a 64-bit multiplication is a 128-bit number, and the result of a 32-bit multiplication is a 64-bit number. I expected the 64-bit multiplication to be at least 2x slower than the 32-bit one, mainly because there is no native 128-bit support, so to multiply two 64-bit numbers you split them into 32-bit high and low halves. However, when I ran the test, both performed similarly.

Here is the script I have used:

fn main() {
    test_64_mul();
    test_32_mul();
}

fn test_64_mul() {
    let test_num: u64 = 12345678653435363454;
    use std::time::Instant;
    let mut now = Instant::now();
    let mut elapsed = now.elapsed();
    for _ in 1..2000 {
        now = Instant::now();
        let _prod = test_num as u128 * test_num as u128;
        elapsed = elapsed + now.elapsed();
    }
    println!("Elapsed For 64: {:.2?}", elapsed);
}

fn test_32_mul() {
    let test_num: u32 = 1234565755;
    use std::time::Instant;
    let mut now = Instant::now();
    let mut elapsed = now.elapsed();
    for _ in 1..2000 {
        now = Instant::now();
        let _prod = test_num as u64 * test_num as u64;
        elapsed = elapsed + now.elapsed();
    }
    println!("Elapsed For 32: {:.2?}", elapsed);
}

The output after running this code is:

Elapsed For 64: 25.58µs

Elapsed For 32: 26.08µs

I am using a MacBook Pro with an M1 chip and Rust version 1.60.0.

Chayim Friedman
CryptoKitty

1 Answer


Because the compiler noticed that you don't use the result and eliminated the multiplication completely.

See the diff at https://rust.godbolt.org/z/5sjze7Mbv.

You should use something like std::hint::black_box(), or much better, a benchmarking framework like criterion.

Also, the overhead of creating a new Instant every time is likely much higher than of the multiplication itself. Like I said, use a benchmarking framework.
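As a rough illustration of the `std::hint::black_box()` suggestion, here is a minimal sketch. It assumes Rust 1.66 or later, where `black_box` is stable (on 1.60.0 it would need a nightly toolchain). The function names `time_64_mul` and `time_32_mul` and the constant `ITERATIONS` are hypothetical, chosen for this example only. The timer is read once around the whole loop instead of once per iteration. Note that `black_box` is documented as best-effort, and the loop overhead is still included in the measurement, so this is not a substitute for a benchmarking framework.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical iteration count, chosen only so the loop runs long enough to time.
const ITERATIONS: u32 = 10_000_000;

/// Times `iterations` 64x64 -> 128-bit multiplications with one timer pair.
/// Returns the elapsed time and the last product computed.
fn time_64_mul(iterations: u32) -> (Duration, u128) {
    let test_num: u64 = 12345678653435363454;
    let mut last: u128 = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box on the input hides the constant from the optimizer;
        // black_box on the output makes the product count as used.
        let x = black_box(test_num) as u128;
        last = black_box(x * x);
    }
    (start.elapsed(), last)
}

/// Same as above for 32x32 -> 64-bit multiplications.
fn time_32_mul(iterations: u32) -> (Duration, u64) {
    let test_num: u32 = 1234565755;
    let mut last: u64 = 0;
    let start = Instant::now();
    for _ in 0..iterations {
        let x = black_box(test_num) as u64;
        last = black_box(x * x);
    }
    (start.elapsed(), last)
}

fn main() {
    let (elapsed_64, _) = time_64_mul(ITERATIONS);
    let (elapsed_32, _) = time_32_mul(ITERATIONS);
    println!("Elapsed For 64: {:.2?}", elapsed_64);
    println!("Elapsed For 32: {:.2?}", elapsed_32);
}
```

Build with `--release` when comparing the two timings, since a debug build adds overflow checks and other overhead.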

As noted by @StephenC, it is also unlikely that your clock resolution is fine enough to measure a single multiplication.
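One way to get a feel for the timer overhead and resolution is to time an empty region with the same `Instant::now()` / `elapsed()` pattern as in the question. The sketch below does that; the function name `empty_region_total` and the sample count are assumptions made for illustration. If the total it reports is of the same order as the totals in the question, the original measurement was most likely dominated by the timer calls and not by the multiplication.

```rust
use std::time::{Duration, Instant};

/// Times an empty region `samples` times, using one `Instant::now()` and one
/// `elapsed()` call per sample, and returns the sum of the reported durations.
fn empty_region_total(samples: u32) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..samples {
        let now = Instant::now();
        // Nothing is measured here: whatever is reported is timer overhead
        // and clock granularity.
        total += now.elapsed();
    }
    total
}

fn main() {
    // Hypothetical sample count, close to the loop count used in the question.
    let samples = 2_000;
    let total = empty_region_total(samples);
    println!(
        "Empty timed region, {} samples: {:.2?} total, {:.2?} per sample",
        samples,
        total,
        total / samples
    );
}
```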

Chayim Friedman
  • Also, the benchmark is calling `Instant::now()` and `now.elapsed()` in the loop. So even if the compiler didn't eliminate the multiplication, the "elapsed time" most likely includes a lot of overhead. Then there is the problem that the clock resolution may be too poor to accurately measure the time to perform a multiplication anyway. – Stephen C May 25 '22 at 06:42
  • @StephenC "Also, the overhead of creating a new `Instant` every time is likely much higher than of the multiplication itself. Like I said, use a benchmarking framework." – Chayim Friedman May 25 '22 at 06:43
  • Okay, interesting. I am new to Rust, and I still wonder why the elapsed time would include the overhead of creating those instances. My understanding is that the instant is created first and then the elapsed time is calculated, unless `now.elapsed()` also creates a new instance? – CryptoKitty May 25 '22 at 06:52
  • @muhammadharis For each run of the loop, you call `Instant::now()` and `Instant::elapsed()`, [which itself calls `Instant::now()`](https://doc.rust-lang.org/1.61.0/src/std/time.rs.html#378-380). [`Instant::now()` calls `mach_absolute_time()`](https://github.com/rust-lang/rust/blob/9fadabc879e0b16214e8216c1a63a597d1d5d36b/library/std/src/sys/unix/time.rs#L171), which probably involves a syscall. You have two syscalls per iteration, which is much, much costlier than some movs/adds/muls. – Chayim Friedman May 25 '22 at 06:57
  • Okay, you are right. I have accepted the answer. Thanks for your response; I will try a benchmarking framework. – CryptoKitty May 25 '22 at 07:03