Why is the performance of 64-bit multiplication comparable to 32-bit multiplication?

Question

I am measuring the performance of Rust when two 64-bit numbers are multiplied versus when two 32-bit numbers are multiplied. Recall that the result of a 64-bit multiplication is a 128-bit number and the result of a 32-bit multiplication is a 64-bit number. I expected the 64-bit multiplication to be at least 2x slower than the other, mainly because there is no native 128-bit support, and to multiply two 64-bit numbers you split them into 32-bit high and low halves. However, when I ran the test, it turned out that both perform similarly.
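
For reference, the hi/lo decomposition described above can be sketched as follows. `mul_64_via_32` is a hypothetical helper, checked against Rust's native `u128` product. It needs four 32x32 -> 64-bit partial products plus carry handling. This is only what a target without a widening 64-bit multiply would need: AArch64 (the M1's architecture) has a `UMULH` instruction that returns the high 64 bits of a 64x64 product, so compilers typically lower `a as u128 * b as u128` to a `MUL`/`UMULH` pair there.

```rust
/// Hypothetical helper: widening 64x64 -> 128-bit multiply built only from
/// 32x32 -> 64-bit products (the hi/lo split described in the question).
fn mul_64_via_32(a: u64, b: u64) -> u128 {
    let (a_lo, a_hi) = (a & 0xFFFF_FFFF, a >> 32);
    let (b_lo, b_hi) = (b & 0xFFFF_FFFF, b >> 32);

    // Four partial products; each fits in u64 because both factors are < 2^32.
    let lo_lo = a_lo * b_lo;
    let lo_hi = a_lo * b_hi;
    let hi_lo = a_hi * b_lo;
    let hi_hi = a_hi * b_hi;

    // Middle 32-bit column: at most three 32-bit values, so no u64 overflow.
    let mid = (lo_lo >> 32) + (lo_hi & 0xFFFF_FFFF) + (hi_lo & 0xFFFF_FFFF);
    // Low 64 bits: low half of `mid` on top of the low half of `lo_lo`.
    let low = (mid << 32) | (lo_lo & 0xFFFF_FFFF);
    // High 64 bits: everything that carried past bit 64.
    let high = hi_hi + (lo_hi >> 32) + (hi_lo >> 32) + (mid >> 32);

    ((high as u128) << 64) | low as u128
}

fn main() {
    let x: u64 = 12_345_678_653_435_363_454;
    let native = x as u128 * x as u128;
    assert_eq!(mul_64_via_32(x, x), native);
    println!("{} * {} = {}", x, x, native);
}
```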

Here is the script I have used:

fn main() {
    test_64_mul();
    test_32_mul();
}

fn test_64_mul() {
    let test_num: u64 = 12345678653435363454;
    use std::time::Instant;
    let mut now = Instant::now();
    let mut elapsed = now.elapsed();
    for _ in 1..2000 {
        now = Instant::now();
        let _prod = test_num as u128 * test_num as u128;
        elapsed = elapsed + now.elapsed();
    }
    println!("Elapsed For 64: {:.2?}", elapsed);
}

fn test_32_mul() {
    let test_num: u32 = 1234565755;
    use std::time::Instant;
    let mut now = Instant::now();
    let mut elapsed = now.elapsed();
    for _ in 1..2000 {
        now = Instant::now();
        let _prod = test_num as u64 * test_num as u64;
        elapsed = elapsed + now.elapsed();
    }
    println!("Elapsed For 32: {:.2?}", elapsed);
}

The output after running this code is:

Elapsed For 64: 25.58µs

Elapsed For 32: 26.08µs

I am using a MacBook Pro with an M1 chip and Rust 1.60.0.

This is going to be very difficult to answer without knowing the exact platform you are using and having a look at the assembly output. Also, what *is* your question? — Simon Doppler, May 25 '22 at 06:23
@Simon I added information about the platform; the question is why 64-bit multiplication is comparable to 32-bit multiplication. I have also made the question clearer. – CryptoKitty, May 25 '22 at 06:26

Chayim Friedman · Accepted Answer · 2022-05-25T06:44:50.293

Because the compiler noticed that you don't use the result and eliminated the multiplication completely.

See the diff at https://rust.godbolt.org/z/5sjze7Mbv.

You should use something like std::hint::black_box(), or much better, a benchmarking framework like criterion.
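
A minimal sketch of the `black_box` approach, assuming Rust 1.66 or later, where `std::hint::black_box` became stable (on the question's Rust 1.60 it was nightly-only). The names `bench_64`/`bench_32` and the iteration count are illustrative. The clock is read once before and once after a long loop, each operand goes through `black_box` so the constant input cannot be folded at compile time, and the products are accumulated so they stay live. The standard library documents `black_box` as best-effort, and the per-iteration figure includes loop and addition overhead, so treat the numbers as rough throughput rather than the latency of one multiplication. Build with `cargo run --release`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: times `iters` widening 64x64 -> 128-bit
/// multiplications with a single pair of clock reads around the loop.
fn bench_64(iters: u64) -> (Duration, u128) {
    let mut acc: u128 = 0;
    let start = Instant::now();
    for i in 0..iters {
        // black_box hides the operand from the optimizer, so the product
        // cannot be computed at compile time from the constant input.
        let a = black_box(12_345_678_653_435_363_454u64 ^ i);
        // Accumulating keeps every product "used", so it is not eliminated.
        acc = acc.wrapping_add(a as u128 * a as u128);
    }
    (start.elapsed(), black_box(acc))
}

/// Same loop for widening 32x32 -> 64-bit multiplications.
fn bench_32(iters: u64) -> (Duration, u64) {
    let mut acc: u64 = 0;
    let start = Instant::now();
    for i in 0..iters {
        let a = black_box(1_234_565_755u32 ^ i as u32);
        acc = acc.wrapping_add(a as u64 * a as u64);
    }
    (start.elapsed(), black_box(acc))
}

fn main() {
    const ITERS: u64 = 100_000_000;
    let (t64, acc64) = bench_64(ITERS);
    let (t32, acc32) = bench_32(ITERS);
    // Printing the accumulators also makes the results observable.
    println!("64-bit: {:.2?} ({:.3} ns/iter, acc = {})", t64, t64.as_nanos() as f64 / ITERS as f64, acc64);
    println!("32-bit: {:.2?} ({:.3} ns/iter, acc = {})", t32, t32.as_nanos() as f64 / ITERS as f64, acc32);
}
```

A framework like criterion adds warm-up, repeated sampling, and statistics on top of this basic idea.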

Also, the overhead of creating a new Instant every time is likely much higher than that of the multiplication itself. Like I said, use a benchmarking framework.
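
One way to see how much of the question's measurement is timer overhead is to time the same pattern with nothing in between. This is a sketch; `per_iteration_timer_overhead` is a hypothetical name. It performs the question's `Instant::now()` / `now.elapsed()` pair 1,999 times (the number of iterations of `1..2000`) with no multiplication at all. If its total lands in the same range as the question's ~25 µs, the question's figures mostly measure the clock calls rather than the multiplications.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: reproduces the question's timing pattern
/// (Instant::now(), then now.elapsed(), every iteration) with no work
/// in between, so the total is pure measurement overhead.
fn per_iteration_timer_overhead(iters: u32) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let now = Instant::now();
        // elapsed() reads the clock again and subtracts `now`.
        total += now.elapsed();
    }
    total
}

fn main() {
    let iters = 1_999; // same iteration count as `1..2000` in the question
    println!("{} empty timed iterations: {:.2?}", iters, per_iteration_timer_overhead(iters));
}
```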

As noted by @StephenC, it is also unlikely that your clock resolution is small enough to measure one multiplication.
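
A rough way to check the clock's granularity, as a sketch (`observed_clock_step` is a hypothetical helper): read `Instant::now()` in a tight loop until the value changes, and keep the smallest nonzero step over several samples. The result is an upper bound on the resolution, since it also includes the cost of the `now()` calls themselves; if it is larger than the cost of a single multiplication, timing one multiplication per pair of clock reads cannot resolve it.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: estimates the smallest observable step of `Instant`
/// by spinning until consecutive readings differ, keeping the minimum.
fn observed_clock_step(samples: u32) -> Duration {
    let mut smallest = Duration::MAX;
    for _ in 0..samples {
        let t0 = Instant::now();
        let mut t1 = Instant::now();
        // Spin until the clock visibly advances.
        while t1 == t0 {
            t1 = Instant::now();
        }
        smallest = smallest.min(t1 - t0);
    }
    smallest
}

fn main() {
    println!("smallest observed Instant step: {:?}", observed_clock_step(1_000));
}
```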

edited May 25 '22 at 06:44

answered May 25 '22 at 06:37

Chayim Friedman

Also, the benchmark is calling `Instant::now()` and `now.elapsed()` in the loop. So even if the compiler didn't eliminate the multiplication, the "elapsed time" is most likely including a lot of overheads. Then there is the problem that the clock resolution may be too poor to accurately measure the time to perform a multiplication anyway. – Stephen C May 25 '22 at 06:42
@StephenC "Also, the overhead of creating a new `Instant` every time is likely much higher than that of the multiplication itself. Like I said, use a benchmarking framework." – Chayim Friedman May 25 '22 at 06:43
Okay, interesting. I am new to Rust. I still wonder why the elapsed time would include the overhead of creating those instances, because my understanding is that the instant is created first and the elapsed time is calculated afterwards. Unless `now.elapsed()` also creates a new instance? – CryptoKitty May 25 '22 at 06:52
@muhammadharis For each run of the loop, you call `Instant::now()` and `Instant::elapsed()`, and [the latter itself calls `Instant::now()`](https://doc.rust-lang.org/1.61.0/src/std/time.rs.html#378-380). [`Instant::now()` calls `mach_absolute_time()`](https://github.com/rust-lang/rust/blob/9fadabc879e0b16214e8216c1a63a597d1d5d36b/library/std/src/sys/unix/time.rs#L171), which probably involves a syscall. You have two syscalls per iteration, which is much, much costlier than a few movs/adds/muls. – Chayim Friedman May 25 '22 at 06:57
Okay, you are right. I have accepted the answer; thanks for your response. I will try a benchmarking framework. – CryptoKitty May 25 '22 at 07:03
