
I am surprised by the results of this small performance test (Project Euler #1):

c: 233168 in 134 ms
java: 233168 in 46 ms
py: 233168 in 12 ms

I have attached the full code examples below as well as the execution shell script.

Does anyone have any idea as to why the c solution is so much slower than both the Java and Python solutions?

C

#include <stdio.h>

int main() {
  int i, sum = 0;

  for (i = 1; i < 1000; i++) {
    if (i % 3 == 0 || i % 5 == 0) {
      sum += i;
    }
  }

  printf("%d\n", sum);
}

Java

class Main {
  public static void main(String[] args) {
    int sum = 0;

    for (int i = 1; i < 1000; i++) {
      if (i % 3 == 0 || i % 5 == 0) {
        sum += i;
      }
    }

    System.out.println(sum);
  }
}

Python

total = 0

for i in range(1, 1000):
    if i % 3 == 0 or i % 5 == 0:
        total += i


print(total)

Run.sh - (slightly redacted for brevity)

milliseconds() {
        gdate +%s%3N
}

exec_c() {
        $build_cache/a.out
}

exec_py() {
        python $build_cache/main.py
}

exec_java() {
        java -classpath $build_cache Main
}

time_func() {
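        # Times one full process launch (captured via command substitution),
        # so interpreter/JVM startup and stdout capture are included in the delta.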
        start_time=$(milliseconds)
        output=$($1)
        end_time=$(milliseconds)
        delta_time=$((end_time - start_time))
        echo "$output in $delta_time ms"
}

run() {
        filename=$1
        extension=${filename##*.}

        case "$extension" in
        "c")
                cc $filename -o $build_cache/a.out
                if [ $? -ne 0 ]; then
                        exit 1
                fi

                time_func exec_c
                ;;
        "py")
                cp $filename $build_cache/main.py
                time_func exec_py
                ;;
        "java")

                javac -d $build_cache $filename 2>&1
                if [ $? -ne 0 ]; then
                        exit 1
                fi

                time_func exec_java
                ;;
        *)
                echo "unsupported filetype"
                exit 1
                ;;
        esac
}
  • Those tests are too small to make any sense, especially if you run them only once. – Federico klez Culloca Jun 10 '22 at 18:04
  • I agree with you. However, the speeds are almost identical every time I run it. I wasn't doing this for any real benchmarking, but I was just very surprised by the results. What could cause this? – Joshua Taylor Eppinette Jun 10 '22 at 18:07
  • When I bump the limit to 1,000,000 the numbers are more reasonable: c: 54 ms, java: 48 ms, py: 95 ms. However, it is still surprising. – Joshua Taylor Eppinette Jun 10 '22 at 18:10
  • You are compiling the C code without optimizations, try adding `-O2` flag to your `cc` command – UnholySheep Jun 10 '22 at 18:10
  • There are any number of reasons why this timing would be inaccurate. I/O is notoriously slow, and variable; each program - by nature of its different standard library etc. - uses the standard output differently. Getting the current time by *invoking another program* (`gdate`) also makes the timing results effectively useless. Not to mention the setup/teardown overhead of the programs themselves. On the flip side, there are probably sufficiently aggressive C compiler options to make the entire computation happen at compile-time. – Karl Knechtel Jun 10 '22 at 18:12
  • You can't really reliably test like that. Too many things are going on in the OS. And you also need to run them in random order and multiple times. Also different loaders and or interpreters have different startup overhead. – WJS Jun 10 '22 at 18:13
  • Without printing or the timing overhead, just the calculation loop should take more like 0.1 milliseconds in Python (per my own results: wrapping the code in a function, not `print`ing anything, and using the `timeit` standard library module to do timing properly. My computer is pretty old, too.) – Karl Knechtel Jun 10 '22 at 18:14
  • Take a look at [What is microbenchmarking](https://stackoverflow.com/questions/2842695/what-is-microbenchmarking). It has a lot of information about everything you're reading in the comments above :) – Federico klez Culloca Jun 10 '22 at 18:17
  • Hey, thanks everyone, I proposed an answer to my question below. I used hyperfine to run the test, and it rendered results that lined up more with my world view lol. However, I'm still interested in why the previous test harness favored the Python and Java solutions over the C solution. @FedericoklezCulloca I will check out the link! – Joshua Taylor Eppinette Jun 10 '22 at 18:25

1 Answer


Upon taking everyone's advice and moving the test to a true benchmarking tool, the surprising result (C being slowest) has disappeared.

Therefore, the failure must have been in the execution harness. However, it is still interesting that the previous harness favored the Python and Java solutions so heavily over the C one...

$ hyperfine "./.build/a.out" "./.build/main.py" "java -classpath ./.build Main"

Benchmark 1: ./.build/a.out
  Time (mean ± σ):       0.2 ms ±   0.1 ms    [User: 0.0 ms, System: 0.1 ms]
  Range (min … max):     0.0 ms …   1.9 ms    746 runs

Benchmark 2: ./.build/main.py
  Time (mean ± σ):      24.6 ms ±  56.9 ms    [User: 6.9 ms, System: 2.3 ms]
  Range (min … max):     8.3 ms … 214.1 ms    13 runs

Benchmark 3: java -classpath ./.build Main
  Time (mean ± σ):      41.2 ms ±   0.6 ms    [User: 39.2 ms, System: 8.2 ms]
  Range (min … max):    40.3 ms …  44.6 ms    64 runs

Summary
  './.build/a.out' ran
  143.25 ± 350.93 times faster than './.build/main.py'
  239.24 ± 195.07 times faster than 'java -classpath ./.build Main'
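
For reference, here is a rough sketch (not part of the original harness; the solve wrapper is just for illustration) of timing only the Python loop with the timeit module, as suggested in the comments. It excludes interpreter startup and printing entirely:

import timeit

def solve(limit=1000):
    # Sum of all multiples of 3 or 5 below `limit` (Project Euler #1).
    total = 0
    for i in range(1, limit):
        if i % 3 == 0 or i % 5 == 0:
            total += i
    return total

runs = 10_000
elapsed = timeit.timeit(solve, number=runs)  # total seconds across all runs
print(f"{elapsed / runs * 1000:.3f} ms per call, result = {solve()}")

Per the comments above, this should come out on the order of 0.1 ms per call, which supports the conclusion that the original numbers were dominated by process startup and the gdate-based timing overhead rather than by the computation itself.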