
I stumbled across the Benchmark Game (code page) and compared Fortran and C. I was very surprised by the difference in calculation time on the Mandelbrot test (Fortran is 4.3 times slower!), because both languages have very similar feature sets. Moreover, Fortran should be able to optimize more aggressively (see e.g. "Is Fortran easier to optimize than C for heavy calculations?").

Can someone explain which feature Fortran is missing that it would need to reach the speed of the C example? (It seems that the bit operations are what speed up the C code.)

EDIT: This is not a question about which programming language is better (there are always many aspects which play a role). It is rather a fundamental question about the difference in optimization in this example.


Add-on to the answer by Peter Cordes: There is a paper on Basics of Vectorization for Fortran Applications which also briefly discusses SIMD in Fortran programming. For Intel compilers: Explicit Vector Programming in Fortran

pawel_winzig
  • 7
    Unless you will be calculating the Mandelbrot set in your work, in my opinion this benchmark is useless. If you want to see which one is better for your application, make a small test problem that mimics your research. For instance, I saw someone complain that Julia was 27x slower than Python for printing out "hello World"... but who cares about that. When I tested them for a problem relevant to my research, Julia was 10,000 times faster than Python. It was 100 times faster than Python accelerated with Numba. Finding a relevant benchmark is my recommendation. – Charlie Crown Jan 20 '19 at 05:38
  • 1
    @Charlie. Thanks! That really needed to be said. And let's be honest, even such large differences sometimes don't matter as much as comfort and familiarity. – Mad Physicist Jan 20 '19 at 05:44
  • 3
    @CharlieCrown: You are completely right. But it is a question out of curiosity. I programmed in C++ for several years. Now, due to a collaboration, I had to switch to FORTRAN. I find it so much more readable, especially when it comes to matrix operations! But I'm curious if I can always tweak my FORTRAN in such a way that it is as fast as C when it comes to heavy numerical calculations on the cluster. This Mandelbrot example simply surprised me. – pawel_winzig Jan 20 '19 at 06:04
  • The commonly-accepted wisdom is that FORTRAN is easier for compilers to optimize than C is. Because in C you have to use `foo(float *restrict outarray, float *restrict inarray){ ... }` to promise the compiler that input/output arrays don't overlap, allowing it to auto-vectorize. But if you write your C correctly using `restrict`, C compilers can optimize, too. If you found one specific benchmark where FORTRAN is slower, then post the details here instead of linking to them. What's the inner loop, and which compilers + options, and which hardware? What asm did the compilers make? – Peter Cordes Jan 20 '19 at 06:14
  • @PeterCordes: Too much info to post, see the link. The benchmark is not so stupid. – pawel_winzig Jan 20 '19 at 06:19
  • If it's too big to fit in a self-contained SO question, then it's not really on-topic for SO. You need to at least summarize the important details; you can link to the off-site page for full source and stuff. – Peter Cordes Jan 20 '19 at 06:23
  • @PeterCordes: If I knew which part of the code (see link) is important, I would not be asking this question. – pawel_winzig Jan 20 '19 at 06:30
  • 2
    @pawel_winzig, the name of the language is Fortran. It has been Fortran since Fortran 90 was standardized 28+ years ago. – evets Jan 20 '19 at 06:50
  • @evets: I see, no capital letters... – pawel_winzig Jan 20 '19 at 06:54
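
As a minimal sketch of the `restrict` point raised in the comments above: the qualifiers promise the compiler that the two arrays do not overlap, which is the guarantee it needs before it may auto-vectorize the loop. The function name `scale_add` and its parameters are hypothetical and not taken from the benchmark.

```c
#include <stddef.h>

/* Hypothetical example: out[i] += k * in[i] for n elements.
 * `restrict` asserts that `out` and `in` never alias; calling this
 * with overlapping arrays would be undefined behavior. */
void scale_add(float *restrict out, const float *restrict in,
               float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] += k * in[i];
}
```

Whether the loop is actually vectorized still depends on the compiler and its optimization flags.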

1 Answer


The winning C++ version on that benchmark site is manually vectorized for x86 using SIMD intrinsics (SSE, AVX, or AVX512). For example, it uses `_mm256_movemask_pd(v1 <= v2);` to get a bitmask of a whole vector of compare results, letting it check 4 pixels in parallel for going out of bounds. It also uses GNU C native vector syntax, like `r2 + i2`, to add or multiply SIMD vectors with normal C / C++ operators.
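
The compare-and-bitmask idea can be sketched roughly as follows. This is an illustration under assumptions, not the benchmark's code: it uses GNU C vector types (GCC or Clang), the helper name `lanes_in_bounds` is made up, and a small lane loop stands in for `_mm256_movemask_pd` so that the sketch does not depend on AVX being enabled.

```c
#include <string.h>

/* Four doubles treated as one SIMD value (GNU C vector extension). */
typedef double v4d __attribute__((vector_size(32)));

/* Hypothetical helper: bit k of the result is set when
 * r2_in[k] + i2_in[k] <= bound, i.e. pixel k is still in bounds. */
unsigned lanes_in_bounds(const double *r2_in, const double *i2_in,
                         double bound)
{
    v4d r2, i2;
    memcpy(&r2, r2_in, sizeof r2);   /* load 4 lanes each */
    memcpy(&i2, i2_in, sizeof i2);
    const v4d limit = {bound, bound, bound, bound};

    /* Element-wise add and compare with ordinary operators; each
     * lane of the result is -1 (true) or 0 (false). */
    __typeof__(r2 <= limit) in = (r2 + i2) <= limit;

    /* Portable stand-in for a movemask instruction. */
    unsigned mask = 0;
    for (int k = 0; k < 4; k++)
        mask |= (unsigned)(in[k] != 0) << k;
    return mask;
}
```

With AVX available, the lane loop would presumably be replaced by a single movemask intrinsic on the comparison result.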

The C++ version has a loop condition that's optimized for SIMD:

 // Do 50 iterations of mandelbrot calculation for a vector of eight
 // complex values.  Check occasionally to see if the iterated results
 // have wandered beyond the point of no return (> 4.0).

The Fortran version merely uses OpenMP for thread-level parallelization and leaves vectorization to the compiler. Auto-vectorization isn't going to create anything nearly as good as a hand-tuned loop condition that keeps doing redundant work that the Fortran source doesn't do (because that's cheaper than checking more frequently).
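
A rough sketch of such a loop, assuming GNU C vector types and four pixels per vector rather than the eight in the quoted comment. The function name `mandel4` and the `check_every` parameter are hypothetical. Lanes that have already escaped keep iterating until the next check; they overflow to infinity or NaN, which still fails the `<= 4.0` test, so deferring the check does not change the result.

```c
#include <assert.h>
#include <string.h>

typedef double v4d __attribute__((vector_size(32)));

/* Hypothetical sketch: iterate z = z*z + c for 4 points at once.
 * Returns a 4-bit mask; bit k is set if point k still satisfies
 * |z|^2 <= 4 after max_iter iterations. The escape test runs only
 * once per check_every iterations. */
unsigned mandel4(const double *cr_in, const double *ci_in,
                 int max_iter, int check_every)
{
    assert(max_iter > 0 && check_every > 0);
    v4d cr, ci;
    memcpy(&cr, cr_in, sizeof cr);
    memcpy(&ci, ci_in, sizeof ci);

    const v4d four = {4.0, 4.0, 4.0, 4.0};
    v4d zr = {0.0, 0.0, 0.0, 0.0};
    v4d zi = zr, r2 = zr, i2 = zr;
    unsigned mask = 0xF;

    for (int i = 0; i < max_iter; i += check_every) {
        int n = max_iter - i < check_every ? max_iter - i : check_every;
        for (int j = 0; j < n; j++) {   /* no branch on escape here */
            zi = (zr + zr) * zi + ci;
            zr = r2 - i2 + cr;
            r2 = zr * zr;
            i2 = zi * zi;
        }
        /* Occasional check: which lanes are still within bounds? */
        __typeof__(r2 <= four) in = (r2 + i2) <= four;
        mask = 0;
        for (int k = 0; k < 4; k++)
            mask |= (unsigned)(in[k] != 0) << k;
        if (mask == 0)
            break;                      /* every lane has escaped */
    }
    return mask;
}
```

The inner loop has no data-dependent branch, which is the property that makes it cheap to run redundantly between checks.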


There are lots of C and C++ versions of the program that run at a similar speed to the Fortran version. The languages are pretty even for C/C++ source that isn't manually vectorized.

I'm not sure if Intel Fortran or any other compiler supports extensions for manual vectorization.

Peter Cordes
  • I see! The last point you mentioned should have been my question... Thank you very much for your effort. – pawel_winzig Jan 20 '19 at 06:45
  • 5
    So basically assembly disguised by C syntax. – Vladimir F Героям слава Jan 20 '19 at 07:22
  • @VladimirF: Do you know if this is also possible in Fortran? – pawel_winzig Jan 20 '19 at 08:50
  • @VladimirF: arguably, but compilers are free to optimize intrinsics. You could just as well say that `a+b` is `addsd xmm0, xmm1` in disguise. C was designed as a portable assembly language, where every expression / operator corresponded to something the hardware could do in one or two instructions. If you look at intrinsics as new operators for new operations that modern CPUs provide, they're not that special. Compilers can still unroll them, optimize them away, or optimize `_mm_add_pd(_mm_mul_pd(), x)` into an FMA. (gcc and clang do this, but MSVC and ICC treat intrinsics more like asm.) – Peter Cordes Jan 20 '19 at 10:09
  • 1
    @pawel_winzig There is an undocumented way to call intrinsics tied to certain x86_64 instructions in Intel Fortran (if interested, search `DEC$ ATTRIBUTES KNOWN_INTRINSIC`), but not those that use vector registers. There is no datatype for these available in Fortran. – Vladimir F Героям слава Jan 21 '19 at 12:41