
Like many other developers, I have been very excited about Apple's new Swift language. Apple has claimed that it is faster than Objective-C and can be used to write an operating system. From what I have learned so far, it is a statically typed language that gives precise control over the exact data type (such as integer width). So it looks like it has good potential for performance-critical tasks such as image processing, right?

That's what I thought before I carried out a quick test. The result really surprised me.

Here is a simple code snippet in C:

test.c:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

uint8_t pixels[640*480];
uint8_t alpha[640*480];
uint8_t blended[640*480];

void blend(uint8_t* px, uint8_t* al, uint8_t* result, int size)
{
    for(int i=0; i<size; i++) {
        result[i] = (uint8_t)(((uint16_t)px[i]) *al[i] /255);
    }
}

int main(void)
{
    memset(pixels, 128, 640*480);
    memset(alpha, 128, 640*480);
    memset(blended, 255, 640*480);

    // Test 10 frames
    for(int i=0; i<10; i++) {
        blend(pixels, alpha, blended, 640*480);
    }

    return 0;
}

I compiled it on my 2011 MacBook Air with the following command:

clang -O3 test.c -o test

Processing 10 frames takes about 0.01 s. In other words, the C code needs about 1 ms per frame:

$ time ./test
real    0m0.010s
user    0m0.006s
sys     0m0.003s

Then I wrote a Swift version of the same code:

test.swift:

let pixels = UInt8[](count: 640*480, repeatedValue: 128)
let alpha = UInt8[](count: 640*480, repeatedValue: 128)
let blended = UInt8[](count: 640*480, repeatedValue: 255)

func blend(px: UInt8[], al: UInt8[], result: UInt8[], size: Int)
{
    for(var i=0; i<size; i++) {
        var b = (UInt16)(px[i]) * (UInt16)(al[i])
        result[i] = (UInt8)(b/255)
    }
}

for i in 0..10 {
    blend(pixels, alpha, blended, 640*480)
}

The build command line is:

xcrun swift -O3 test.swift -o test

I used the same -O3 optimization level to keep the comparison fair. However, the resulting program is about 100 times slower:

$ time ./test

real    0m1.172s
user    0m1.146s
sys     0m0.006s

In other words, Swift takes about 120 ms to process one frame, while C takes just 1 ms.

What happened?

Update: I am using clang:

$ gcc -v
Configured with: --prefix=/Applications/Xcode6-Beta.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.34.4) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.2.0
Thread model: posix

Update: more results with different running iterations:

Here are the results for different numbers of "frames", i.e. with the iteration count of the main for loop changed from 10 to other values. Note that the C code now runs even faster than before (hot cache?), while the Swift time per frame stays roughly the same:

             C Time (s)      Swift Time (s)
  1 frame:     0.005            0.130
 10 frames(*): 0.006            1.196
 20 frames:    0.008            2.397
100 frames:    0.024           11.668
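
The figures above come from running `time` on the whole program, so they include process startup and array setup. A minimal sketch of timing only the blend loop in-program is shown below. It assumes POSIX `clock_gettime` is available (Linux, or macOS 10.12 and later); `now_seconds` and `time_blend` are hypothetical helper names, not part of the original test.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define FRAME_SIZE (640 * 480)

uint8_t pixels[FRAME_SIZE];
uint8_t alpha[FRAME_SIZE];
uint8_t blended[FRAME_SIZE];

/* Same blend as in test.c. */
void blend(const uint8_t *px, const uint8_t *al, uint8_t *result, int size)
{
    for (int i = 0; i < size; i++) {
        result[i] = (uint8_t)(((uint16_t)px[i]) * al[i] / 255);
    }
}

/* Monotonic wall-clock time in seconds. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: blends `frames` frames and returns the
   average time per frame in seconds, excluding array setup. */
double time_blend(int frames)
{
    assert(frames > 0);
    memset(pixels, 128, FRAME_SIZE);
    memset(alpha, 128, FRAME_SIZE);
    memset(blended, 255, FRAME_SIZE);

    double start = now_seconds();
    for (int f = 0; f < frames; f++) {
        blend(pixels, alpha, blended, FRAME_SIZE);
    }
    double elapsed = now_seconds() - start;

    /* Use a result so the stores to `blended` are not dead code. */
    printf("blended[0] = %u\n", (unsigned)blended[0]);
    return elapsed / frames;
}
```

Calling `time_blend(10)` from `main` and printing the return value gives a per-frame figure that does not include startup. Because every frame writes the same values, an optimizing compiler may still collapse repeated iterations, so the number should be read with some caution.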

Update: `-Ofast` helps

With -Ofast, as suggested by @mweathers, the Swift speed goes up to a reasonable range.

On my laptop the Swift version built with -Ofast takes 0.013 s for 10 frames and 0.048 s for 100 frames, which is close to half the speed of the C version.

Penghe Geng
  • Out of curiosity, does it help to replace the blend computation with `var b = (UInt16)(px[i]) &* (UInt16)(al[i])`, which if I read the docs correctly will cause swift to avoid the overflow check? – rici Jun 08 '14 at 03:36
  • @rici No, it doesn't help. – Penghe Geng Jun 08 '14 at 03:43
  • @Potatoswatter Yes it's clang. Updated at the end of the question. – Penghe Geng Jun 08 '14 at 03:48
  • What happens if you adjust the code to do the same process twice (ie, extend iterations from 10 to 20)? I would imagine that starting the Swift runtime costs somewhat more than starting the C runtime. – Tommy Jun 08 '14 at 03:52
  • can you dump the assembly code? my guess is the clang version may be optimizing the divide-by-constant 255 – gordy Jun 08 '14 at 03:53
  • Try profiling just the `blend` function. Filling the array and differences in environment setup are probably playing a role. – Bill Jun 08 '14 at 03:58
  • Yes, invoking time on a short loop tells you nothing ... you certainly can't divide that by 10 and make a claim about the per-frame computation time, because that's not what you're measuring. – Jim Balter Jun 08 '14 at 04:03
  • @Tommy Added results for iteration counts other than 10 at the end of the question. – Penghe Geng Jun 08 '14 at 04:10
  • Swift does seem absurdly slow in some basic benchmarks at the moment. e.g. http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays – Joseph Mark Jun 08 '14 at 04:20
  • Reading the whole link, one can see "Changing the Swift Compiler - Optimization Level in Xcode to 'Fastest, Unchecked' sped this up to be comparable with your C++." – Jim Balter Jun 08 '14 at 04:21
  • The problem is that you're calling the `time` command on the entire program. Swift may have more runtime libraries and configuration to set up than plain C and the `Array(count:, repeatedValue:)` function is probably not highly tuned. You should do your profiling in-program, e.g. in each run through your 10-iteration for loop, save a timestamp, then call blend, then print out the elapsed time after blend is done. This is the only way to compare apples to apples. – Bill Jun 08 '14 at 12:10
  • @Bill: No, this is not the explanation. I did my benchmarks [here](http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays) both in-program and for the entire program, and the results are consistent. -Ofast is roughly as fast as C code, -O3 is 50-100 times slower. This difference does not go away, no matter how you measure it. – Jukka Suomela Jun 08 '14 at 12:23
  • And this is why image processing/analysis is best done in "unsafe" languages. – Ed S. Mar 03 '15 at 02:30
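
On the divide-by-constant guess in the comments: an optimizer can typically replace an unsigned division by 255 with a multiply and a shift. The sketch below shows one such replacement for 16-bit inputs and checks it exhaustively against the plain division. It illustrates the general technique only; it is not a dump of what clang actually emits for `blend`, and `div255` and `check_div255` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Divide a 16-bit value by 255 without a division instruction.
   0x8081 / 2^23 is slightly larger than 1/255; the excess is small
   enough that the truncated result is exact for every x <= 65535. */
uint16_t div255(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 0x8081u) >> 23);
}

/* Exhaustively compare the multiply-and-shift form with x / 255. */
void check_div255(void)
{
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        assert(div255((uint16_t)x) == x / 255);
    }
}
```

In `blend` the product of two 8-bit values is at most 255 * 255 = 65025, so it stays inside the range where this form is exact.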

2 Answers

25

Building with:

xcrun swift -Ofast test.swift -o test

I'm getting times of:

real    0m0.052s
user    0m0.009s
sys     0m0.005s
mweathers
  • @JeremyBanks: -Ofast changes the semantics of the language. It is not safe. You are turning Swift into a C++-like language. Integer overflows, array overflows, etc. are silently ignored. – Jukka Suomela Jun 08 '14 at 08:43
  • More examples of the impact of -Ofast here: http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays – Jukka Suomela Jun 08 '14 at 09:17
  • -1: This should be a comment. Though, it does suggest that the problem is array bound checking. Skimming the guide, I don't see an unsafe array access operator. – Potatoswatter Jun 08 '14 at 10:06
  • @Potatoswatter That's a valid answer. Swift is slower because it does array bound checking, and if you remove those checks, you get C-like behavior, and C-like speed. – toasted_flakes Jun 08 '14 at 11:08
  • @grasGendarme: Array bound checking does *not* imply a factor 100 slowdown (it should be more like a factor << 2). Cf. Java vs. C. – Jukka Suomela Jun 08 '14 at 11:33
  • @JukkaSuomela Array bound checking most certainly can account for 100x slowdown in an inner loop like this with over a million iterations – gordy Jun 08 '14 at 17:50
  • @gordy: See [this](http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays) for some Swift vs. Java vs. Python vs. C++ comparisons. And keep in mind that Java does bounds checking, too (and more: it also has to check for a null pointer). – Jukka Suomela Jun 08 '14 at 19:07
  • Seems like I'm [kinda right](https://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays) after all :p Swift is either 2x slower than python, or unsafe. – toasted_flakes Jun 08 '14 at 21:33
  • I'm gonna be "that guy": why would you even post only one side of the benchmark?! – dequis Jun 09 '14 at 05:26
  • Because the OP posted the other benchmarks. I was only demonstrating that compiling with -Ofast increased performance. – mweathers Jun 09 '14 at 14:20
  • I had hoped the answer to this question would offer some detailed explanation of what happened in the slow Swift compiled code, like what those mysterious retain and release calls are doing. But I haven't seen those details either here or in the other related question. Overall I think @mweathers gives the best contribution to the discussion by discovering the '-Ofast' option. Now that this question has been closed and thus no new answers will come, I think mweathers deserves some additional credit and I will accept this as the answer. – Penghe Geng Jun 11 '14 at 02:58
  • Switching C to C++ and changing the method signature to `inline void blend(const uint8_t* const px, const uint8_t* const al, uint8_t* result, const int size)` changed the performance of the C version a bit for me. – Tomas Andrle Jul 08 '14 at 14:22
  • @gordy That depends a lot on how those bounds are checked. The compiler could decide to do only one check, before iterating the array - assuming it knew that `size` is the size of both of the arrays. In C#, the compiler understands something like `for (var i = 0; i < array.Length; i++)` and moves the bounds checking before the cycle. Sadly, it doesn't understand `for (var i = 0; i < ar1.Length && i < ar2.Length; i++)`... I'd expect the Swift compiler will also have something like this it can use when you iterate the array more idiomatically (rather than just converting C->Swift line-by-line). – Luaan Jul 27 '15 at 11:42
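
The difference between checking bounds on every access and checking once before the loop, as discussed in the comments above, can be sketched in C. This is an illustration under stated assumptions, not a model of what the Swift compiler generates: `ByteArray`, `get_checked`, `set_checked`, `blend_checked`, `blend_hoisted`, and `check_variants_agree` are all hypothetical names, and `abort()` stands in for a runtime trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array type that carries its own length. */
typedef struct {
    uint8_t *data;
    size_t   count;
} ByteArray;

/* Checked read: traps on an out-of-range index. */
static uint8_t get_checked(const ByteArray *a, size_t i)
{
    if (i >= a->count) abort();
    return a->data[i];
}

/* Checked write: traps on an out-of-range index. */
static void set_checked(ByteArray *a, size_t i, uint8_t v)
{
    if (i >= a->count) abort();
    a->data[i] = v;
}

/* Variant 1: three bounds checks per iteration. */
void blend_checked(const ByteArray *px, const ByteArray *al,
                   ByteArray *result, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        uint16_t b = (uint16_t)(get_checked(px, i) * get_checked(al, i));
        set_checked(result, i, (uint8_t)(b / 255));
    }
}

/* Variant 2: a single check before the loop covers every access. */
void blend_hoisted(const ByteArray *px, const ByteArray *al,
                   ByteArray *result, size_t size)
{
    if (size > px->count || size > al->count || size > result->count) abort();
    for (size_t i = 0; i < size; i++) {
        result->data[i] = (uint8_t)((uint16_t)(px->data[i] * al->data[i]) / 255);
    }
}

/* Both variants must produce identical output for in-range sizes. */
void check_variants_agree(void)
{
    enum { N = 1024 };
    static uint8_t p[N], a[N], r1[N], r2[N];
    for (size_t i = 0; i < N; i++) {
        p[i] = (uint8_t)(i * 7);
        a[i] = (uint8_t)(255 - i);
    }
    ByteArray px = { p, N }, al = { a, N }, out1 = { r1, N }, out2 = { r2, N };
    blend_checked(&px, &al, &out1, N);
    blend_hoisted(&px, &al, &out2, N);
    assert(memcmp(r1, r2, N) == 0);
}
```

For in-range sizes the two variants write the same bytes. They differ only when `size` is too large: the hoisted variant traps before writing anything, while the per-access variant traps partway through the loop.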
11

Let's just concentrate on the answer to the question, which started with a "Why": Because you didn't turn optimisations on, and Swift relies heavily on compiler optimisation.

That said, doing image processing in C is truly daft. That's what you have CGImage and friends for.

gnasher729