5

I'm developing a game with Swift and I have a static array of positional data that I use for processing in the game loop. I originally was using an array of Structs to hold this data but I decided to switch to classes so I could use referencing. However after making the change and profiling I noticed that the CPU was spending much more time on the method that handles this data than it did when I was using Structs.

So I decided to create a simple test to see what was going on.

final class SomeClass {}
struct SomeStruct {}

let classes = [
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
]

let structs = [
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
]

func test1() {
    for i in 0...10000000 {
        for s in classes {}
    }
}

func test2() {
    for i in 0...10000000 {
        for s in structs {}
    }
}

Test1 takes 15.4722579717636 s while Test2 takes only 0.276068031787872 s. Iterating continuously through the structs array was 56X faster. So my question is, why is this? I'm looking for a detailed answer. If I'd have to take a guess I would say that the structs themselves are stored sequentially in memory while the classes are only stored as addresses. So they would need to be dereferenced each time. But then again, don't the structs need to be copied each time?

Side Note: Both arrays are small, but I'm iterating them continuously. If I change the code to iterate once but make the arrays very large like so:

for i in 0...10000000 {
   structs.append(SomeStruct())
   classes.append(SomeClass())
}
func test1() {
    for s in classes {}
}

func test2() {
    for s in structs {}
}

Then I end up with the following: Test1 takes 0.841085016727448 s while Test2 takes 0.00960797071456909 s. The structs take 88X faster.

I'm using an OS X release build and the optimization level is set for Fastest,Smallest [-Os]


Edit

As requested, I have edited this question to include a test where the structs and classes are no longer empty. They use the same properties that I use in my game. Still did not make a difference. Structs are still so much faster, and I don't know why. Hopefully someone can provide the answer.

import Foundation

final class StructTest {
    let surfaceFrames = [
        SurfaceFrame(a: SurfacePoint(x: 0, y: 410), b: SurfacePoint(x: 0, y: 400), c: SurfacePoint(x: 875, y: 410), surfaceID: 0, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 880, y: 304), b: SurfacePoint(x: 880, y: 294), c: SurfacePoint(x: 962, y: 304), surfaceID: 1, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 787, y: 138), b: SurfacePoint(x: 791, y: 129), c: SurfacePoint(x: 1031, y: 248), surfaceID: 2, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 523, y: 138), b: SurfacePoint(x: 523, y: 128), c: SurfacePoint(x: 806, y: 144), surfaceID: 3, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1020, y: 243), b: SurfacePoint(x: 1020, y: 233), c: SurfacePoint(x: 1607, y: 241), surfaceID: 4, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1649, y: 304), b: SurfacePoint(x: 1649, y: 294), c: SurfacePoint(x: 1731, y: 305), surfaceID: 5, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1599, y: 240), b: SurfacePoint(x: 1595, y: 231), c: SurfacePoint(x: 1852, y: 128), surfaceID: 6, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1807, y: 141), b: SurfacePoint(x: 1807, y: 131), c: SurfacePoint(x: 2082, y: 138), surfaceID: 7, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 976, y: 413), b: SurfacePoint(x: 976, y: 403), c: SurfacePoint(x: 1643, y: 411), surfaceID: 8, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1732, y: 410), b: SurfacePoint(x: 1732, y: 400), c: SurfacePoint(x: 2557, y: 410), surfaceID: 9, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 2130, y: 490), b: SurfacePoint(x: 2138, y: 498), c: SurfacePoint(x: 2109, y: 512), surfaceID: 10, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1598, y: 828), b: SurfacePoint(x: 1597, y: 818), c: SurfacePoint(x: 1826, y: 823), surfaceID: 11, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 715, y: 826), b: SurfacePoint(x: 715, y: 816), c: SurfacePoint(x: 953, y: 826), surfaceID: 12, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 840, y: 943), b: SurfacePoint(x: 840, y: 933), c: SurfacePoint(x: 920, y: 943), surfaceID: 13, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1005, y: 1011), b: SurfacePoint(x: 1005, y: 1001), c: SurfacePoint(x: 1558, y: 1011), surfaceID: 14, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1639, y: 943), b: SurfacePoint(x: 1639, y: 933), c: SurfacePoint(x: 1722, y: 942), surfaceID: 15, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1589, y: 825), b: SurfacePoint(x: 1589, y: 815), c: SurfacePoint(x: 1829, y: 825), surfaceID: 16, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 0, y: 0), b: SurfacePoint(x: 1, y: 1), c: SurfacePoint(x: 2, y: 2), surfaceID: 17, dynamic:true)
    ]

    func run() {
        let startTime = CFAbsoluteTimeGetCurrent()
        for  i in 0 ... 10000000 {
            for s in surfaceFrames {
                
            }
        }
        let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        println("Time elapsed \(timeElapsed) s")
    }
}



struct SurfacePoint {
    var x,y: Int
}
struct SurfaceFrame {
    let a,b,c :SurfacePoint
    let surfaceID: Int
    let dynamic: Bool
}

import Foundation

final class ClassTest {
    let surfaceFrames = [
        SurfaceFrame(a: SurfacePoint(x: 0, y: 410), b: SurfacePoint(x: 0, y: 400), c: SurfacePoint(x: 875, y: 410), surfaceID: 0, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 880, y: 304), b: SurfacePoint(x: 880, y: 294), c: SurfacePoint(x: 962, y: 304), surfaceID: 1, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 787, y: 138), b: SurfacePoint(x: 791, y: 129), c: SurfacePoint(x: 1031, y: 248), surfaceID: 2, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 523, y: 138), b: SurfacePoint(x: 523, y: 128), c: SurfacePoint(x: 806, y: 144), surfaceID: 3, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1020, y: 243), b: SurfacePoint(x: 1020, y: 233), c: SurfacePoint(x: 1607, y: 241), surfaceID: 4, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1649, y: 304), b: SurfacePoint(x: 1649, y: 294), c: SurfacePoint(x: 1731, y: 305), surfaceID: 5, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1599, y: 240), b: SurfacePoint(x: 1595, y: 231), c: SurfacePoint(x: 1852, y: 128), surfaceID: 6, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1807, y: 141), b: SurfacePoint(x: 1807, y: 131), c: SurfacePoint(x: 2082, y: 138), surfaceID: 7, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 976, y: 413), b: SurfacePoint(x: 976, y: 403), c: SurfacePoint(x: 1643, y: 411), surfaceID: 8, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1732, y: 410), b: SurfacePoint(x: 1732, y: 400), c: SurfacePoint(x: 2557, y: 410), surfaceID: 9, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 2130, y: 490), b: SurfacePoint(x: 2138, y: 498), c: SurfacePoint(x: 2109, y: 512), surfaceID: 10, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1598, y: 828), b: SurfacePoint(x: 1597, y: 818), c: SurfacePoint(x: 1826, y: 823), surfaceID: 11, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 715, y: 826), b: SurfacePoint(x: 715, y: 816), c: SurfacePoint(x: 953, y: 826), surfaceID: 12, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 840, y: 943), b: SurfacePoint(x: 840, y: 933), c: SurfacePoint(x: 920, y: 943), surfaceID: 13, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1005, y: 1011), b: SurfacePoint(x: 1005, y: 1001), c: SurfacePoint(x: 1558, y: 1011), surfaceID: 14, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1639, y: 943), b: SurfacePoint(x: 1639, y: 933), c: SurfacePoint(x: 1722, y: 942), surfaceID: 15, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 1589, y: 825), b: SurfacePoint(x: 1589, y: 815), c: SurfacePoint(x: 1829, y: 825), surfaceID: 16, dynamic:false),
        SurfaceFrame(a: SurfacePoint(x: 0, y: 0), b: SurfacePoint(x: 1, y: 1), c: SurfacePoint(x: 2, y: 2), surfaceID: 17, dynamic:true)
    ]

    func run() {
        let startTime = CFAbsoluteTimeGetCurrent()
        for  i in 0 ... 10000000 {
            for s in surfaceFrames {
                
            }
        }
        let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
        println("Time elapsed \(timeElapsed) s")
    }
}



struct SurfacePoint {
    var x,y: Int
}
final class SurfaceFrame {
    let a,b,c :SurfacePoint
    let surfaceID: Int
    let dynamic: Bool

    init(a: SurfacePoint, b: SurfacePoint, c: SurfacePoint, surfaceID: Int, dynamic: Bool) {
        self.a = a
        self.b = b
        self.c = c
        self.surfaceID = surfaceID
        self.dynamic = dynamic
    }
}

In this test, classes took 14.5261079668999 s while the test with structs took only 0.310304999351501 s. Structs were 47X faster.

Community
  • 1
  • 1
Epic Byte
  • 33,840
  • 12
  • 45
  • 93
  • 1
    BTW: There's no need to give the times to more than 2 or 3 significant digits. – Mike Dunlavey Jun 24 '15 at 17:33
  • 1
    Note that in debug code array handling in Swift is REALLY slow. You should select a release build to do any performance testing. – Duncan C Jun 24 '15 at 17:57
  • @DuncanC I am using a release build. – Epic Byte Jun 24 '15 at 18:03
  • 2
    @EpicByte: Have you tried to profile it with Instruments? My guess would be that each access to `classes` requires a retain+release. – Martin R Jun 24 '15 at 19:38
  • @MartinR Good suggestion, I'll edit the post to include my findings with instruments for the tests. However regarding the retain/release, I don't think this would be the case for the one iteration test with the large arrays that I posted under "Side Note" Because there is only 1 iteration. Correct? Unless access to the objects in classes requires a retain/release. Maybe that could be causing the drop in performance. Maybe I can try a weak reference? – Epic Byte Jun 24 '15 at 19:43
  • 1
    @EpicByte: No, what I assume is that each iteration step which assigns a value to `s` (which is a pointer to the object) retains the object, and releases it at the end of the iteration step. Just guessing ... – Martin R Jun 24 '15 at 19:46
  • @MartinR Ah yes I see what you mean. Good thought that might actually be it. – Epic Byte Jun 24 '15 at 19:47

3 Answers3

2

As Martin R recommended, I profiled both tests and indeed the retain/release calls is what makes iterating through an array of classes much slower than iterating through the array of structs. Just to be clear, here are the tests I ran.

import Foundation

final class SomeClass {}

struct SomeStruct {}

var classes = [
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
    SomeClass(),
]
var structs = [
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
    SomeStruct(),
]

let startTime = CFAbsoluteTimeGetCurrent()
/*
structTest()
classTest()
*/
let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
println("Time elapsed \(timeElapsed) s")


func structTest() {
    for i in 0 ... 1000000 {
        for e in structs {}
    }
}


func classTest() {
    for i in 0 ... 1000000 {
        for e in classes {}
    }
}

Here are pictures of the profiling of both tests using instruments. You can see just by adding up the running time that the Classes test spends nearly all of its time with retain/releases during each iteration. I'd be interested to see how Swift 2.0 handles this.

Structs enter image description here

Classes enter image description here

So just out of curiosity, I thought what would happen if I could by-pass the retain/release calls by doing pointer arithmetic directly on the array (side note: I recommend you never do this in a real app). So I created one last test. However in this test, instead of iterating the array multiple times I would just create one large array and iterate it once because this is where most of the overhead occurs anyway. I've also decided to access properties in this test to reduce the ambiguity in optimizations.

So here are the results from the final test:

  • One iteration over large Struct array: 1.00037097930908 s
  • One iteration over large Class array: 11.3165299892426 s
  • One iteration over large Struct array using pointer arithmetic: 0.773443996906281 s
  • One iteration over large Class array using pointer arithmetic: 2.81995397806168 s

Below is the code for the test.

final class SomeClass {
    var a: Int
    init(a: Int) {
        self.a = a
    }
}
struct SomeStruct {
    var a: Int
    init(a: Int) {
        self.a = a
    }
}

var classes: [SomeClass] = []
var structs: [SomeStruct] = []

var total: Int = 0

for i in 0 ... 100000000 {
    classes.append(SomeClass(a:i))
    structs.append(SomeStruct(a:i))
}

let startTime = CFAbsoluteTimeGetCurrent()
/*structTest()
classTest()
structTestPointer()
classTestPointer()*/
let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
println("Time elapsed \(timeElapsed) s")

func structTest() {
    for someStruct in structs {
        let a = someStruct.a
        total = total &+ a
    }
}

func structTestPointer() {
    var pointer = UnsafePointer<SomeStruct>(structs)
    for j in 0 ..< structs.count {
        let someStruct = pointer.memory
        let a = someStruct.a
        total = total &+ a
        pointer++
    }
}

func classTest() {
    for someClass in classes {
        let a = someClass.a
        total = total &+ a
    }
}

func classTestPointer() {
    var pointer = UnsafePointer<SomeClass>(classes)
    for j in 0 ..< classes.count {
        let someClass = pointer.memory
        let a = someClass.a
        total = total &+ a
        pointer++
    }
}
Epic Byte
  • 33,840
  • 12
  • 45
  • 93
0

This is highly compiler-dependent.

Your struct is empty, which makes this a not-meaningful comparison. You should create actual structs with unique properties and actual class objects with unique properties to make a valid test. (say a random Int in each one?)

My guess is that since structs are a value type, Swift actually creates a contiguous block of memory to hold the values and does pointer math to fetch the individual structs, but with class objects is has to do some indirection and message passing. Still the difference is huge.

Actually, since your structs are identical and immutable the Swift compiler might be collapsing all of them into a single object and ignoring your array indexing.

Duncan C
  • 128,072
  • 22
  • 173
  • 272
  • I actually originally had many value-type properties: points, bools and an integer all with different values. It's the same result. I can edit my answer to include this if you'd like. – Epic Byte Jun 24 '15 at 18:21
  • Also why would loading the classes from memory require message passing? – Epic Byte Jun 24 '15 at 18:30
  • Ok I updated my question so the structs are no longer empty. They now have the same values I use in my game. – Epic Byte Jun 24 '15 at 19:20
  • I don't know what the compiler is doing. Swift is still in flux, so there are no doubt things that still need to be optimized. The surest way to tell would be to look at the assembler code the compiler is generating. – Duncan C Jun 24 '15 at 19:27
  • Ok I will try to post the assembly. Do you recommend I post the assembly for the test with the empty structs/classes because there will be less instructions? – Epic Byte Jun 24 '15 at 19:28
0

I somewhere read that creation and copying of structs and enums occurs at the latest moment possible in order optimize for performance. If this is true then a struct would be created at the latest when it is going to be accessed.

Did you try to access the structs in

for  i in 0 ... 10000000 {
        for s in surfaceFrames {

e.g. by printing the content to the console? It would be interesting to know if this would have an impact on the measured performance.

Ivan Rigamonti
  • 696
  • 5
  • 7
  • I heard that somewhere too. Interesting thought, however If I try to access the struct (which I did in my game to perform some computation) the structs are still over 45-50X faster. – Epic Byte Jun 24 '15 at 20:57
  • Different sources state that structs are allocated on stack and classes on heap, e.g. http://stackoverflow.com/questions/24232799/why-choose-struct-over-class and https://news.ycombinator.com/item?id=8040848. If this is true then it would definitively make a difference. – Ivan Rigamonti Jun 24 '15 at 21:19