36

I'm doing a raytracer hobby project, and originally I was using structs for my Vector and Ray objects, and I thought a raytracer was the perfect situation to use them: you create millions of them, they don't live longer than a single method, they're lightweight. However, by simply changing 'struct' to 'class' on Vector and Ray, I got a very significant performance gain.

What gives? They're both small (3 floats for Vector, 2 Vectors for a Ray), don't get copied around excessively. I do pass them to methods when needed of course, but that's inevitable. So what are the common pitfalls that kill performance when using structs? I've read this MSDN article that says the following:

When you run this example, you'll see that the struct loop is orders of magnitude faster. However, it is important to beware of using ValueTypes when you treat them like objects. This adds extra boxing and unboxing overhead to your program, and can end up costing you more than it would if you had stuck with objects! To see this in action, modify the code above to use an array of foos and bars. You'll find that the performance is more or less equal.

It's however quite old (2001) and the whole "putting them in an array causes boxing/unboxing" struck me as odd. Is that true? However, I did pre-calculate the primary rays and put them in an array, so I took up on this article and calculated the primary ray when I needed it and never added them to an array, but it didn't change anything: with classes, it was still 1.5x faster.

I am running .NET 3.5 SP1 which I believe fixed an issue where struct methods weren't ever in-lined, so that can't be it either.

So basically: any tips, things to consider and what to avoid?

EDIT: As suggested in some answers, I've set up a test project where I've tried passing structs as ref. The methods for adding two Vectors:

public static VectorStruct Add(VectorStruct v1, VectorStruct v2)
{
  return new VectorStruct(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
}

public static VectorStruct Add(ref VectorStruct v1, ref VectorStruct v2)
{
  return new VectorStruct(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
}

public static void Add(ref VectorStruct v1, ref VectorStruct v2, out VectorStruct v3)
{
  v3 = new VectorStruct(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
}

For each I got a variation of the following benchmark method:

VectorStruct StructTest()
{
  Stopwatch sw = new Stopwatch();
  sw.Start();
  var v2 = new VectorStruct(0, 0, 0);
  for (int i = 0; i < 100000000; i++)
  {
    var v0 = new VectorStruct(i, i, i);
    var v1 = new VectorStruct(i, i, i);
    v2 = VectorStruct.Add(ref v0, ref v1);
  }
  sw.Stop();
  Console.WriteLine(sw.Elapsed.ToString());
  return v2; // To make sure v2 doesn't get optimized away because it's unused. 
}

All seem to perform pretty much identical. Is it possible that they get optimized by the JIT to whatever is the optimal way to pass this struct?

EDIT2: I must note by the way that using structs in my test project is about 50% faster than using a class. Why this is different for my raytracer I don't know.

oguz ismail
  • 1
  • 16
  • 47
  • 69
JulianR
  • 16,213
  • 5
  • 55
  • 85
  • Good luck with the project, a ray tracer is something I will be tackling soon. – GurdeepS Feb 28 '09 at 01:42
  • 2
    Creating a raytracer is a lot of fun. I find it fascinating you can create an image from nothing more than a bunch of floats and relatively simple vector math. – JulianR Feb 28 '09 at 14:12
  • 1
    I don't think the article said that putting structs in an array causes boxing. It cautioned that using them in places where objects are expected does cause boxing; for example if you pass them to a method expecting an argument of type object. – ILoveFortran Feb 28 '09 at 16:10
  • possible duplicate of [When to use struct in C#?](http://stackoverflow.com/questions/521298/when-to-use-struct-in-c) – nawfal May 27 '13 at 05:48
  • See also http://stackoverflow.com/questions/521298/when-to-use-struct-in-c/521343 (especially my answer there :) ) – Brian Feb 28 '09 at 05:49

12 Answers12

30

An array of structs would be a single contiguous structure in memory, while items in an array of objects (instances of reference types) need to be individually addressed by a pointer (i.e. a reference to an object on the garbage-collected heap). Therefore if you address large collections of items at once, structs will give you a performance gain since they need fewer indirections. In addition, structs cannot be inherited, which might allow the compiler to make additional optimizations (but that is just a possibility and depends on the compiler).

However, structs have quite different assignment semantics and also cannot be inherited. Therefore I would usually avoid structs except for the given performance reasons when needed.


struct

An array of values v encoded by a struct (value type) looks like this in memory:

vvvv

class

An array of values v encoded by a class (reference type) look like this:

pppp

..v..v...v.v..

where p are the this pointers, or references, which point to the actual values v on the heap. The dots indicate other objects that may be interspersed on the heap. In the case of reference types you need to reference v via the corresponding p, in the case of value types you can get the value directly via its offset in the array.

ILoveFortran
  • 3,441
  • 1
  • 21
  • 19
  • Also, besides superfluous indirection, linear traversal of vvvv is cache-friendly, while linear traversal of ..v..v...v.v.. is not. Modern hardware does memory reqeuests by 64 byte chunks. Even if you want to load 8 bytes from some specific memory location 64 bytes would be transfered and cached at CPU - i.e. it would be only 8/64=0.125 useful load. – Evgeny Panasyuk Jun 19 '13 at 18:42
  • Do you have a source for arrays of structs working as you describe? (I know that they're value types, but perhaps the array stores references to those value types?) – ispiro Jun 28 '18 at 17:29
12

In the recommendations for when to use a struct it says that it should not be larger than 16 bytes. Your Vector is 12 bytes, which is close to the limit. The Ray has two Vectors, putting it at 24 bytes, which is clearly over the recommended limit.

When a struct gets larger than 16 bytes it can no longer be copied efficiently with a single set of instructions, instead a loop is used. So, by passing this "magic" limit, you are actually doing a lot more work when you pass a struct than when you pass a reference to an object. This is why the code is faster with classes eventhough there is more overhead when allocating the objects.

The Vector could still be a struct, but the Ray is simply too large to work well as a struct.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • I see, didn't know the 16 bytes limit was such a hard limit, thought of it more as a guideline. However, the funny thing is that having my Ray as struct doesn't slow the application down much if at all, while making my Vector a struct does slow it down by 50%. – JulianR Feb 28 '09 at 12:16
  • 2
    Having Vector as a class and Ray as a struct will make Ray contain two references. That will work sizewise, but you may get some surprising semantic effects. Making both structs is what puts the Ray struct over the size limit. – Guffa Feb 28 '09 at 14:29
  • 1
    Struct handling is optimized for the case where structs are 16 bytes or less; the performance of a 17-byte struct will thus be markedly worse than that of a 16-byte struct. On the other hand, if one avoids passing structs by value (pass by `ref` instead whenever possible) even a 100-byte struct can perform better than a 100-byte class. – supercat Mar 10 '12 at 20:39
  • @supercat: Well, passing a struct by reference would perform pretty much the same as passing a class by value. If you actually *use* the data in the scruct/class, the class will actually perform slightly better because of the extra indirection that passing by reference adds. Besides, if you have to pass the structure by reference everywhere, you are forced to write pretty ugly code. – Guffa Mar 11 '12 at 13:32
  • @Guffa: The indirection exists whether one is using a class or a struct. If one has a separate class instance for every element, the performance will be almost identical except for the time required to create the class objects, and the extra 12-24 bytes required for the class reference and object overhead. The main difference is that if I say `someProc(ref myStruct);`, the procedure will be able to mutate `myStruct` *only until it returns*. By contrast, if a mutable class object is ever exposed to external code, there's no way of knowing when that code might cause it to change. – supercat Mar 11 '12 at 23:40
9

Anything written regarding boxing/unboxing prior to .NET generics can be taken with something of a grain of salt. Generic collection types have removed the need for boxing and unboxing of value types, which makes using structs in these situations more valuable.

As for your specific slowdown - we'd probably need to see some code.

Erik Forbes
  • 35,357
  • 27
  • 98
  • 122
  • Figured something like that, but haven't arrays of a certain type always been "generic"? Or was in int[] an object[] internally in .NET 1.0? As for source code: I can hardly post the entire source here, but I'll see if I can dig up something relevant. – JulianR Feb 28 '09 at 01:17
  • Yes, arrays were always (sort of) generic. – configurator Feb 28 '09 at 16:43
7

Basically, don't make them too big, and pass them around by ref when you can. I discovered this the exact same way... By changing my Vector and Ray classes to structs.

With more memory being passed around, it's bound to cause cache thrashing.

TraumaPony
  • 10,742
  • 12
  • 54
  • 74
  • I thought value types were boxed when you pass them by ref? – Erik Forbes Feb 28 '09 at 01:13
  • No, not at all. It just passes a pointer, I believe. For small methods, e.g. addition and simple ray-interesection tests, the cost of pointer dereferencing is less than the cost of copying the memory. – TraumaPony Feb 28 '09 at 01:17
  • Interesting, passing by ref would kill all my pretty operator overloading and elegant code. I'll try to benchmark this. – JulianR Feb 28 '09 at 01:19
  • 2
    Passing by reference? The point of having a struct is that it's so small that it's more efficient to copy it than to have the overhead of a reference to it... – Guffa Feb 28 '09 at 04:12
  • From my experience passing them as ref was slowing things down. – Joan Venge Feb 28 '09 at 06:02
  • 3
    Necromancing this as I've grown a little more experienced since I asked this. In a rewrite of the raytracer, passing *all* structs by reference was indeed a lot faster. But yeah, it really did make my code very ugly. Small mathematical one-lined statements turned into 5 lined blocks of Vector3.Add/Subtract/Multiply/etc and I had to resort to a lot of public fields because a property can't be passed by ref (makes sense, it's a method after all). It's really fast now though, real-time framerates in very simple scenes. – JulianR Jun 29 '09 at 00:50
  • @JulianR: In many cases, the "proper" way to move information around is to pass structs by ref. If one can avoid passing or copying structs by value in any case where one doesn't need to have an independent copy, large structs can perform just as well as small ones. The only difficulty with that approach is that in some contexts the system will insist upon making defensive copies of structs (e.g. when invoking any member method upon a struct in a `readonly` storage location). Classes can have advantages in some contexts, but when those advantages aren't needed, structs passed by reference win. – supercat Mar 10 '12 at 20:37
  • 1
    @ErikForbes: Structs are boxed when they are stored as stand-alone heap objects which must continue exist outside the context of the code or object which is using them. Every stand-alone heap object needs to have type information stored with it, since there may be nothing else that identifies its type. By contrast, if one has a e.g. variable of type `Rectangle`, the code which is using that variable will know that it's a `Rectangle`; likewise if a class has a field of that type, code which uses that field will know what type it is. Passing a struct like `List.Enumerator` to code... – supercat Aug 09 '12 at 15:40
  • 1
    ...which expects an `IEnumerator` will box it, because the code code which is using the enumerator would not otherwise have any way of knowing that the thing it was given was a `List.Enumerator`. If instead one were to write a method like `void UseEnumerator(ref T theEnumerator) where T:IEnumerator` and called it with a `List.Enumerator`, the system would generate a special version of `UseEnumerator.Enumerator()`, which would know that its parameter was of that exact type; since the code would know the exact type, no boxing would be required. – supercat Aug 09 '12 at 15:44
7

I think the key lies in these two statements from your post:

you create millions of them

and

I do pass them to methods when needed of course

Now unless your struct is less than or equal to 4 bytes in size (or 8 bytes if you are on a 64-bit system) you are copying much more on each method call then if you simply passed an object reference.

Andrew Hare
  • 344,730
  • 71
  • 640
  • 635
  • 1
    It's still quicker than the massive amount of garbage collection that would occur... – TraumaPony Feb 28 '09 at 01:18
  • 1
    Does it actually do any GC though? If I render a really large image, the memory usage of my process just keeps climbing indefinitely, perhaps because even at using 3 GB, I still have plenty of memory left and perhaps in that case the GC rather waits until I'm done hogging the CPU. – JulianR Feb 28 '09 at 01:29
  • Well, it should be GCing. At least, that was the problem with my raytracer. – TraumaPony Feb 28 '09 at 01:33
  • I agree with TraumaPony. I had the same problem with a 3d renderer I wrote. When things are used and destroyed in an instant, structs really make things way faster. Also if you check out XNA for instance almost everything I have used was a struct. – Joan Venge Feb 28 '09 at 06:00
  • That is only true if you are not passing those structs to other methods. – Andrew Hare Feb 28 '09 at 12:43
  • Try passing these structs by ref and see what happens. – NM. Feb 28 '09 at 17:23
  • Profile the app with a .NET Memory Profiler – Andrei Rînea Mar 03 '09 at 23:43
  • 2
    If one is creating millions of distinct instances, a struct will offer better performance than an immutable class. Nearly always. The only time classes can win is if most references point to instances which can be shared with other references. Having a million references that all refer to one of three immutable class instances is likely better than having a million structs which all 'happen' to hold one of three combinations of fields, but if the million references would all point to difference class instances, nothing is gained by using a class rather than a struct. – supercat Mar 10 '12 at 20:44
6

The first thing I would look for is to make sure that you have explicitly implemented Equals and GetHashCode. Failing to do this means that the runtime implementation of each of these does some very expensive operations to compare two struct instances (internally it uses reflection to determine each of the private fields and then checkes them for equality, this causes a significant amount of allocation).

Generally though, the best thing you can do is to run your code under a profiler and see where the slow parts are. It can be an eye-opening experience.

  • I tried the overrides, even though I didn't use any Vector or Ray comparisons, but it had no effect. Good tip though, I will make sure to override Equals and GetHashCode from now on :) – JulianR Feb 28 '09 at 12:25
4

Have you profiled the application? Profiling is the only sure fire way to see where the actual performance problem is. There are operations that are generally better/worse on structs but unless you profile you'd just be guessing as to what the problem is.

JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
2

While the functionality is similar, structures are usually more efficient than classes. You should define a structure, rather than a class, if the type will perform better as a value type than a reference type.

Specifically, structure types should meet all of these criteria:

  • Logically represents a single value
  • Has an instance size less than 16 bytes
  • Will not be changed after creation
  • Will not be cast to a reference type
Gineer
  • 2,358
  • 4
  • 26
  • 40
  • 3
    I disagree with Eric Lippert's hatred of mutable structs. To be sure, certain limitations in the design of .net make mutable structs less friendly than they otherwise would be, but arrays of mutable structs are often the right way to store things. On a 64-bit machine, an array of a million 8-byte structs will take 8 megabytes of RAM; an array of a million instances of a class with 8 bytes' worth of fields will take 40 megabytes. Even if a struct held 40 bytes of data (way over the 16-byte recommended threshold) it would still cut memory usage by 50%. – supercat Jan 20 '11 at 20:51
0

I use structs basically for parameter objects, returning multiple pieces of information from a function, and... nothing else. Don't know whether it's "right" or "wrong," but that's what I do.

Instance Hunter
  • 7,837
  • 5
  • 44
  • 56
0

My own ray tracer also uses struct Vectors (though not Rays) and changing Vector to class does not appear to have any impact on the performance. I'm currently using three doubles for the vector so it might be bigger than it ought to be. One thing to note though, and this might be obvious but it wasn't for me, and that is to run the program outside of visual studio. Even if you set it to optimized release build you can get a massive speed boost if you start the exe outside of VS. Any benchmarking you do should take this into consideration.

Morten Christiansen
  • 19,002
  • 22
  • 69
  • 94
  • Out of curiosity, is your vector type transparent or opaque? Exposed-field structures often perform much better than opaque ones. Writing `myVec.x = expr1; myVec.y = expr2; myVec.z = expr3;` is apt to take the same amount of time as just *setting up the parameters* for a constructor call. The call itself, and any time spent in the constructor, would represent pure wasted overhead. Likewise, while there are times when the JITter might optimize a property read of `myPoint.x` which simply reads a backing field `_x`, there are many cases where the JITter won't. – supercat Jan 13 '13 at 00:14
  • From what I understand, some 32-bit versions of .net (at lest 2.0 and 3.x) have been much better at optimizing class property access than struct property access, so switching from opaque structs might not have been much faster than classes, even though the performance of transparent structs would blow away that of classes or opaque structs. – supercat Jan 13 '13 at 00:16
  • My vectors are read only, setup through the constructor. I did a lot of testing of various implementations and couldn't find anything faster than this, but I can't remember if I tested with transparent structs. In .NET 4 I think it was, they did some optimizations for structs and inlining which helped with some of the issues you mention. Also, since writing my response, I have changed the rays to structs for a ~10% performance boost. – Morten Christiansen Feb 04 '13 at 22:21
  • I'd be curious to know what sort of performance you get if you change from opaque structs to transparent structs. The benchmarking I've done suggests that sometimes the differences are small, but sometimes they can be quite large. – supercat Feb 04 '13 at 23:05
-1

If the structs are small, and not too many exist at once, it SHOULD be placing them on the stack (as long as its a local variable and not a member of a class) and not on the heap, this means the GC doesn't need to be invoked and memory allocation/deallocation should be almost instantaneous.

When passing a struct as a parameter to function, the struct is copied, which not only means more allocations/deallocations (from the stack, which is almost instantaneous, but still has overhead), but the overhead in just transferring data between the 2 copies. If you pass via reference, this is a non issue as you are just telling it where to read the data from, rather than copying it.

I'm not 100% sure on this, but i suspect that returning arrays via an 'out' parameter may also give you a speed boost, as memory on the stack is reserved for it and doesn't need to be copied as the stack is "unwound" at the end of function calls.

Grant Peters
  • 7,691
  • 3
  • 45
  • 57
  • I often hear that structs live on the stack, not on the garbage collected heap; that is not quite correct. They don't live alone but they can certainly live on the heap as part of another reference-type object. – ILoveFortran Feb 28 '09 at 16:36
  • When passing a struct via copy no further allocation is needed - the corresponding amount is reserved on the stack before the call. The copying still needs to happen, though, that's what makes passing structs more expensive, not the allocation. – ILoveFortran Feb 28 '09 at 16:42
  • I did say that it SHOULD be placed on there under the assumption that they are local variable (as the was detailed in the question). Also, memory does need to actually be allocated from the stack as well, its basically a non issue, but there is still slight overhead. – Grant Peters Mar 01 '09 at 06:18
-5

You can also make structs into Nullable objects. Custom classes will not be able to created

as

Nullable<MyCustomClass> xxx = new Nullable<MyCustomClass>

where with a struct is nullable

Nullable<MyCustomStruct> xxx = new Nullable<MyCustomStruct>

But you will be (obviously) losing all your inheritance features

BozoJoe
  • 6,117
  • 4
  • 44
  • 66
  • 4
    This just wraps the struct in a class, thus invoking the GC. You're better off just changing your struct to a class, as it would have the same effect and is probably slightly less confusing – Grant Peters Feb 28 '09 at 16:18