
I've researched a bit, and the common wisdom seems to be that structs should be under 16 bytes because otherwise they incur a performance penalty for copying. With C# 7 and ref returns it became quite easy to avoid copying structs altogether. I assume that as the struct size gets smaller, passing by ref has more overhead than just copying the value.
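For reference, here is a minimal sketch of the calling styles being compared. The `Transform` struct and `PassingStyles` class are hypothetical and exist only for illustration. The sketch shows a by-value parameter, a `ref` parameter, and a `ref` return that hands out the array element itself:

```csharp
// Hypothetical 32-byte struct (four 8-byte doubles), above the 16-byte guideline.
public struct Transform
{
    public double X, Y, Z, Scale;
}

public static class PassingStyles
{
    // By value: semantically the callee gets its own copy of all 32 bytes.
    public static double SumByValue(Transform t) => t.X + t.Y + t.Z;

    // By ref: the callee gets a managed pointer (IntPtr.Size bytes) to the caller's storage.
    public static double SumByRef(ref Transform t) => t.X + t.Y + t.Z;

    // ref return: returns a reference to the array element itself, so nothing is copied
    // and writes through the result land directly in the array.
    public static ref Transform ElementAt(Transform[] items, int index) => ref items[index];
}
```

A call such as `PassingStyles.ElementAt(items, 1).X = 5;` modifies `items[1]` in place, which is the "avoid copying altogether" pattern described above.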

Is there a rule of thumb about when passing structs by value becomes faster than by ref? What factors affect this? (Struct size, process bitness, etc.)

More context

I'm working on a game with the vast majority of data represented as contiguous arrays of structs for maximum cache-friendliness. As you might imagine, passing structs around is quite common in such a scenario. I'm aware that profiling is the only real way of determining the performance implications of something. However, I'd like to understand the theoretical concepts behind it and hopefully write code with that understanding in mind and profile only the edge cases.
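To make the access pattern concrete, here is a sketch using a hypothetical 16-byte `Particle` struct. It contrasts updating each array element in place through a `ref` local with copying the element out, modifying it, and writing it back. Both produce the same result; they differ only in how many struct copies are made.

```csharp
// Hypothetical 16-byte struct (four 4-byte floats).
public struct Particle
{
    public float X, Y, VelocityX, VelocityY;
}

public static class ParticleUpdate
{
    // Works on each element in place through a ref local; no struct copies are made.
    public static void StepInPlace(Particle[] particles, float dt)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            ref Particle p = ref particles[i];
            p.X += p.VelocityX * dt;
            p.Y += p.VelocityY * dt;
        }
    }

    // Copy out, modify, copy back: two 16-byte copies per element.
    public static void StepByCopy(Particle[] particles, float dt)
    {
        for (int i = 0; i < particles.Length; i++)
        {
            Particle p = particles[i];
            p.X += p.VelocityX * dt;
            p.Y += p.VelocityY * dt;
            particles[i] = p;
        }
    }
}
```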

Also, please note that I'm not asking about best practices or the sanity of passing everything by ref. I'm aware of "best practices" and implications and I deliberately choose not to follow them.

Addressing the "duplicate" tag

Performance of pass by value vs. pass by reference in C# .NET - This question discusses passing a reference type by ref, which is completely different from what I'm asking.

In .Net, when if ever should I pass structs by reference for performance reasons? - The second question touches the subject a bit, but it's about a specific size of the struct.

To answer the questions from Eric Lippert's article:

Do you really need to answer that question? Yes, I do, because it'll affect how I write a lot of code.

Is that really the bottleneck? Probably not. But I'd still like to know since that's the data access pattern for 99% of the program. In my mind this is similar to choosing the correct data structure.

Is the difference relevant? It is. Passing large structs by ref is faster. I'm just trying to understand the limits of this.

What is this “faster” you speak of? Giving the CPU less work for the same task.

Are you looking at the big picture? Yes. As previously stated, it affects how I write the whole thing.

I know I could measure a lot of different combinations. And what would that tell me? That X is faster than Y on my combination of [.NET version, process bitness, OS, CPU]. What about Linux? What about Android? What about iOS? Should I benchmark all permutations on all possible hardware/software combinations?

I don't think that's a viable strategy. Therefore I'm asking here, hoping that someone who knows a lot about the CLR/JIT/ASM/CPU can tell me how it works so I can make informed decisions when writing code.

The answer I'm looking for is similar to the aforementioned 16-byte guideline for struct sizes, along with an explanation of why.

halfer
loodakrawa
  • [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) – Peter Duniho Sep 19 '17 at 04:59
  • @PeterDuniho - the other question is asking about passing a reference type by ref, and my question is strictly about the size of structs. Also, as 90% of my data access patterns are affected by this, I can't profile all the permutations of various data distributions. – loodakrawa Sep 19 '17 at 05:03
  • There are two marked duplicates. The first includes some discussion of the value type scenario, while the second is entirely about that. As far as _"I can't profile all the permutations of data distribution"_ goes, you can profile the _important_ permutations. Scenarios that don't come up often aren't worth optimizing for. – Peter Duniho Sep 19 '17 at 05:05
  • _"passing structs around is quite common in such a scenario"_ -- why? Putting things in an array helps only if you access the thing from the array. As soon as you copy it out into a variable, you're no longer taking advantage of the data the array populated the cache with. In any case, your question is purely speculative and too broad. There are too many things that can affect performance for anyone to be able to definitively tell you how large your structs can be without needing to pass by ref, especially since the design choice will affect things other than method calls. – Peter Duniho Sep 19 '17 at 05:08
  • The answers in the second question boil down to when to use a struct and when not to. I have a real-world scenario where almost ALL of my data is represented as structs of widely varying sizes. – loodakrawa Sep 19 '17 at 05:11
  • You're obviously completely missing the point. I'm NOT copying anything because I'm passing and returning by ref. – loodakrawa Sep 19 '17 at 05:14
  • _"The answers in the second question boil down to when to use a struct and when not to"_ -- and so does your question. If you're always passing by ref, why are you asking the question? If you're trying to compare passing by ref with not passing by ref, then the scenario where you're not passing by ref involves copying the data. Either way, your question remains too broad. – Peter Duniho Sep 19 '17 at 05:27
  • What's broad about the part marked as bold in my question? I'm certain it can be unequivocally answered in 2 sentences by someone who understands the inner workings of the CLR and the JITter. The rest of the question was meant to describe how my scenario is different from the ones in the questions you linked, for example. – loodakrawa Sep 19 '17 at 05:37
  • You think two sentences would cover "what factors affect this?" Sorry, if you really believe that, you have seriously underestimated the complexities of performance tuning. Again, I refer you to [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/). No answer within the intended scope of Stack Overflow is going to adequately cover your question. – Peter Duniho Sep 19 '17 at 05:55
  • That is just your opinion. Since I'd still like to get my answer - what are my options? How can I get someone who understands more about the topic to see this question now that it's closed? Edit it? Flag it? – loodakrawa Sep 19 '17 at 06:09
  • This is my thought process approaching the question: there are two factors affecting performance. The first is where memory is allocated. The second is how you pass data to a method. If only considering parameter-passing performance, passing by value will never be faster than passing by reference, unless the struct you are passing is smaller than the size of a reference. You lose performance passing by value, but you gain performance by allocating memory on the stack. How much you can lose is determined by how much you can gain. How much you can gain is a much bigger topic, though. – Xiaoguo Ge Sep 19 '17 at 15:43
  • @PeterDuniho: Passing an array element as a `ref` parameter will allow the recipient to act upon it in place, even if the element is a structure type. That's one of the big advantages of using arrays of structure types. – supercat Sep 19 '17 at 21:07
  • @supercat: _"Passing an array element as a ref parameter will allow the recipient to act upon it in place"_ -- I'm well aware of that. So what? Since the question is asking to compare passing by value with passing by reference, one must assume that there is no need to modify the value. Otherwise, it wouldn't even be a question, because passing by reference would be the only option. – Peter Duniho Sep 19 '17 at 21:34
  • @PeterDuniho: I was referring to your statement about how having structures in an array is only useful when accessing things from the array. It wasn't clear whether you were counting accesses through a byref as accesses from the array. – supercat Sep 19 '17 at 21:56
  • @PeterDuniho: "one must assume that there is no need to modify the value" -- what are you talking about? You can copy it from the array, modify it and copy it back. – loodakrawa Sep 19 '17 at 22:56
  • @loodakrawa: _"You can copy it from the array, modify it and copy it back"_ -- you could. But that would be pretty silly to do that if passing by reference was already under consideration anyway. Especially if one hasn't bothered to do any actual performance testing to see if there's some benefit to completely ignoring the semantics of the operation as a basis of design. – Peter Duniho Sep 19 '17 at 23:14
  • That is the whole point. When the structs are small enough, copying is faster. I'm trying to understand what determines that limit - probably the way the CLR/JIT handles refs. Performance testing doesn't really tell me WHY, which essentially is my question. See edits. – loodakrawa Sep 19 '17 at 23:21
  • "why" questions are hard to answer. If your question is "what machine code is generated by the jitter for a copy by ref vs a copy by value?" then use the debugger to look at the machine code that is generated for your particular scenario. – Eric Lippert Sep 21 '17 at 19:03
  • This is actually a relevant question. I see it as totally idiotic for 99% of all programs, but if you write an inner game loop, or e.g. a ticker plant for a trading backend, this is the type of issue that really comes up and MAKES A DIFFERENCE. It gets even more relevant if you take `Span<T>` into account, so you can move around views/parts of an array without copying. I personally have some programs where the core loop is about 3 pages of code, uses 95% of the processing time, and runs loops updating values in an array that are represented as structs for performance reasons. Good question. – TomTom Mar 07 '20 at 18:06
  • It's not the answer you are looking for, but you may be interested in this question: https://stackoverflow.com/questions/2437925/net-why-is-struct-better-with-being-less-than-16-bytes/2437938#2437938 plus my own basic investigation here: https://forum.unity.com/threads/opinions-about-tokenizing.362531/#post-2360539 The takeaway for me was that this question is better left unanswered. Not because nobody knows, but because it is an implementation detail subject to change over time and across environments. – eisenpony Mar 09 '20 at 00:16

3 Answers


Generally, passing by reference should be faster. When you pass a struct by reference, you are only passing a pointer to the struct, which is just a 32- or 64-bit integer. When you pass a struct by value, the entire struct has to be copied and then a pointer to the new copy is passed. Unless the struct is very small (for example, a single int), passing by reference is faster.

Also, passing by value would increase the number of calls to the OS for memory allocation and deallocation; these calls are time-consuming because the OS has to check a registry for available space.
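A crude way to probe where the cut-over lies on a particular machine is a micro-benchmark along the following lines. This is a rough sketch only, not a rigorous harness. `Payload` and `PassingBenchmark` are hypothetical names, and the struct's field count is the knob to turn when trying other sizes.

```csharp
using System.Diagnostics;
using System.Runtime.CompilerServices;

// Hypothetical 16-byte struct (four ints); add or remove fields to probe other sizes.
public struct Payload
{
    public int A, B, C, D;
}

public static class PassingBenchmark
{
    // NoInlining stops the JIT from inlining these calls, so the argument is really passed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByValue(Payload p) => p.A + p.B + p.C + p.D;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ByRef(ref Payload p) => p.A + p.B + p.C + p.D;

    // Crude timing only: no warm-up, no repeated runs, no statistics.
    // The checksum keeps the loop results observable.
    public static (long ValueTicks, long RefTicks, long Checksum) Run(int iterations)
    {
        var data = new Payload { A = 1, B = 2, C = 3, D = 4 };
        long checksum = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) checksum += ByValue(data);
        long valueTicks = sw.ElapsedTicks;

        sw.Restart();
        for (int i = 0; i < iterations; i++) checksum += ByRef(ref data);
        long refTicks = sw.ElapsedTicks;

        return (valueTicks, refTicks, checksum);
    }
}
```

Numbers from a loop like this are noisy and specific to one runtime, bitness and CPU. A dedicated harness such as BenchmarkDotNet, or inspecting the JIT-generated machine code, gives more trustworthy answers.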

Fred
GideonMax
  • To clarify, in this case "very small" = "smaller than a pointer on the device" – LLSv2.0 Mar 11 '20 at 13:57
  • @LLSv2.0 yes, though in certain circumstances you would still want to pass by value to make your code easier to work with, though that REALLY depends on the situation – GideonMax Mar 11 '20 at 14:01
  • Needs more empirical data. Passing “a pointer” requires an additional indirection, so (depending on implementation) there is some cut-over point beyond this simple generality. What is that inflection point, and where? How/why does it differ across environments? – user2864740 Mar 11 '20 at 16:33
  • One other consideration: does passing a struct by ref cause the struct to be boxed/unboxed? If so, that would increase the cost of passing by ref. Then a larger size threshold might be appropriate for passing by ref instead of by value. – Riggy Mar 12 '20 at 13:55
  • @Riggy According to [Microsoft](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref), "There is no boxing of a value type when it is passed by reference." – Riggy Mar 12 '20 at 13:56
  • @user2864740 In standard implementations you pass a pointer to the struct whether it's ref or value; you don't pass a pointer to the pointer, as that would be needless indirection and any good compiler would not do that – GideonMax Mar 16 '20 at 16:30
  • "passing by value would increase the number of calls to the os for memory allocation and de-allocation" - that sounds very wrong. Do you have a source for that? I would expect it to use the stack memory with no need for any extra allocations. – loodakrawa Mar 16 '20 at 17:09
  • @loodakrawa In many cases, for example structs that are array elements, structs will be heap-allocated; in this case, the structs are in arrays. Though I agree that this doesn't always apply. Also, stack/heap allocation is implementation-dependent, so it's better not to rely on that. – GideonMax Mar 16 '20 at 17:17
  • Well, I agree, but that has nothing to do with passing structs around. Once you allocate an array of structs there are no more heap allocations, no matter how you access the data in the array - either via ref or by copying. And your answer states that passing by value would allocate additional memory. – loodakrawa Mar 16 '20 at 17:24
  • Also, as @user2864740 said - I'm asking about the inflection point. 1 int is faster, I agree. What about 2 ints? 3? 4? – loodakrawa Mar 16 '20 at 17:33
  • @loodakrawa The inflection point is implementation- and hardware-dependent; it depends on the efficiency of the garbage collector, the speed of different operations, etc. In general I would say it should be below 12 ints, but for the inflection point you need to do tests. – GideonMax Mar 16 '20 at 18:31

If you pass structs around by reference, they can be of any size: you are still dealing with an 8-byte pointer (assuming x64). For the highest performance you need a CPU-cache-friendly design, which is called data-oriented design.

Games often use a specific form of data-oriented design called an Entity Component System (ECS). See the book Pro .NET Memory Management by Konrad Kokosa, chapter 14.

The basic idea is that your game entities (e.g. Movable, Car, Plane, ...) share common properties such as a position, and that property is stored for all entities in one contiguous array. If you need to increment the position of 1K entities, you just look up each entity's index in the position array and update the position there. This provides the best possible data locality. If everything were stored in classes, the CPU prefetcher would be defeated by chasing a separate `this` pointer for each class instance.
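A minimal sketch of that layout follows. It assumes hypothetical `Position` and `PositionStore` types, which are not taken from the book or from any particular ECS framework. An entity is just an index, and a batch update walks the contiguous array through `ref` locals, so each `Position` is modified in place.

```csharp
// Hypothetical component: 8 bytes per entity.
public struct Position
{
    public float X, Y;
}

// Hypothetical storage: every entity's Position lives in one contiguous array,
// and an entity is simply an index into it.
public sealed class PositionStore
{
    private readonly Position[] _positions;

    public PositionStore(int capacity) => _positions = new Position[capacity];

    // Returns a reference to the stored component rather than a copy.
    public ref Position Get(int entity) => ref _positions[entity];

    // Moves a batch of entities by (dx, dy), updating each Position in place.
    public void Translate(int[] entities, float dx, float dy)
    {
        foreach (int entity in entities)
        {
            ref Position p = ref _positions[entity];
            p.X += dx;
            p.Y += dy;
        }
    }
}
```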

See this Intel post about a reference architecture: https://software.intel.com/en-us/articles/get-started-with-the-unity-entity-component-system-ecs-c-sharp-job-system-and-burst-compiler

There are plenty of Entity Component Systems out there, but so far I have seen none that uses ref structs as its main working data structure. The reason is that all the popular ones have existed much longer than C# 7.2, which introduced ref structs.

Alois Kraus
  • Implementing an ECS as the core architecture is precisely the reason why I asked the question in the first place. Anyway, if the data is stored not as an array of structs but as an array of classes, then it's likely to be spread across the heap and not contiguous in any way, because an array of classes is effectively an array of pointers into the heap – loodakrawa Mar 16 '20 at 17:18
  • @loodakrawa: Did you complete your ECS system with C# 7.2 features? Is it open source? It would be interesting to see how yours compares to others like Entitas. – Alois Kraus Mar 16 '20 at 22:26
  • I implemented it ~3 years ago so I didn't use the 7.2 features. I stopped trying to make games since but I'm getting into it again now so I'll probably upgrade it with the new goodies. Anyway, you can check it out here: https://github.com/loodakrawa/ScatteredLogic – loodakrawa Mar 17 '20 at 12:23
  • Here is an open-source ECS framework with ref structs: [LeoECS](https://github.com/Leopotam/ecs). Fast, and not only for games. Enjoy that speed =) – picolino Aug 20 '20 at 17:56

I finally found the answer. The breaking point is `System.IntPtr.Size`. In Microsoft's own words, from "Write safe and efficient C# code":

> Add the `in` modifier to pass an argument by reference and declare your design intent to pass arguments by reference to avoid unnecessary copying. You don't intend to modify the object used as that argument.
>
> This practice often improves performance for readonly value types that are larger than `IntPtr.Size`. For simple types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal and bool, and enum types), any potential performance gains are minimal. In fact, performance may degrade by using pass-by-reference for types smaller than `IntPtr.Size`.
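To illustrate the quoted guidance, here is a sketch with a hypothetical 24-byte `readonly struct` passed via `in`. It also includes a rough "is it larger than a pointer?" check. The names `Vector3d` and `InGuideline` are made up for this example. The size check is only a heuristic that mirrors the quote, not an official API.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical 24-byte readonly struct: larger than IntPtr.Size
// (4 bytes in a 32-bit process, 8 bytes in a 64-bit one).
public readonly struct Vector3d
{
    public readonly double X, Y, Z;

    public Vector3d(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class InGuideline
{
    // `in` passes a read-only reference instead of a 24-byte copy. Because Vector3d is a
    // readonly struct, the compiler never needs defensive copies of a or b.
    public static double Dot(in Vector3d a, in Vector3d b) => a.X * b.X + a.Y * b.Y + a.Z * b.Z;

    // Heuristic mirroring the quote: is the value larger than a pointer?
    // Marshal.SizeOf reports the unmanaged size, which equals the managed size for
    // blittable structs such as Vector3d but can differ for types containing bool or char.
    public static bool LargerThanPointer<T>() where T : struct => Marshal.SizeOf<T>() > IntPtr.Size;
}
```

At the call site the `in` keyword is optional, so `InGuideline.Dot(a, b)` compiles as-is. A plain `int` fails the `LargerThanPointer` check on both 32- and 64-bit processes, which matches the quote's point that simple types gain little from pass-by-reference.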

loodakrawa