91

Does anyone know the answer and/or have an opinion about this?

Since tuples would normally not be very large, I would assume it would make more sense to use structs than classes for these. What say you?

Jack
  • 871
  • 1
  • 9
  • 17
Bent Rasmussen
  • 5,538
  • 9
  • 44
  • 63
  • 2
    For anyone stumbling here after 2016. In c# 7 and newer, Tuple literals are of the type family `ValueTuple<...>`. See reference at [C# tuple types](https://learn.microsoft.com/en-us/dotnet/csharp/tuples) – Tamir Daniely Mar 21 '19 at 08:05

5 Answers5

97

Microsoft made all tuple types reference types in the interests of simplicity.

I personally think this was a mistake. Tuples with more than 4 fields are very unusual and should be replaced with a more typeful alternative anyway (such as a record type in F#) so only small tuples are of practical interest. My own benchmarks showed that unboxed tuples up to 512 bytes could still be faster than boxed tuples.

Although memory efficiency is one concern, I believe the dominant issue is the overhead of the .NET garbage collector. Allocation and collection are very expensive on .NET because its garbage collector has not been very heavily optimized (e.g. compared to the JVM). Moreover, the default .NET GC (workstation) has not yet been parallelized. Consequently, parallel programs that use tuples grind to a halt as all cores contend for the shared garbage collector, destroying scalability. This is not only the dominant concern but, AFAIK, was completely neglected by Microsoft when they examined this problem.

Another concern is virtual dispatch. Reference types support subtypes and, therefore, their members are typically invoked via virtual dispatch. In contrast, value types cannot support subtypes so member invocation is entirely unambiguous and can always be performed as a direct function call. Virtual dispatch is hugely expensive on modern hardware because the CPU cannot predict where the program counter will end up. The JVM goes to great lengths to optimize virtual dispatch but .NET does not. However, .NET does provide an escape from virtual dispatch in the form of value types. So representing tuples as value types could, again, have dramatically improved performance here. For example, calling GetHashCode on a 2-tuple a million times takes 0.17s but calling it on an equivalent struct takes only 0.008s, i.e. the value type is 20× faster than the reference type.

A real situation where these performance problems with tuples commonly arises is in the use of tuples as keys in dictionaries. I actually stumbled upon this thread by following a link from the Stack Overflow question F# runs my algorithm slower than Python! where the author's F# program turned out to be slower than his Python precisely because he was using boxed tuples. Manually unboxing using a hand-written struct type makes his F# program several times faster, and faster than Python. These issues would never had arisen if tuples were represented by value types and not reference types to begin with...

Community
  • 1
  • 1
J D
  • 48,105
  • 13
  • 171
  • 274
  • Excellent answer. I think the end outcome of this is that people will make their own tuple types for some scenarios. – Bent Rasmussen May 05 '11 at 17:59
  • 2
    @Bent: Yes, that's exactly what I do when I come across tuples on a hot path in F#. Would be nice if they had provided both boxed and unboxed tuples in the .NET Framework though... – J D May 06 '11 at 08:13
  • 18
    Regarding virtual dispatch, I think that your blame is misplaced: the `Tuple<_,...,_>` types could have been sealed, in which case no virtual dispatch would be needed despite being reference types. I'm more curious about why they aren't sealed than why they are reference types. – kvb Apr 13 '12 at 18:58
  • 2
    From my testing, for the scenario in which a tuple would be generated in one function and returned to another function, and then never used again, exposed-field structures seem to offer superior performance for *any* size data item that's not so huge as to blow the stack. Immutable classes are only better if the references will be passed around enough to justify their construction cost (the larger the data item, the less they have to be passed round for the tradeoff to favor them). Since a tuple is supposed to represent simply a bunch of variables stuck together, a struct would seem ideal. – supercat Feb 09 '13 at 22:51
  • I do not understand what the tuples have to do with boxing. There shouldn't be any boxing just because the tuple itself is a reference type, is it? – Stefan Steinegger May 06 '14 at 19:47
  • 1
    @StefanSteinegger: By "boxed tuple" I mean representing a tuple using a reference type instead of a value type. – J D May 06 '14 at 19:56
  • @supercat: You could combine both: public fields with immutability if you make them readonly. So there is really no need for having the items as properties. – mmmmmmmm Sep 02 '14 at 07:05
  • Are these performance statements (esp. wrt. the .NET GC) still true in 2014 ? – Martin Ba Sep 05 '14 at 11:51
  • @MartinBa: Yes, AFAIK. – J D Sep 06 '14 at 17:32
  • "For example, calling GetHashCode on a 2-tuple a million times takes 0.17s but calling it on an equivalent struct takes only 0.008s". Did you properly override `GetHashCode` for the struct before doing this test? Really, if both are a combination of the hashcode of the two items performance shouldn't change. And default implementation for structs is not. – ceztko Oct 05 '14 at 22:36
  • 2
    *"unboxed tuples up to 512 bytes could still be faster than boxed"* - which scenario is that? You *might* be able to allocate a 512B struct faster than a class instance holding 512B of data, but passing it around would be more than 100 times slower (presuming x86). Is there something I am overlooking? – vgru Oct 15 '14 at 08:39
  • 2
    @Groo: "Is there something I am overlooking?". Sounds like you're overlooking most of what goes on under the hood. Are you comparing copying 512B of data with passing a pointer? Have you considered that structs can be held in registers and don't always need to be allocated? Stack-allocated structs? Tail calls can pass structs for free? The GC's write barrier? Overhead of allocating in gen0, marking gen0, copying from gen0 to gen1, marking in gen1, copying from gen1 to gen2 and retraversing gen2 repeatedly? Reified generics using a uniform data representation for reference types only? – J D Oct 29 '14 at 10:54
  • @JonHarrop: "overlooking most of what goes on under the hood"? Possibly, that's why I asked: it seemed like you were ignoring the cost of copying half kB of data, and your comment still seems to be ignoring it. *"structs can be held in registers"* - A 0.5kB register? That would be neat. *"Stack-allocated structs"* - sure, that's a no brainer, that's exactly what I wrote in my initial comment: **passing it around** would be more than 100 times slower. Is that incorrect? GC might need to copy a long lived object around, but what are you comparing this with? A struct that is only copied once? – vgru Oct 29 '14 at 11:36
  • @Groo: "passing it around would be more than 100 times slower. Is that incorrect?". Slower than what (single int, null reference, single reference to existing object, single reference to new short-lived object, single reference to new long-lived object)? Passing it around how (inlined function, non-tail call, static tail call, dynamic tail call, capture into a closure)? You need to compare the relevant number to the performance of a reference type which can be up to 45x slower than handling a single int or null literal reference depending what you're doing. – J D Oct 29 '14 at 14:26
  • @Jon: You are only talking about ways of eliminating passing the value around. *"inlined function"*? Yes, and enclosing it in `#IF DEBUG` also provides a neat performance boost in release builds. *"up to 45x slower than handling a single int"*. Hang on, what "single int"? Where did the other 508 bytes go? Again: **You might be able to allocate a 512B struct faster than a class instance holding 512B of data, but passing it around would be more than 100 times slower**. Parameters in .NET are always passed by value. For value types, that value is data. For reference types, it's the reference. – vgru Oct 29 '14 at 14:42
  • 1
    @Groo: "passing it around would be more than 100 times slower". Depends what you mean by "passing it around" and what you are comparing it to. "Parameters in .NET are always passed by value". Those are the semantics of a high-level language before compilation, yes. That says little about performance. The compilers do everything they can to optimise that away whenever possible. – J D Oct 29 '14 at 20:44
  • 6
    Interesting: https://github.com/dotnet/corefx/blob/master/src/Common/src/System/ValueTuple.cs – Den May 14 '15 at 21:40
  • Haha, got here because of rolling my own tuple as a struct, and wondered why it was a reference type. Was shocked to find out it was a reference type, since it came out I thought it was a value type! WTF!? – Michal Ciechan Aug 08 '17 at 16:33
  • 3
    I think your opinion has been pretty well validated by the fact that C# 7's new Tuple syntax actually creates `ValueTuple<>`s, which are `struct`s. – StriplingWarrior Nov 27 '17 at 22:59
  • @StriplingWarrior: Yay! – J D Nov 28 '17 at 06:55
45

The reason is most likely because only the smaller tuples would make sense as value types since they would have a small memory footprint. The larger tuples (i.e. the ones with more properties) would actually suffer in performance since they would be larger than 16 bytes.

Rather than have some tuples be value types and others be reference types and force developers to know which are which I would imagine the folks at Microsoft thought making them all reference types was simpler.

Ah, suspicions confirmed! Please see Building Tuple:

The first major decision was whether to treat tuples either as a reference or value type. Since they are immutable any time you want to change the values of a tuple, you have to create a new one. If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects. They found that using a value type gave them better performance. In our first draft of the tuple specification, we kept the two-, three-, and four-element tuples as value types, with the rest being reference types. However, during a design meeting that included representatives from other languages it was decided that this "split" design would be confusing, due to the slightly different semantics between the two types. Consistency in behavior and design was determined to be of higher priority than potential performance increases. Based on this input, we changed the design so that all tuples are reference types, although we asked the F# team to do some performance investigation to see if it experienced a speedup when using a value type for some sizes of tuples. It had a good way to test this, since its compiler, written in F#, was a good example of a large program that used tuples in a variety of scenarios. In the end, the F# team found that it did not get a performance improvement when some tuples were value types instead of reference types. This made us feel better about our decision to use reference types for tuple.

Andrew Hare
  • 344,730
  • 71
  • 640
  • 635
  • 3
    Great discussion here: http://blogs.msdn.com/bclteam/archive/2009/07/07/building-tuple-matt-ellis.aspx – Keith Adler Mar 09 '10 at 16:50
  • Ahh, I see. I'm still a little confused that value types do not mean anything in practice here :P – Bent Rasmussen Mar 09 '10 at 17:01
  • I just read the comment about no generic interfaces and when looking at the code earlier that was exactly another thing that struck me. It's really quite uninspiring how ungeneric the Tuple types are. But, I guess you can always make your own... There's no syntactic support in C# anyway. Yet at least... Still, the use of generics and the constraints it has still feels limited limited in .Net. There's a substantial potential for very generic very abstract libraries but generics probably need extra things like covariant return types. – Bent Rasmussen Mar 09 '10 at 20:11
  • 7
    Your "16 byte" limit is bogus. When I tested this on .NET 4 I found that the GC is so slow that unboxed tuples up to 512 bytes can still be faster. I'd also question Microsoft's benchmark results. I bet they ignored parallelism (the F# compiler is not parallel) and that is where avoiding GC really pays off because .NET's workstation GC is also not parallel. – J D May 02 '11 at 10:54
  • Out of curiosity, I wonder if the compiler team tested the idea of making tuples be *EXPOSED-FIELD* structs? If one has an instance of a type with various traits, and needs an instance which is identical except for one trait which is different, an exposed-field struct can accomplish that *much* faster than any other type, and the advantage only grows as structs get bigger. – supercat Dec 23 '12 at 21:25
8

If the .NET System.Tuple<...> types were defined as structs, they would not be scalable. For instance, a ternary tuple of long integers currently scales as follows:

type Tuple3 = System.Tuple<int64, int64, int64>
type Tuple33 = System.Tuple<Tuple3, Tuple3, Tuple3>
sizeof<Tuple3> // Gets 4
sizeof<Tuple33> // Gets 4

If the ternary tuple were defined as a struct, the result would be as follows (based on a test example I implemented):

sizeof<Tuple3> // Would get 32
sizeof<Tuple33> // Would get 104

As tuples have built-in syntax support in F#, and they are used extremely often in this language, "struct" tuples would pose F# programmers at risk of writing inefficient programs without even being aware of it. It would happen so easily:

let t3 = 1L, 2L, 3L
let t33 = t3, t3, t3

In my opinion, "struct" tuples would cause a high probability of creating significant inefficiencies in everyday programming. On the other hand, the currently existing "class" tuples also cause certain inefficiencies, as mentioned by @Jon. However, I think that the product of "occurrence probability" times "potential damage" would be much higher with structs than it currently is with classes. Therefore, the current implementation is the lesser evil.

Ideally, there would be both "class" tuples and "struct" tuples, both with syntactic support in F#!

Edit (2017-10-07)

Struct tuples are now fully supported as follows:

Marc Sigrist
  • 3,964
  • 3
  • 22
  • 23
  • 2
    If one avoids unnecessary copying, a exposed-field struct of *any* size will be more efficient than an immutable class of the same size, unless each instance gets copied enough times that the cost of such copying overcome the cost of creating a heap object (the break-even number of copies varies with object size). Such copying may be unavoidable if one wants a struct which pretends to be immutable, but structs which is designed to appear as collections of variables (which is what struct *are*) can be used efficiently even when they are huge. – supercat Oct 25 '12 at 16:14
  • 2
    It may be that F# does not play nicely with the idea of passing structs by `ref`, or may not like the fact that so-called "immutable structs" aren't, especially when boxed. It's too bad .net never implemented the concept of passing parameters by an enforceable `const ref`, since in many cases such semantics are what is really required. – supercat Oct 25 '12 at 16:29
  • @supercat ..."the cost of creating a heap object": The cost of _creating_ a heap object is negligible, it's the cost of _collecting a huge number of_ heap objects which may be significant in certain pathological edge scenarios. -- ..."structs [...] can be used efficiently even when they are huge.": Of course they can. My point was that, if the common-purpose type `System.Tuple` were a struct, the risk of inadvertently creating, and inefficiently using, huge nested instances of it (in _everyday work_) would be too high compared to the benefits (which you would only get in _edge scenarios_). – Marc Sigrist Oct 26 '12 at 07:49
  • @supercat ..."F# does not play nicely with the idea of passing structs by ref": F# plays very nicely with this idea, such as: `let passGenericStructByRef(v:'T byref when 'T:struct) = ...`. C# and VB.Net also play nicely. The only thing you can't do in these languages is _return_ structs by reference.. -- I agree that C++ `const ref` semantics would be great in .Net, but it is not needed to pass structs efficiently; it would only help in enforcing immutability. – Marc Sigrist Oct 26 '12 at 08:16
  • I didn't know whether or not F# liked passing structs by ref; if it does, that's great. The reason `const ref` is needed to pass structs efficiently is that only writable struct storage locations can be passed by `ref`. Calling a struct method or property on a read-only struct will cause the system to make a copy of the struct then pass that copy by `ref` to the method or property--even if the method or property would not write to it. If .net had a `const ref` concept, those extra copies could be avoided. What I'd like to see would be for it to have 'guaranteed' and 'non-guaranteed' `const`... – supercat Oct 26 '12 at 14:50
  • ...such that compilers would be expected to prevent functions which declared a param as a "non-guaranteed const-ref" from modifying it, but the CLR would not rely upon that and callers would still be expected to make defensive copies of read-only structures (as things are now); callers of functions declared as "guaranteed const-ref", however, would not be expected to make defensive copies, but functions could only report themselves as GCR if the CLR could verify that they neither modified their argument nor passed it as non-guaranteed CR without making a defensive copy. – supercat Oct 26 '12 at 14:58
  • Additionally, large value-type function returns could be handled by having the called function return an address (which could either point to a stack area allocated for the return, or to an existing instance). Callers would be expected not to modify the returned instance, but either copy it directly to some other storage location (e.g. `myStruct = thing.Property`) or pass it as guaranteed-const-ref (e.g. `someVar = thing.Bounds.Right`, if the `Right` property took `this` as guaranteed-const-ref.) – supercat Oct 26 '12 at 15:01
  • 1
    Incidentally, I regard the amortized cost of GC as being part of the cost of allocating objects; if an L0 GC would be necessary after every megabyte of allocations, then the cost of allocating 64 bytes is about 1/16,000 of the cost of an L0 GC, plus a fraction of the cost of any L1 or L2 GC's that become necessary as a consequence of it. – supercat Oct 26 '12 at 20:48
  • 4
    "I think that the product of occurrence probability times potential damage would be much higher with structs than it currently is with classes". FWIW, I have very rarely seen tuples of tuples in the wild and consider them a design flaw but I very often see people struggle with awful performance when using (ref) tuples as keys in a `Dictionary`, e.g. here: http://stackoverflow.com/questions/5850243/fsharp-runs-my-algorithm-slower-than-python – J D Oct 30 '14 at 00:43
  • 3
    @Jon It's two years since I wrote this answer, and I now agree with you that it would be preferrable if at least 2- and 3-tuples were structs. An F# language user voice [suggestion](https://fslang.uservoice.com/forums/245727-f-language/suggestions/6148669-short-tuples-compiled-as-structs-up-to-25x-per) has been made in this regard. The issue has some urgency, as there has been a massive growth of applications in big data, quantitative finance, and gaming in the recent years. – Marc Sigrist Oct 30 '14 at 10:57
  • @MarcSigrist: Yeah and, FWIW, I discourage the use of 4-tuples and beyond because you really need the clarity of named fields when you've got more than 3 of them. – J D Oct 30 '14 at 18:52
4

For 2-tuples, you can still always use the KeyValuePair<TKey,TValue> from earlier versions of the Common Type System. It's a value type.

A minor clarification to the Matt Ellis article would be that the difference in use semantics between reference and value types is only "slight" when immutability is in effect (which, of course, would be the case here). Nevertheless, I think it would have been best in the BCL design not to introduce the confusion of having Tuple cross over to a reference type at some threshold.

Glenn Slayden
  • 17,543
  • 3
  • 114
  • 108
  • If a value will be used once after it's returned, an exposed-field struct of any size will outperform any other type, provided only that it's not so monstrously huge as to blow the stack. The cost of building a class object will only be recouped if the reference ends up being shared multiple times. There a times when it's useful for a general-purpose fixed-size heterogeneous type to be a class, but there are other times when a struct would be better--even for "big" things. – supercat Feb 11 '13 at 03:58
  • Thanks for adding this useful rule-of-thumb. I am hoping however that you didn't misunderstand my position: I'm a value-type junkie. (http://stackoverflow.com/a/14277068/ should leave no doubt). – Glenn Slayden Feb 20 '13 at 22:00
  • Value types are one of the great features of .net, but unfortunately the person who wrote up the msdn dox failed to recognize that there are multiple disjoint usage cases for them, and that different usage cases should have different guidelines. The style of struct msdn recommends should only be used with structs that represent a homogenous value, *but* if one needs to represent some independent values fastened together with duct tape, one shouldn't use *that* style of struct--one should use a struct with exposed public fields. – supercat Feb 20 '13 at 22:23
0

I don't know but if you have ever used F# Tuples are part of the language. If I made a .dll and returned a type of Tuples it be nice to have a type to put that in. I suspect now that F# is part of the language (.Net 4) some modifications to CLR were made to accommodate some common structures in F#

From http://en.wikibooks.org/wiki/F_Sharp_Programming/Tuples_and_Records

let scalarMultiply (s : float) (a, b, c) = (a * s, b * s, c * s);;

val scalarMultiply : float -> float * float * float -> float * float * float

scalarMultiply 5.0 (6.0, 10.0, 20.0);;
val it : float * float * float = (30.0, 50.0, 100.0)
kleopatra
  • 51,061
  • 28
  • 99
  • 211