99

I'm about to create 100,000 objects in code. They are small ones, with only 2 or 3 properties. I'll put them in a generic list, and once they are in it, I'll loop over them, check value a, and maybe update value b.

Is it faster/better to create these objects as class or as struct?

EDIT

a. The properties are value types (except the string, I think?)

b. They might (we're not sure yet) have a validate method

EDIT 2

I was wondering: are objects on the heap and the stack processed equally by the garbage collector, or does that work differently?

Dan Lugg
Michel
  • 2
    Are they only going to have public fields, or are they also going to have methods? Are the types primitive types, such as integers? Will they be contained in an array, or in something like List? – JeffFerguson Oct 15 '10 at 13:39
  • When in doubt use a class. If you need automatic initialization in an array, use a struct. – leppie Oct 15 '10 at 13:41
  • 14
    A list of mutable structs? Watch out for the velociraptor. – Anthony Pegram Oct 15 '10 at 13:49
  • @Michel: The GC will never touch the stack. – leppie Oct 15 '10 at 13:52
  • @leppie: how are 'objects/structs' removed from the stack then? – Michel Oct 15 '10 at 13:55
  • 1
    @Anthony: i'm afraid i'm missing the velociraptor joke :-s – Michel Oct 15 '10 at 13:56
  • @Michel: Just like most other native languages deal with the stack. Pushing and popping of the stack pointer. – leppie Oct 15 '10 at 13:58
  • 5
    The velociraptor joke is from XKCD. But when you're throwing around the 'value types are allocated on the stack' misconception/implementation detail (delete as applicable) then it's Eric Lippert you need to watch out for... – Greg Beech Oct 15 '10 at 14:01
  • 1
    "b. they might (we're not sure yet) have a validate method" since structs should be immutable you can validate them in the constructor – CodesInChaos Oct 15 '10 at 17:40
  • 4
    velociraptor : http://imgs.xkcd.com/comics/goto.png – WernerCD Oct 15 '10 at 18:12
  • Also see http://stackoverflow.com/questions/85553/when-should-i-use-a-struct-instead-of-a-class – nawfal May 27 '13 at 05:52

10 Answers

145

Is it faster to create these objects as class or as struct?

You are the only person who can determine the answer to that question. Try it both ways, measure a meaningful, user-focused, relevant performance metric, and then you'll know whether the change has a meaningful effect on real users in relevant scenarios.

Structs consume less heap memory (because they are smaller and more easily compacted, not because they are "on the stack"). But they take longer to copy than a reference copy. I don't know what your performance metrics are for memory usage or speed; there's a tradeoff here and you're the person who knows what it is.

Is it better to create these objects as class or as struct?

Maybe class, maybe struct. As a rule of thumb, if the object is:
1. Small
2. Logically an immutable value
3. One of a large number of similar objects
then I'd consider making it a struct. Otherwise I'd stick with a reference type.

If you need to mutate some field of a struct it is usually better to build a constructor that returns an entire new struct with the field set correctly. That's perhaps slightly slower (measure it!) but logically much easier to reason about.
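A minimal sketch of that "return a whole new struct" approach. The `Reading` type, its `A`/`B` properties, and the `WithB` helper are hypothetical names chosen for illustration, and `readonly struct` assumes a C# 7.2 or later compiler:

```csharp
using System;

// Hypothetical example type: a small, logically immutable value.
public readonly struct Reading
{
    public int A { get; }
    public int B { get; }

    public Reading(int a, int b)
    {
        A = a;
        B = b;
    }

    // Instead of mutating B in place, build and return a whole new value.
    public Reading WithB(int newB) => new Reading(A, newB);
}

public static class ImmutableStructDemo
{
    // Checks A on every element and, where it matches, replaces the
    // element with an updated copy. Returns the number of replacements.
    public static int UpdateMatching(Reading[] items, int matchA, int newB)
    {
        int updated = 0;
        for (int i = 0; i < items.Length; i++)
        {
            if (items[i].A == matchA)
            {
                items[i] = items[i].WithB(newB);
                updated++;
            }
        }
        return updated;
    }
}
```

The update is an ordinary assignment of a new value to the slot, so there is never a half-modified copy floating around. Whether the extra construction costs anything measurable is exactly the kind of thing to profile.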

Are objects on the heap and the stack processed equally by the garbage collector?

No, they are not the same because objects on the stack are the roots of the collection. The garbage collector does not need to ever ask "is this thing on the stack alive?" because the answer to that question is always "Yes, it's on the stack". (Now, you can't rely on that to keep an object alive because the stack is an implementation detail. The jitter is allowed to introduce optimizations that, say, enregister what would normally be a stack value, and then it's never on the stack so the GC doesn't know that it is still alive. An enregistered object can have its descendants collected aggressively, as soon as the register holding onto it is not going to be read again.)

But the garbage collector does have to treat objects on the stack as alive, the same way that it treats any object known to be alive as alive. The object on the stack can refer to heap-allocated objects that need to be kept alive, so the GC has to treat stack objects like living heap-allocated objects for the purposes of determining the live set. But obviously they are not treated as "live objects" for the purposes of compacting the heap, because they're not on the heap in the first place.

Is that clear?

Robert Siemer
Eric Lippert
  • Eric, do you know if either the compiler or the jitter makes use of immutability (perhaps if enforced with `readonly`) to allow optimisations. I wouldn't let that affect a choice on mutability (I'm a nut for efficiency details in theory, but in practice my first move towards efficiency is always trying to have as simple a guarantee of correctness as I can and hence not have to waste CPU cycles and brain cycles on checks and edge-cases, and being appropriately mutable or immutable helps there), but it would counter any knee-jerk reaction to your saying immutability can be slower. – Jon Hanna Oct 15 '10 at 14:45
  • @Jon: The C# compiler optimizes *const* data but not *readonly* data. I do not know if the jit compiler performs any caching optimizations on readonly fields. – Eric Lippert Oct 15 '10 at 14:50
  • A pity, as I know knowledge of immutability allows for some optimisations, but hit limits of my theoretical knowledge at that point, but they're limits I'd love to stretch. In the meantime "it can be faster both ways, here's why, now test and find out which applies in this case" is useful to be able to say :) – Jon Hanna Oct 15 '10 at 15:03
  • I would recommend reading http://www.simple-talk.com/dotnet/.net-framework/object-overhead-the-hidden-.net-memory--allocation-cost/ and your own article (@Eric): http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx to start diving into the details. There are many other good articles around. BTW, the difference in processing 100,000 small in-memory objects is hardly noticeable, though there is some memory overhead (~2.3 MB) for class. It can easily be checked with a simple test. – Nick Martyshchenko Oct 15 '10 at 15:50
23

Sometimes with a struct you don't need to call the new() constructor and can assign the fields directly, making it much faster than usual.

Example:

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i].id = i;
    list[i].isValid = true;
}

is about 2 to 3 times faster than

Value[] list = new Value[N];
for (int i = 0; i < N; i++)
{
    list[i] = new Value(i, true);
}

where Value is a struct with two fields (id and isValid).

struct Value
{
    // Public so the fields can be assigned directly, as in the first loop above.
    public int id;
    public bool isValid;

    public Value(int i, bool isValid)
    {
        this.id = i;
        this.isValid = isValid;
    }
}

On the other hand, if the items need to be moved or selected, all that copying of value types is going to slow you down. To get the exact answer, I suspect you have to profile your code and test it out.
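For the "profile it" step, a rough timing harness might look like the sketch below. The `ItemStruct`/`ItemClass` types, the even-number check, and the single warm-up run are all assumptions made for illustration; a serious comparison would use many iterations or a dedicated benchmarking tool, and the numbers will differ from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-ins for the asker's small objects.
public struct ItemStruct { public int A; public int B; }
public sealed class ItemClass { public int A; public int B; }

public static class MeasureDemo
{
    public const int N = 100000;

    // Fills a List<ItemStruct>, then checks A and updates B.
    // With a list of structs the update is copy out, modify, store back.
    public static long RunStructs()
    {
        var list = new List<ItemStruct>(N);
        for (int i = 0; i < N; i++) list.Add(new ItemStruct { A = i, B = 0 });

        long sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            ItemStruct item = list[i];
            if (item.A % 2 == 0) item.B = 1;
            list[i] = item;
            sum += item.B;
        }
        return sum;
    }

    // Same work with a List<ItemClass>; the update goes through the reference.
    public static long RunClasses()
    {
        var list = new List<ItemClass>(N);
        for (int i = 0; i < N; i++) list.Add(new ItemClass { A = i, B = 0 });

        long sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            ItemClass item = list[i];
            if (item.A % 2 == 0) item.B = 1;
            sum += item.B;
        }
        return sum;
    }

    // Times one run after a warm-up run, so JIT compilation is not counted.
    public static TimeSpan Time(Func<long> work)
    {
        work();
        Stopwatch sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Calling `MeasureDemo.Time(MeasureDemo.RunStructs)` and `MeasureDemo.Time(MeasureDemo.RunClasses)` in a release build gives a first impression; it says nothing about memory use or GC pauses, which need separate measurement.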

John Alexiou
6

Arrays of structs are represented on the heap in a contiguous block of memory, whereas an array of objects is represented as a contiguous block of references with the actual objects themselves elsewhere on the heap, thus requiring memory for both the objects and for their array references.

In this case, as you are placing them in a List<> (and a List<> is backed by an array), it would be more efficient, memory-wise, to use structs.

(Beware, though, that large arrays will find their way onto the Large Object Heap where, if their lifetime is long, they may have an adverse effect on your process's memory management. Remember, also, that memory is not the only consideration.)
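One rough way to see the difference in memory footprint is to compare `GC.GetTotalMemory` before and after each allocation. This is only a sketch: the `PointStruct`/`PointClass` types and the `MeasureBytes` helper are hypothetical, and the figures are approximate and depend on the runtime and platform.

```csharp
using System;

// Hypothetical two-field types used only for this comparison.
public struct PointStruct { public int X; public int Y; }
public sealed class PointClass { public int X; public int Y; }

public static class LayoutDemo
{
    // Rough estimate of the bytes retained by whatever 'allocate' returns.
    public static long MeasureBytes(Func<object> allocate)
    {
        long before = GC.GetTotalMemory(true);   // true: collect before measuring
        object keep = allocate();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep);                      // keep the allocation reachable until measured
        return after - before;
    }

    // One contiguous block holding n structs inline.
    public static long StructArrayBytes(int n)
    {
        return MeasureBytes(() => new PointStruct[n]);
    }

    // One block of n references plus n separate heap objects.
    public static long ClassArrayBytes(int n)
    {
        return MeasureBytes(() =>
        {
            var arr = new PointClass[n];
            for (int i = 0; i < n; i++) arr[i] = new PointClass();
            return arr;
        });
    }
}
```

Comparing `LayoutDemo.StructArrayBytes(100000)` with `LayoutDemo.ClassArrayBytes(100000)` should show the class version paying for the reference array plus a per-object header on every element.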

Paul Ruane
  • You are able to use `ref` keyword to deal with this. – leppie Oct 15 '10 at 13:55
  • "Beware though, that large arrays will find their way on the Large Object Heap where, if their lifetime is long, may have an adverse affect on your process's memory management." - I'm not quite sure why you'd think that? Being allocated on the LOH won't cause any adverse effects on memory management unless (possibly) it's a short-lived object and you want to reclaim the memory quickly without waiting for a Gen 2 collection. – Jon Artus Nov 11 '10 at 12:30
  • @Jon Artus: the LOH does not get compacted. Any long-lived object will divide the LOH into the area of free memory before and the area after. Contiguous memory is required for allocation and if these areas are not big enough for an allocation then more memory is allocated to the LOH (i.e. you will get LOH fragmentation). – Paul Ruane Nov 11 '10 at 15:03
5

Structs may seem similar to classes, but there are important differences that you should be aware of. First of all, classes are reference types and structs are value types. By using structs, you can create objects that behave like the built-in types and enjoy their benefits as well.

When you call the New operator on a class, it will be allocated on the heap. However, when you instantiate a struct, it gets created on the stack. This will yield performance gains. Also, you will not be dealing with references to an instance of a struct as you would with classes. You will be working directly with the struct instance. Because of this, when passing a struct to a method, it's passed by value instead of as a reference.

More here:

http://msdn.microsoft.com/en-us/library/aa288471(VS.71).aspx

kyndigs
  • 4
    I know it says it on MSDN, but MSDN is not telling the whole story. Stack vs. heap is an implementation detail and structs do not *always* go on the stack. For just one recent blog on this, see: http://blogs.msdn.com/b/ericlippert/archive/2010/09/30/the-truth-about-value-types.aspx – Anthony Pegram Oct 15 '10 at 13:47
  • "...it's passed by value..." both references and structs are passed by value (unless one uses 'ref') — it's whether a value or reference is being passed that differs, i.e. structs are passed value-by-value, class objects are passed reference-by-value and ref marked params pass reference-by-reference. – Paul Ruane Oct 15 '10 at 13:56
  • 10
    That article is misleading on several key points, and I've asked the MSDN team to revise or delete it. – Eric Lippert Oct 15 '10 at 14:14
  • @Eric Lippert: Would it be possible for you to encourage the use of more-distinct terminology for object instances (stored on the heap) and object references (stored in fields, variables, or wherever)? Also, with regard to "mutable structs are evil", it seems that mutable structs are mostly good, *except* for the places where temporary structs are created. Being able to change something, secure in the knowledge that nothing else is aliased to it, would seem useful ability. Sure one could clone class objects all over the place, but that would seem rather wasteful. – supercat Oct 15 '10 at 14:53
  • I think that mutable properties on a struct are OK (but not very nice) since the compiler usually catches assignments to properties of temporary copies, but mutating methods are definitely evil. If I need them for performance reasons I'd use a static method with a ref parameter instead of modifying *this* – CodesInChaos Oct 15 '10 at 17:44
  • 2
    @supercat: to address your first point: the larger point is that in managed code *where a value or reference to a value is stored is largely irrelevant*. We have worked hard to make a memory model that most of the time allows developers to allow the runtime to make smart storage decisions on their behalf. These distinctions matter very much when failure to understand them has crashing consequences as it does in C; not so much in C#. – Eric Lippert Oct 15 '10 at 19:34
  • 1
    @supercat: to address your second point, no mutable structs are mostly evil. For example, void M() { S s = new S(); s.Blah(); N(s); }. Refactor to: void DoBlah(S s) { s.Blah(); } void M() { S s = new S(); DoBlah(s); N(s); }. That just introduced a bug because S is a mutable struct. Did you *immediately* see the bug? Or did the fact that S is a mutable struct *hide* the bug from you? – Eric Lippert Oct 15 '10 at 19:36
  • @Eric Lippert: I think many people who are used to by-value semantics in other languages get confused by something like "car2=car1; car2.color=blue;" affecting car1. If one thinks of car1 and car2 as holding VINs (vehicle IDs) rather than actual vehicles, the semantics make sense. A VIN doesn't have a color. The car represented by a VIN has a color. Saying "paint car 1G1KXQ58J green" doesn't mean one should paint the numbers green--it means one should find the car with that VIN and paint it. Saying "car2=car1" simply copies the VIN--not the car itself. – supercat Oct 15 '10 at 22:31
  • @Eric Lippert: In the latter case, the bug was immediately obvious; DoBlah needs to accept the structure by reference. There are some subtle bug cases, like methods which mutate a structure (evil), but suppose one needs to hold 1,000,000 items each with ten 16-bit parts, and it will often be necessary to change different combinations of half of those parts. Mutable structures would be pretty efficient. One copy operation on check-out, one on check-in. Non-mutable structures would seem to require making a copy for each edit unless one has many different 'change' functions. – supercat Oct 15 '10 at 22:37
  • @Eric Lippert: Besides, I consider a more common bug scenario to be what happens with mutable classes if e.g. someone forgets to clone an object before storing it in a Dictionary. That doesn't happen with structs. I tend to think that structs should only be mutable if they're Plain Old Data, but see nothing wrong with POD structs. (BTW, returning to your example, I'm assuming Blah() is an evil method which mutates the struct--I'll agree with you 100% in saying that methods which mutate structs are a bad idea). – supercat Oct 15 '10 at 22:40
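The refactoring bug discussed in the comments above can be reproduced with a small sketch. `Counter`, `Increment`, and the three methods are hypothetical stand-ins for `S`, `Blah()`, and the `M`/`DoBlah` variants:

```csharp
using System;

// Hypothetical mutable struct standing in for S; Increment plays the role of Blah().
public struct Counter
{
    public int Count;
    public void Increment() { Count++; }
}

public static class MutableStructBugDemo
{
    // Original shape: the mutation happens on the local variable itself.
    public static int Inline()
    {
        Counter c = new Counter();
        c.Increment();
        return c.Count;   // 1
    }

    // The extracted helper receives a copy, so the caller's struct is untouched.
    static void DoIncrement(Counter c) { c.Increment(); }

    public static int Refactored()
    {
        Counter c = new Counter();
        DoIncrement(c);
        return c.Count;   // 0: the increment was applied to the copy
    }

    // Passing by ref restores the original behaviour.
    static void DoIncrementRef(ref Counter c) { c.Increment(); }

    public static int RefactoredWithRef()
    {
        Counter c = new Counter();
        DoIncrementRef(ref c);
        return c.Count;   // 1
    }
}
```

If `Counter` were a class, all three methods would return 1, which is why the refactoring looks harmless at first glance.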
4

If they have value semantics, then you should probably use a struct. If they have reference semantics, then you should probably use a class. There are exceptions, which mostly lean towards creating a class even when there are value semantics, but start from there.

As for your second edit, the GC only deals with the heap, but there is a lot more heap space than stack space, so putting things on the stack isn't always a win. Besides which, a list of struct-types and a list of class-types will be on the heap either way, so this is irrelevant in this case.

Edit:

I'm beginning to consider the term evil to be harmful. After all, making a class mutable is a bad idea if it's not actively needed, and I would not rule out ever using a mutable struct. It is a poor idea so often as to almost always be a bad idea though, but mostly it just doesn't coincide with value semantics so it just doesn't make sense to use a struct in the given case.

There can be reasonable exceptions with private nested structs, where all uses of that struct are hence restricted to a very limited scope. This doesn't apply here though.

Really, I think "it mutates so it's a bad struct" is not much better than going on about the heap and the stack (which at least does have some performance impact, even if a frequently misrepresented one). "It mutates, so it quite likely doesn't make sense to consider it as having value semantics, so it's a bad struct" is only slightly different, but importantly so I think.

Jon Hanna
3

The best solution is to measure, measure again, then measure some more. There may be details of what you're doing that may make a simplified, easy answer like "use structs" or "use classes" difficult.

FMM
  • agree with the measure part, but in my opinion it was a straight forward and clear example, and i thought that maybe some generic things could be said about it. And as it turned out, some people did. – Michel Oct 15 '10 at 21:04
3

A struct is, at its heart, nothing more nor less than an aggregation of fields. In .NET it's possible for a structure to "pretend" to be an object, and for each structure type .NET implicitly defines a heap object type with the same fields and methods which--being a heap object--will behave like an object. A variable which holds a reference to such a heap object ("boxed" structure) will exhibit reference semantics, but one which holds a struct directly is simply an aggregation of variables.

I think much of the struct-versus-class confusion stems from the fact that structures have two very different usage cases, which should have very different design guidelines, but the MS guidelines don't distinguish between them. Sometimes there is a need for something which behaves like an object; in that case, the MS guidelines are pretty reasonable, though the "16 byte limit" should probably be more like 24-32. Sometimes, however, what's needed is an aggregation of variables. A struct used for that purpose should simply consist of a bunch of public fields, and possibly an Equals override, ToString override, and IEquatable(itsType).Equals implementation. Structures which are used as aggregations of fields are not objects, and shouldn't pretend to be. From the structure's point of view, the meaning of field should be nothing more or less than "the last thing written to this field". Any additional meaning should be determined by the client code.

For example, if a variable-aggregating struct has members Minimum and Maximum, the struct itself should make no promise that Minimum <= Maximum. Code which receives such a structure as a parameter should behave as though it were passed separate Minimum and Maximum values. A requirement that Minimum be no greater than Maximum should be regarded like a requirement that a Minimum parameter be no greater than a separately-passed Maximum one.

A useful pattern to consider sometimes is to have an ExposedHolder<T> class defined something like:

class ExposedHolder<T>
{
  public T Value;
  public ExposedHolder() { }
  public ExposedHolder(T val) { Value = val; }  // assign the argument, not the type parameter
}

If one has a List<ExposedHolder<someStruct>>, where someStruct is a variable-aggregating struct, one may do things like myList[3].Value.someField += 7;, but giving myList[3].Value to other code will give it the contents of Value rather than giving it a means of altering it. By contrast, if one used a List<someStruct>, it would be necessary to use var temp=myList[3]; temp.someField += 7; myList[3] = temp;. If one used a mutable class type, exposing the contents of myList[3] to outside code would require copying all the fields to some other object. If one used an immutable class type, or an "object-style" struct, it would be necessary to construct a new instance which was like myList[3] except for someField which was different, and then store that new instance into the list.
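The two update styles described above can be compared side by side. In this sketch the `Range` struct and the `HolderDemo` methods are hypothetical; `ExposedHolder<T>` is repeated with public constructors so the example compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variable-aggregating struct.
public struct Range { public int Minimum; public int Maximum; }

public class ExposedHolder<T>
{
    public T Value;
    public ExposedHolder() { }
    public ExposedHolder(T val) { Value = val; }
}

public static class HolderDemo
{
    public static int UpdateViaHolder()
    {
        var list = new List<ExposedHolder<Range>>
        {
            new ExposedHolder<Range>(new Range { Minimum = 1, Maximum = 10 })
        };

        list[0].Value.Maximum += 7;      // updates the field in place through the holder

        Range snapshot = list[0].Value;  // other code receives a copy of the contents
        snapshot.Maximum = 0;            // changing the copy does not touch the list

        return list[0].Value.Maximum;    // 17
    }

    public static int UpdateViaStructList()
    {
        var list = new List<Range> { new Range { Minimum = 1, Maximum = 10 } };

        // list[0].Maximum += 7;  // compile error CS1612: the indexer returns a copy
        Range temp = list[0];
        temp.Maximum += 7;
        list[0] = temp;                  // store the modified copy back

        return list[0].Maximum;          // 17
    }
}
```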

One additional note: If you are storing a large number of similar things, it may be good to store them in possibly-nested arrays of structures, preferably trying to keep the size of each array between 1K and 64K or so. Arrays of structures are special, in that indexing one will yield a direct reference to a structure within, so one can say "a[12].x = 5;". Although one can define array-like objects, C# does not allow for them to share such syntax with arrays.
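A short sketch of that array behaviour, which also covers the `ref` approach mentioned in a comment on an earlier answer. The `Cell` struct and the `Bump` helper are hypothetical:

```csharp
using System;

// Hypothetical small struct with public fields.
public struct Cell { public int X; public int Y; }

public static class ArrayRefDemo
{
    // Receives the element itself rather than a copy.
    static void Bump(ref Cell c) { c.X += 1; }

    public static int Run()
    {
        var a = new Cell[16];   // elements start zero-initialized; no per-element 'new' needed
        a[12].X = 5;            // indexing an array yields the element itself, so this writes in place
        Bump(ref a[12]);        // array elements can be passed by reference
        return a[12].X;         // 6
    }
}
```

The same two statements against a `List<Cell>` would not compile, because the list indexer returns a copy of the element.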

supercat
1

Use classes.

On a general note. Why not update value b as you create them?

Preet Sangha
1

From a C++ perspective I agree that it will be slower to modify a struct's properties compared to a class's. But I do think that they will be faster to read from due to the struct being allocated on the stack instead of the heap. Reading data from the heap requires more checks than from the stack.

Robert
0

Well, if you go with struct after all, then get rid of the string and use a fixed-size char or byte buffer.

That's re: performance.

Daniel Mošmondor