Would C# benefit from aggregate structs/classes?

Question

Foreword

tl;wr: This is a discussion.

I am aware that this "question" is more of a discussion, which is why I will mark it as community wiki. However, according to the How to Ask page, it could belong here: it is specific, programming-related, relevant to most C# programmers, on-topic, and, after an hour of research, I could not find it discussed anywhere on the web. Moreover, the question itself is meant to obtain an answer, about which I'll stay open-minded regardless of my bias: would C# really benefit from aggregate structs? Notwithstanding this foreword, I'd understand if this were closed, but would appreciate it if the users with the authority and intention to close it redirected me to an appropriate discussion spot on the Web.

Introduction

Lack of struct immutability

Structs are flexible but debated types in C#. They offer the stack-allocated value type organizational paradigm, but not the immutability of other value types.

Some say structs should represent values, and values do not change (e.g. int i = 5;, 5 is immutable), while some perceive them as OOP layouts with subfields.

The debate on struct immutability (1, 2, 3), for which the current solution seems to be having the programmer enforce immutability, is also unsolved.

For instance, the C# compiler will detect possible data loss when structs are accessed as a reference (bottom of this page) and restrict assignment. Moreover, since struct constructors, properties and methods can perform any operation, with the only limit (for constructors) being that all fields must be assigned before returning control, structs cannot be declared as constant, which would be a correct declaration if they were limited to data representation.

Immutable subset of structs, aggregates

Aggregate classes (Wikipedia) are strict data structures with limited functionality, designed to offer syntactic sugar in exchange for their lack of flexibility. In C++, they have "no user-declared constructors, no private or protected non-static data members, no base classes, and no virtual functions". The theoretical specifics of such classes in C# are herein open for debate, although the core concept remains the same.

Since aggregate structs are strictly data holders with labeled accessors, their immutability (in a possible C# context) would be ensured. Aggregates also couldn't be nulled, unless the nullable modifier (?) is specified, as for other pure value types. For this reason, many illegal struct operations would become possible, as well as some syntactic sugar.

Uses

Aggregates could be declared as const, since their constructor would be enforced to do nothing but assign the fields.
Aggregates could be used as default values for method parameters.
Aggregates could be implicitly Sequential, facilitating interaction with native libraries.
Aggregates would be immutable, enforcing no data loss for reference access. Compiler detection of such subfield modifications could lead to a complete, implicit reassignment.
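
In current C#, the first two items in the list above are usually approximated rather than expressed directly. A minimal sketch, assuming a hand-written `Vector` struct with readonly fields (the `Foo` members and the `SetVelocity` signature are illustrative only, not part of the proposal):

```csharp
using System;

// Hand-written stand-in for the proposed aggregate: every field is readonly.
struct Vector
{
    public readonly double X, Y, Z;
    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
}

static class Foo
{
    // "const Vector" is rejected by the compiler; static readonly is the usual stand-in.
    public static readonly Vector Gravity = new Vector(0, -9.8, 0);
    public static readonly Vector Up = new Vector(0, 1, 0);

    // A default parameter value must be a compile-time constant, so a
    // nullable parameter defaulting to null stands in for "use Up".
    public static Vector SetVelocity(Vector? velocity = null)
    {
        return velocity ?? Up;
    }

    static void Main()
    {
        Console.WriteLine(SetVelocity().Y);        // falls back to Up
        Console.WriteLine(SetVelocity(Gravity).Y); // uses the supplied value
    }
}
```

The nullable-parameter trick costs a null check per call and hides the real default from the signature, which is part of what the proposed syntax would avoid.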

Hypothetical Syntax

Taking from the C++ syntax, we could imagine something along the lines of: (Remember, this is a community wiki, improvement is welcome and encouraged)

aggregate Size
{
    int Width;
    int Height;
}

aggregate Vector
{
    // Default values for constructor.
    double X = 0, Y = 0, Z = 0;
}

aggregate Color
{
    byte R, G, B, A = 255;
}

aggregate Bar
{
    int X;
    Qux Qux;
}

aggregate Qux
{
    int X, Y;
}

static class Foo
{
    // Constant is possible.
    const Size Big = new Size(200, 100);

    // Inline constructor.
    const Vector Gravity = { 0, -9.8, 0 };

    // Default value / labeled parameter.
    const Color Fuchsia = { 255, 0, 255 };
    const Vector Up = { y: 1 };

    // Sub-aggregate initialization
    const Bar Test = { 20, { 4, 3 } };

    static void SetVelocity(Vector velocity = { 0, 1, 0 }) { ... }
    static void SetGravity(Vector gravity = Foo.Gravity) { ... }

    static void Main()
    {
        Vector v = { 1, 2, 3 };

        double y = v.Y; // Valid.

        v.Y = 5; // Invalid, immutable.
    }
}

Implicit (re)Assignment

As of today, assigning a subfield of a struct in C# 4.0 is valid:

Vector v = new Vector(1, 2, 3);
v.Z = 5; // Legal in current C#.

However, sometimes, the compiler can detect when structs are mistakenly accessed as references, and will forbid changing subfields. For example, (example question)

//(in a Windows.Forms context)
control.Size.Width = 20; // Illegal in current C#.

As Size is a property and struct Size a value type, we would be editing a copy/clone of the actual property, which would be useless in such a case. As C# users, we tend to assume most things are accessed by reference, especially in OOP designs, which would make us think that such a call is legitimate (and it would be, if struct Size were a class).

Moreover, when accessing collections, the compiler also forbids us from modifying a struct subfield: (example question)

List<Vector> vectors = ... // Imagine populated data.
vectors[4].Y = 10; // Illegal in current C#.

The good news about these unfortunate restrictions is that the compiler does half of the possible aggregate solution for such cases: detect when they occur. The other half would be to implicitly reassign a new aggregate with the changed value.

When in local scope, simply reassign the vector.
When in external scope, locate a get, and if a matching set accessor is accessible, reassign to this one.
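
For comparison, the reassignment described above can be written by hand in today's C#. A rough sketch, assuming stand-in `Size` and `Control` types rather than the real Windows.Forms ones (all names below are hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Stand-in types (hypothetical; not System.Drawing.Size or a real Control).
struct Size
{
    public int Width, Height;
    public Size(int width, int height) { Width = width; Height = height; }
}

class Control
{
    public Size Size { get; set; }
}

static class ManualReassignment
{
    public static void SetWidth(Control control, int width)
    {
        // control.Size.Width = width; would not compile: the getter returns a copy.
        Size old = control.Size;                    // the "get"
        control.Size = new Size(width, old.Height); // the matching "set"
    }

    public static void SetHeight(List<Size> sizes, int index, int height)
    {
        // sizes[index].Height = height; is rejected for the same reason.
        Size old = sizes[index];                    // indexer get
        sizes[index] = new Size(old.Width, height); // indexer set
    }

    static void Main()
    {
        var control = new Control { Size = new Size(100, 50) };
        SetWidth(control, 20);
        Console.WriteLine(control.Size.Width + "x" + control.Size.Height); // 20x50

        var sizes = new List<Size> { new Size(1, 2) };
        SetHeight(sizes, 0, 10);
        Console.WriteLine(sizes[0].Width + "x" + sizes[0].Height); // 1x10
    }
}
```

This get-then-set pair is the rewrite an implicit aggregate would ask the compiler to generate.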

For this to be done and in order to avoid confusion, the aggregate must be marked as implicit:

implicit aggregate Vector { ... }
implicit aggregate Size { ... }


// Example 1
{
    Vector v = new Vector(1, 2, 3);
    v.Z = 5; // Legal with implicit aggregates.

    // What is implicitly done:
    v = new Vector(v.X, v.Y, 5); // Local variable, simply reassign.
}

// Example 2
{
    //(in a Windows.Forms context)
    control.Size.Width = 20; // Legal with implicit aggregates.

    // What is implicitly done:
    Size old = control.Size.__get(); // External, MSIL detects a get.
    // If MSIL can find a matching, accessible __set:
    control.Size.__set({ 20, old.Height });
}

// Example 3
{
    List<Vector> vectors = ... // Imagine populated data.
    vectors[4].Y = 10; // Legal with implicit aggregates.

    // What is implicitly done:
    Vector old = vectors[4].__get(); // External, MSIL detects a get.
    // If MSIL can find a matching, accessible __set:
    vectors[4].__set({ old.X, 10, old.Z });
}

// Example 4
{
    Vector The5thVector(List<Vector> vectors) { return vectors[4]; }
    ...
    List<Vector> vectors = ...;
    The5thVector(vectors).Y = 10; // Illegal with implicit aggregates.

    // This is illegal because the compiler cannot find an implicit
    // "set" to match, as it is a function return, not a property or
    // indexer.
}

Of course, this last implicit reassignment is only a syntactic simplification which ~~could~~ or could not be adopted. I simply propose it because the compiler seems able to detect such reference access to structs and could easily convert the code for the programmer if the type were an aggregate.

Summary

Aggregates can have fields;
Aggregates are value types;
Aggregates are immutable;
Aggregates are allocated on the stack;
Aggregates cannot inherit;
Aggregates have a sequential layout;
Aggregates have a sequential default constructor;
Aggregates cannot have user defined constructors;
Aggregates can have default values and labeled constructions;
Aggregates can be defined inline;
Aggregates can be declared as constant;
Aggregates can be used as default parameters;
Aggregates are non-nullable unless specified (?);

Possibly:

Aggregates (could) be implicitly reassigned; See Marcelo Cantos' reply and comment.
Aggregates (could) have interfaces;
Aggregates (could) have methods;

Cons

As aggregates wouldn't replace structs but rather be another organizational scheme, I cannot find many cons, but hope that the C# veterans of S/O will be able to populate this CW section. On a last note, please answer the question directly as well as discussing it: would C# benefit from aggregate classes as described in this post? I am by no means a C# expert, only an enthusiast of the language who misses this feature, which seems crucial to me. I'm seeking advice and comments from experienced programmers regarding this case. I am aware that numerous workarounds exist, and I actively use them every day; I simply think that they are too common to be ignored.

If I understand correctly, you want structs to have copy-on-write semantics? — leppie, Jan 18 '11 at 06:49
It was a suggestion among the features of a hypothetical aggregate class, but it should be disregarded, as Marcelo Cantos pointed out. — Lazlo, Jan 18 '11 at 06:56
Wiki, however, migrating to programmers might be a better option. — , Jan 18 '11 at 14:40
It would be nice to be able to define a mutable struct simply by listing its fields and having the compiler auto-generate a constructor for it. The only advantage I can see to having such a declaration produce anything other than an ordinary struct, though, would be the possibility of introducing covariance (so even if `KeyValuePair` were mutable POD, a `KeyValuePair` could be given to code expecting a `KeyValuePair`). Note that such covariance is always safe with a POD... — supercat, Feb 26 '12 at 18:27
...since passing an unboxed POD will always make a copy of the data, as will unboxing a POD. Passing around a boxed POD won't make a copy of the data, but since a boxed POD can't be mutated without unboxing first, there's no way anyone can get a mutable POD which isn't the exact type they're expecting (note that isn't strictly true of structs, since they can implement interfaces which are not covariant). Also, BTW, changing a field in an exposed struct by creating a temp struct instance with a different value in that field and then copying the temp struct over the original is silly. — supercat, Feb 26 '12 at 18:31

score 5 · Accepted Answer · answered Jan 18 '11 at 17:48

I wish that structs had been defined with something like your proposed semantics in the first place.

However, we're stuck with what we've got now and I think it is unlikely that we'll ever get a whole new "kind of type" into the CLR. Introducing a new kind of type means introducing it to every .NET language, not just C#, and that's a big change.

I think what is more likely -- and remember, when I talk about hypothetical language features for hypothetical, unannounced future products that don't exist and may never exist, I'm doing so for entertainment purposes only -- is that we'll find some way to make better immutability annotations and enforcements on both classes and structs. The compiler could do a better job of both enforcing immutability and making it easier to program in an immutable style, regardless of whether the type in question is a value type or a reference type. And the compiler or CLR could also potentially do a better job of optimizing code that works on multi core machines if it had more immutability guarantees known at compile time or jit time.

While you are noodling away at your proposal, an interesting question you might want to consider is: if aggregate types have methods, is "this" a value or a variable? For example:

aggregate Vector
{
    int x, y, z;
    public void M(Action action)
    {
         Console.WriteLine(this.x);
         action();
         Console.WriteLine(this.x);
    }
}
...
Vector v = new Vector(1, 2, 3);
Action action = ()=>{ v = new Vector(4, 5, 6); };
v.M(action);

What happens? Does "this" get passed to M by value, in which case it writes out "1" twice, or does it get passed as a reference to the variable, in which case your so-called "immutable" type is observed to mutate? (Because what is mutating is the variable; by definition variables are allowed to mutate, that's why they're called "vary-able".)
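
The behaviour in question can be observed today with an ordinary struct whose fields are all readonly. A small sketch, assuming such a struct stands in for the hypothetical aggregate (the `ObserveX` and `Run` names are made up for illustration):

```csharp
using System;

// An "immutable" struct in today's C#: all fields are readonly.
struct Vector
{
    public readonly int X, Y, Z;
    public Vector(int x, int y, int z) { X = x; Y = y; Z = z; }

    // Reads X, runs the callback, then reads X again through "this".
    public int[] ObserveX(Action action)
    {
        int before = this.X;
        action();
        int after = this.X;
        return new[] { before, after };
    }
}

static class ThisIsARefDemo
{
    public static int[] Run()
    {
        Vector v = new Vector(1, 2, 3);
        // The lambda replaces the whole value stored in the variable v.
        Action action = () => { v = new Vector(4, 5, 6); };
        return v.ObserveX(action);
    }

    static void Main()
    {
        int[] seen = Run();
        Console.WriteLine(seen[0]); // 1
        Console.WriteLine(seen[1]); // 4
    }
}
```

With current struct semantics this prints 1 and then 4, because `this` refers to the variable `v` rather than to a copy of its value.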

Eric Lippert

This is arguable semantics, although 1 should be written twice, in my opinion, as the action reassigns a new value, a new immutable aggregate. "this" would then indeed be passed by value. – Lazlo Jan 18 '11 at 18:00
The more I read this answer, the more my brain fries. This should theoretically crash at runtime, shouldn't it? – Lazlo Jan 18 '11 at 18:10
@Lazlo: Why? Try it with an "immutable" struct. This code is bizarre, I agree, but it is perfectly legal. Why should it crash the runtime? *v* is a *variable* and therefore allowed to change. *this* is a reference to that variable, and therefore permitted to be observed to change. Can you explain why you think this should crash? – Eric Lippert Jan 18 '11 at 18:27
It wouldn't crash on an "immutable" struct, because such get accessed by reference on "this". But... Would the address, if reassigned, of the aggregate, be the same? In this case, I would agree to it returning 1 and then 4. – Lazlo Jan 18 '11 at 18:28
@Lazlo: Since these sorts of issues clearly interest you, you might want to read my articles on various quirks and implementation details of value types: http://blogs.msdn.com/b/ericlippert/archive/tags/value+types/ – Eric Lippert Jan 18 '11 at 18:29
What annoys me mostly is that whenever accessing a struct (value type), you get a copy of the value, unless you access it with "this". http://blogs.msdn.com/b/ericlippert/archive/2008/05/14/mutating-readonly-structs.aspx Care to elaborate on that a bit? – Lazlo Jan 18 '11 at 18:34
Moreover, as C++ does it, we could forbid aggregates from having methods (except static ones, of course). Somewhat like how you don't have "9.Sqrt()" storing "9 = 3". You can, however, have "Math.Sqrt(9)", which returns 3. – Lazlo Jan 18 '11 at 18:43
@Lazlo: That's why value types are called "value types" -- because accessing them *copies them by value*. Reference types are called reference types because they are copied *by reference*. If you want to copy a value type by reference to its *storage location* then you can use "ref" or "out" to pass around a reference to the storage location that contains the value. (The "this" of a struct method is just an invisible "ref" parameter to the method.) – Eric Lippert Jan 18 '11 at 18:44
@Eric Lippert: in that case, the function would return 1 then 4. Reassigning it to v keeps the same address (correct me if that's wrong), and since "this" works as a ref to this address (now overwritten), the X value would then be 4 in the stack. Now, did that aggregate "mutate": no. It was completely overwritten. Only, you're accessing it with a pointer offset. – Lazlo Jan 18 '11 at 20:42
@Lazlo: C++ doesn't forbid methods on structs. – Marcelo Cantos Jan 19 '11 at 00:37
@Marcelo: No, but it does on aggregates. Read the Wikipedia article linked in the question. – Lazlo Jan 19 '11 at 00:57
@Lazlo: The only proscriptions on member function in aggregates are user-declared constructors and virtual member functions. Other than that, member functions are allowed in aggregates. – Marcelo Cantos Jan 19 '11 at 01:21

Marcelo Cantos · Answer 2 · 2011-01-18T06:53:13.913

What would this do?

List<Vector> vectors = ...;
Vector v = vectors[4];
v.Y = 10;

or this?

Vector The5thVector(List<Vector> vectors) { return vectors[4]; }
...
List<Vector> vectors = ...;
The5thVector(vectors).Y = 10;

Replacement of diagnostics with implicit assignment won't get you very far. There's a reason mutable structs are so problematic, and simply declaring a new concept, aggregates, won't fix any of these problems.
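
For reference, here is how the first snippet behaves with a plain mutable struct in current C#. A minimal sketch, assuming a `Vector` struct with public fields (the `CopyDemo` class is hypothetical):

```csharp
using System;
using System.Collections.Generic;

// A plain mutable struct, as allowed by current C#.
struct Vector
{
    public double X, Y, Z;
    public Vector(double x, double y, double z) { X = x; Y = y; Z = z; }
}

static class CopyDemo
{
    public static double Run()
    {
        var vectors = new List<Vector> { new Vector(1, 2, 3) };
        Vector v = vectors[0]; // copies the element out of the list
        v.Y = 10;              // mutates only the local copy
        return vectors[0].Y;   // the list element is untouched
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 2
    }
}
```

The assignment compiles without any diagnostic, yet the list never sees the change, which is the kind of silent surprise an implicit-reassignment rule would have to account for.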

The best solution would have been to disallow mutable structs in the language in the first place. The second best solution is to behave as if they were disallowed. Structs are supposed to be small and self-contained, which eliminates any disadvantages to making them immutable.

edited Jan 18 '11 at 06:53

answered Jan 18 '11 at 06:45

In the first case, Example 1 should be applied. In the second case, the compiler should block it, as no reverse "set" can be implicitly determined. There is no reverse accessing that can be implicitly figured, anyway, since this is a second level reference (C# doesn't block this at the moment either). – Lazlo Jan 18 '11 at 07:14
OK, I can see now what you're aiming at, and it does have a kind of logic to it, but I think the added confusion of implicit assignment massively outweighs the benefits. But that logic is based on the already tortured semantics of mutable structs, which is a bad idea, IMO. If you just left it at, "Aggregates are immutable structs with convenient initialisation syntax," I think you'd be onto something. – Marcelo Cantos Jan 18 '11 at 07:48
Possibly. Again, as in most programming scenarios, offering the option to the user is often the best solution. Consider implicit assignment blocked, unless the user defines the aggregate "public implicit aggregate ...", in which case he would have to know and understand the meaning and counterparts of such an operator to use it. – Lazlo Jan 18 '11 at 17:42
The best solution is to break countless projects? I find that hard to believe – AustinWBryan May 26 '18 at 06:24

score 1 · Answer 3 · answered May 03 '11 at 18:24

No, it would not benefit. Structs are better as mutable types anyway.

First of all... "Immutability with implicit reassignment" is really just "inefficient mutability".

Given a "Point" structure, if you intend to change only the value of X, why force a rewrite of the entire memory structure? Just overwriting X alone is more efficient than overwriting X with a new value and pointlessly overwriting Y with its current value. There would be no benefit to such a scheme.

Honestly, the whole topic of mutability is a matter of perspective. It really only makes sense to talk about mutability when referring to a complex object as a whole, and asking whether its individual pieces change value while maintaining references to the object as a whole.

For example, it makes sense to call a string immutable, because you refer to it as a particular block of memory representing a collection of characters, in which the characters don't change value from the perspective of anything that has a reference to it. An int struct, on the other hand, is mutable, because its value can be changed by a simple assignment, and any references (pointers) to the int struct will see those changes.

As for "this" in struct or aggregate methods, of course it should refer to the struct/aggregate's memory location on the stack at all times, so updates via anonymous methods and delegates that change the struct's value should be reflected and seen as mutable. To summarize, mutability is a good idea at a fundamental variable level, and immutability is best handled at a higher level where complex objects are represented and the "immutable" behavior is explicitly coded.

Hear hear. What's really needed is an efficient means of implementing struct properties with something akin to call-by-reference semantics. The way I'd like to see that implemented would be for a statement like "SomeList[5].X=9" to translate into something like "int temp = 9; SomeList.ActOnElement(5, (ref Point it, ref int param) => {it.X = param;}, temp)". If the operation to be performed on the struct is too complicated for a single int parameter, the compiler could generate a temporary struct and pass that instead. Some scenarios would require... — supercat, Nov 18 '11 at 23:35
...more than one ref parameter, so for this style to work optimally it would be necessary to have some means of specifying at least some special cases of variadic generic functions. Note that this approach would be better than simply having the property return a reference to the struct, since the property handler could take some action after the called routine had done whatever it wanted with the struct. For example, the accessor for Control.Bounds could move the control if any part of Bounds had been changed. — supercat, Nov 18 '11 at 23:38
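
The call shape supercat describes can be tried by hand today. A sketch under assumptions: `RefList<T>`, the `ActByRef` delegate, and `Point` are hypothetical names, and the compiler translation of `SomeList[5].X = 9` is only imitated here by writing the call out manually:

```csharp
using System;

struct Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

// Delegate that receives the stored element and one extra argument by reference.
delegate void ActByRef<T, TParam>(ref T item, ref TParam param);

class RefList<T>
{
    private readonly T[] items;

    public RefList(params T[] initial) { items = (T[])initial.Clone(); }

    public T this[int index] { get { return items[index]; } }

    // Runs the action directly on the stored element, so no copy is modified.
    public void ActOnElement<TParam>(int index, ActByRef<T, TParam> action, TParam param)
    {
        action(ref items[index], ref param);
        // A real container could react here, e.g. raise a change notification.
    }
}

static class ActOnElementDemo
{
    static void Main()
    {
        var list = new RefList<Point>(new Point(1, 2), new Point(3, 4));
        int temp = 9;
        // Hand-written equivalent of the imagined "list[1].X = 9".
        list.ActOnElement(1, (ref Point it, ref int param) => { it.X = param; }, temp);
        Console.WriteLine(list[1].X); // 9
    }
}
```

Because the container keeps control of the call, it can run its own logic after the element has been modified, which is the advantage over simply returning a reference to the element.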