14

The MSDN clearly states

For all other types, including structs, the sizeof operator can only be used in unsafe code blocks.

The C# Language Specification is even more precise :

  1. The order in which members are packed into a struct is unspecified.
  2. For alignment purposes, there may be unnamed padding at the beginning of a struct, within a struct, and at the end of the struct.
  3. The contents of the bits used as padding are indeterminate.
  4. When applied to an operand that has struct type, the result is the total number of bytes in a variable of that type, including any padding.

However, how would the CLR handle the following structures:

[StructLayout(LayoutKind.Explicit, Size = 1, Pack = 1)]
public struct MyStruct
{
    [FieldOffset(0)] public byte aByte;
}

public struct MyEmptyStruct { }

In MyStruct we enforce the layout explicitly, the size and how to pack it via the StructLayout attribute. This structure is supposed to have a size of 1 byte in memory.

On the other hand MyEmptyStruct is empty, we can assume that the size in memory will be 0 bytes - even if such a structure is most likely not going to be used, it is still an interesting case.

When trying to compute the size of these structures using sizeof(MyStruct) and sizeof(MyEmptyStruct), the compiler reports the following error:

'*' does not have a predefined size, therefore sizeof can only be used in an unsafe context

I would like to know why using sizeof in this context is considered unsafe. The question is not intended to ask for workarounds nor the correct way to compute the size of a struct but rather to focus on the causes.

dna
  • 1
    Skeet's answer here: [sizeof() structures not known. Why?](http://stackoverflow.com/questions/8048540/sizeof-structures-not-known-why) is good. – spender Jun 14 '13 at 09:40
  • I haven't seen anywhere the reason *why* it is `unsafe`. I'm guessing that the compiler requires that to reinforce the notion that `sizeof(struct)` is going to vary based on x86/x64 setting and so on, so it's kind of an unsafe thing to do. But just asking for the size of a struct isn't `unsafe` in the same way as getting and using a pointer to a memory block is `unsafe`. – Matthew Watson Jun 14 '13 at 09:44
  • Voting to reopen. This is **not** a duplicate of the question linked above - the other question asks why you cannot get the size of a `struct` that consists of only the built-in types, not why `sizeof` of `struct`s is not available in managed contexts. – Sergey Kalinichenko Jun 14 '13 at 09:50
  • 2
    The reason is outlined in the first sentence of Chris Brumme's blog entry: http://blogs.msdn.com/b/cbrumme/archive/2003/04/15/51326.aspx - "We don't expose the managed size of objects because we want to reserve the ability to change the way we lay these things out." – Lasse V. Karlsen Jun 14 '13 at 09:53
  • @spender It's a good answer, but it answers a subtly different question. The title of the question hides it, but the OP wanted to know only why he can't get a size of a struct that consists of primitives, not why it's prohibited in general. – Sergey Kalinichenko Jun 14 '13 at 09:54
  • 5
    I've come to the conclusion that the reason you require `unsafe` is that the *only reason* you could possibly have to take the size of a struct using `sizeof()` is if you are going to do pointer arithmetic, therefore it is sensible to restrict usage to an `unsafe` context. Note that you can't use `sizeof(struct)` when serializing data because it could be a different size from `Marshal.SizeOf()` – Matthew Watson Jun 14 '13 at 10:06
  • I think this limitation was introduced by the C# designers "for your own good". Their logic went like this: "you cannot make use of the value returned by `sizeof` unless you can access the memory of the `struct`, so you would be better off not knowing the `sizeof` in the first place". Although they are right, there is a simple workaround that lets you get to `sizeof` anyway: [link](http://stackoverflow.com/a/8189795/335858). – Sergey Kalinichenko Jun 14 '13 at 10:07
  • @spender Thank you for the link, still consider an empty struct or a struct fully decorated using `StructLayout` attribute. Do we still have no assurance that the struct won't have that precise layout in memory? Should I edit my question for more clarity? – dna Jun 14 '13 at 10:38
  • Doesn't `StructLayout` require unsafe code? In which case it becomes possible to use `sizeof` right? – spender Jun 14 '13 at 10:40
  • @dasblinkenlight Well if the causes behind that is the same that might also be subtly duplicate! I would like to thank you for your explanation and also for the nasty workaround. I wasn't aware of this one! – dna Jun 14 '13 at 10:41
  • @dna You are welcome! I hope this question gets re-opened soon (it has four votes out of five necessary to reopen) so that I could post this comment as a normal answer. – Sergey Kalinichenko Jun 14 '13 at 10:43
  • @spender I don't think that `StructLayout` requires unsafe code. What makes you say that? Moreover I can compile without the unsafe switch. – dna Jun 14 '13 at 10:44
  • Oops. My bad. I probably need to read up more about this. – spender Jun 14 '13 at 11:03
  • I have edited my question for more clarity – dna Jun 14 '13 at 11:37
  • @dasblinkenlight Go ahead, the question is open again. –  Jun 14 '13 at 12:02
  • Note that the size of the struct in the bottom of the question is not 1, it's 2, a char in .NET is 16-bit. – Lasse V. Karlsen Jun 14 '13 at 12:04
  • @Lasse V. Karlsen Thank you for spotting the typo! – dna Jun 14 '13 at 12:05
  • @LasseV.Karlsen: I looked at the link suggesting that sizeof() is disallowed to allow for future changes. I'd suggest that the possibility of future changes would be a reason to *provide* a sizeof() functionality, or at least a way of knowing things like "How many elements can an array of this type have while being eligible for gen0 collection". To be sure, having to guess that one should generate an array of size roughly 84999/sizeof(element), except when the element is too big to make that practical, is a hack, but... – supercat Jun 14 '13 at 15:02
  • @LasseV.Karlsen: ...having to do such calculations while *guessing* at the size of the elements is even worse. Of course, having a means of either *asking* what array size would avoid the LOH, and/or having a "create array in gen0 heap if possible" method might be better, but so far as I know those don't exist within the runtime or Framework, while the ability to ascertain an object size exists but is not exposed by many languages. – supercat Jun 14 '13 at 15:04
  • Oh, I agree. But all evidence suggests this is just a decision made by the compiler design team, more than a technical reason behind it. I would love to have that "how big can you make this before the LOH comes into play" calculation myself. – Lasse V. Karlsen Jun 14 '13 at 17:28
  • @LasseV.Karlsen: I wonder if there would be any particular difficulty with .NET offering methods `CreateShortLivedArray(int Size)`, `CreateLongLivedArray(int Size)`, perhaps with overloads that accept multiple sizes or versions that copy an existing array? Would anything "bad" happen if an array larger than 85,000 bytes got allocated on the Gen0 heap, or would such allocation simply be "expected" to not perform as well as an LOH application if there's e.g. a 25% chance references to the array might exist when the next gen2 collection rolls around? – supercat Jun 15 '13 at 20:45

2 Answers

12

I would like to know why using sizeof in this context is considered unsafe.

Matthew Watson's comment hits the nail on the head. What are you going to do with that information in safe code? It's not useful for anything(*). It doesn't tell you how many unmanaged bytes you need to allocate to marshal; that's Marshal.SizeOf. It's only useful for pointer arithmetic, so why should it be in the safe subset?


(*) OK to be fair there are a few odd corner case usages for a safe sizeof that can take structs that contain managed types. Suppose for example you have a generic collection class that is going to allocate a bunch of arrays and would like to ensure that those arrays are not moved into the large object heap; if you could take the size of a struct that contained managed objects then you could write this code very easily, and it would not need any pointer arithmetic. But the fact remains that sizeof was designed specifically for pointer arithmetic, and not so that you could do an end-run around the garbage collection heuristics for arrays.
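
The corner case in the footnote can be sketched roughly as follows. This is only an illustration, not something the answer proposes: `Unsafe.SizeOf<T>()` from `System.Runtime.CompilerServices` is a swapped-in way to obtain an element size from safe code, the 85,000-byte threshold is the figure quoted in the comments under the question, and the `LohSizing` class, its method name and the `ArrayOverheadBytes` allowance are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper for illustration only.
public static class LohSizing
{
    // Threshold quoted in the comments under the question; treat it as an
    // implementation detail of the runtime, not a guarantee.
    private const int LohThresholdBytes = 85_000;

    // Assumption: a rough allowance for the array's own header and length field.
    private const int ArrayOverheadBytes = 32;

    // Largest element count whose estimated array size stays below the threshold.
    public static int MaxElementsBelowLoh<T>()
    {
        int elementSize = Unsafe.SizeOf<T>(); // managed size of one element, never 0
        return Math.Max(1, (LohThresholdBytes - ArrayOverheadBytes) / elementSize);
    }
}
```

A generic collection could then call `LohSizing.MaxElementsBelowLoh<T>()` when choosing a chunk size for its backing arrays. On .NET Framework, `Unsafe.SizeOf<T>()` comes from the System.Runtime.CompilerServices.Unsafe package rather than the base library.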

Eric Lippert
  • 1
    The subject of your latest blog post is quite a coincidence. It must be `sizeof` awareness week or something ;) – Doctor Jones Jun 14 '13 at 15:20
  • 2
    One thing I've noticed in the design of C# is that it sometimes seems like the language people go out of their way to forbid things for which they don't see a use, even if the things would otherwise be harmless; examples include the aforementioned sizeof, `enum` and `delegate` type constraints, declaring a method `protected new sealed virtual` (a `new` non-virtual method will not prevent a derived class from overriding the parent definition), etc. Is there a driving philosophy not to allow things for which no immediate use is seen, even when doing so would simply mean not checking for them? – supercat Jun 14 '13 at 16:01
  • 1
    @supercat: That's an oversimplification but basically the answer to your question is yes. The design team has the attitude that features should be justified by their use cases and if possible, constrained to those cases. The design team also has the conflicting attitude that general features are better than specific features. Design is the process of finding a good compromise amongst a set of conflicting principles. – Eric Lippert Jun 14 '13 at 16:15
  • 1
    I can certainly understand that features that require significant work to implement should require significant justification. I'm more interested in cases in which the language designers decided that compiler writers should invest effort forbidding constructs for which they might not have seen much use, but which would otherwise have been available *by default*. For example, constraining a generic type to `System.Delegate` wouldn't let one `Invoke` it, but would allow one to call `Delegate.Combine` on two things of that type and cast the result back to that type (useful), so why forbid it? – supercat Jun 14 '13 at 16:56
  • 1
    @supercat: There is no feature that does not require significant design, specification, implementation, test and documentation work, and there is no feature which does not impact the design cost of every future feature. The feature you suggest is a good one and I'd love to have it; you've already mentioned some of the cases that would have to be tested. It's not a bad feature and I'd use it if I had it -- and we considered it when adding expression trees -- but it didn't make the bar. C# 3.0 was the single largest work item in that release of VS; anything that added risk was cut. – Eric Lippert Jun 14 '13 at 17:22
  • @EricLippert: What work would have been required to simply omit `System.Enum`, `System.Delegate`, and `System.MulticastDelegate` from the list of forbidden constraint types? Constraints with those types are from what I understand allowed in the CIL spec, and in practice work as the spec would suggest when implemented in CIL, so why single out those types for exclusion? – supercat Jun 14 '13 at 17:30
  • @supercat: I don't know the details; I was not on the C# team at the time that decision was made, and the implementation team was at Microsoft Research Cambridge. You should address your question to Andrew Kennedy if you want a definitive answer from someone who was there for that decision. – Eric Lippert Jun 14 '13 at 17:42
  • Thank you very much for the interesting comments. I do have a final question: how should the MSDN page at http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute%28v=vs.90%29.aspx be understood, especially the `Remarks` concerning the `Explicit` layout? Does it mean that the layout of the structure will actually be the same in the managed as in the unmanaged world? If it does then can we say it is *safe* to rely on `sizeof` for an unmanaged allocation in that particular case? – dna Jun 14 '13 at 18:31
  • @dna: I think the documentation is pretty clear regarding when the attributes affect layout in managed memory of blittable structs. As for your second question: why would you rely on the wrong tool for the job when the right tool is so easy to use? **If you're marshaling and you need to know the size then use the aptly-named `Marshal.SizeOf` method**. What's the benefit of doing anything else? – Eric Lippert Jun 14 '13 at 18:36
  • @EricLippert: I am just curious, that's all. However I don't find the documentation that clear: *This affects* might actually mean a lot of things. But again thank you for the answers! – dna Jun 14 '13 at 18:45
  • @supercat: I believe that I read somewhere that those types are omitted because what you really want is to constrain a type to a *strict* subtype. That is, it would often be nonsensical for the type argument to be `System.Enum` itself, etc. I happen to find the cure worse than the disease, but there is an underlying justification. – kvb Jun 21 '13 at 02:39
  • @kvb: If one constrained a type to both `struct` and `System.Enum`, I think the generic would only accept proper enum types. While one would need to use some Reflection to do anything useful, it's possible for a static `EnumMasker.HasFlag()` method [which could be called from e.g. a static `EnumMasker.HasFlag(this T it)` extension method] to achieve performance almost 10x as fast as `Enum.HasFlag`. Seems pretty useful. And as for `Delegate`, a method to combine delegates and return a properly-typed delegate would seem useful even if `T` was `Delegate`. – supercat Jun 21 '13 at 04:50
  • @supercat: I don't disagree; I just wanted to point out that there is some (perhaps not compelling) reason to prevent those types from being used as constraints. If you really want to use them, consider using F# :-) – kvb Jun 21 '13 at 13:52
  • @kvb: Personally, I suspect the problem is that constraints were viewed only as a means of allowing the class or method which uses a generic parameter to use it in a way that would otherwise not be possible, rather than as a means of ensuring that they would only be useful in cases which make sense. From that point of view, the obnoxious `struct` constraint on `Nullable` might be considered the only reason a `struct` constraint is needed at all (I can't think of anything it allows one to do except declare a `T?`) though there are cases where... – supercat Jun 21 '13 at 14:51
  • ...a `Foo` couldn't possibly be expected to behave usefully with a non-struct `T` [a class may expect that copying a `T` that implements a mutating interface (e.g. `IEnumerator`) will copy the state represented thereby]. I think it's much better to view language features in terms of expressiveness. From a direct code execution perspective, there may be no distinction between reference-type storage locations which are used only to encapsulate immutable state other than identity, versus those which also encapsulate identity, mutable state, or both, but... – supercat Jun 21 '13 at 15:03
  • ...if the type system included such notions, much of the confusion surrounding deep vs shallow operations (e.g. clone or equals), or the ownership of objects returned from methods or properties, would be alleviated. For example, after `myItems = Foo.Items; Foo.AddItem(...);`, will `myItems` contain the new item? A return type that indicated that it *encapsulates its target's mutable state* would imply a mutable snapshot; one that indicated that it *encapsulates the identity of its target* would imply a live view. – supercat Jun 21 '13 at 15:33
7

There are lots of wrong assumptions in the question; I'll address them one by one:

in MyStruct we enforce the layout explicitly

You didn't. The [StructLayout] attribute is only truly effective when the structure value is marshaled with Marshal.StructureToPtr(), which is also used by the pinvoke marshaller. Only then do you get the guarantee that the marshaled value has the requested layout. The CLR reserves the right to lay out the structure as it sees fit. It will align structure members so the code that uses the struct is as fast as possible, inserting empty bytes if necessary. And if such padding bytes leave enough room then it will even swap members to get a smaller layout. This is entirely undiscoverable, other than by using a debugger to look at the machine code that accesses the structure members. Some [StructLayout] properties do affect the layout: LayoutKind.Explicit does in fact support declaring unions. The exact details of the mapping algorithm are undocumented, subject to change and strongly dependent on the target machine architecture.
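
As a minimal sketch of the union case mentioned above: overlapping two fields at offset 0 lets one field be read through the other, entirely in safe code. The `FloatBits` struct and `UnionDemo` class are hypothetical names introduced for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical union type: both fields start at offset 0,
// so they share the same four bytes.
[StructLayout(LayoutKind.Explicit)]
public struct FloatBits
{
    [FieldOffset(0)] public float Value;
    [FieldOffset(0)] public uint Bits;
}

public static class UnionDemo
{
    // Writes the float field and reads the overlapping integer field back.
    public static uint BitsOf(float value)
    {
        var u = new FloatBits { Value = value };
        return u.Bits;
    }

    public static void Main()
    {
        // IEEE 754 single-precision encoding of 1.0
        Console.WriteLine(BitsOf(1.0f).ToString("X8")); // prints 3F800000
    }
}
```

This only shows that the explicit offsets are honored for overlapping fields; it says nothing about padding or the total managed size of the struct.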

the result is the total number of bytes in a variable of that type, including any padding.

It is not; the actual structure can be smaller than the declared struct. This is possible by swapping a member into the padding.

This structure is supposed to have a size of 1 byte in memory.

That's very rarely the case. Local variables are also aligned in memory, by 4 bytes on a 32-bit processor and 8 bytes on a 64-bit processor. Unless the struct is stored in an array, it will actually take 4 or 8 bytes on the stack or inside an object on the heap. This alignment is important for the same reason that member alignment is important.

MyEmptyStruct is empty, we can assume that the size in memory will be 0 bytes

A variable will always have at least 1 byte, even if the struct is empty. This avoids ambiguities like having a non-empty array that takes zero bytes. This is also the rule in other languages, like C++.
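
A small sketch for checking this on a given runtime, assuming the project is compiled with unsafe code enabled (for example via `AllowUnsafeBlocks`). `EmptyStructDemo` and its methods are hypothetical names, and no particular output is asserted here beyond what the paragraph above claims.

```csharp
using System;
using System.Runtime.InteropServices;

public struct MyEmptyStruct { }

public static class EmptyStructDemo
{
    // sizeof on a user-defined struct only compiles in an unsafe context.
    public static unsafe int ManagedSize() => sizeof(MyEmptyStruct);

    // Size of the struct once marshaled to unmanaged memory.
    public static int MarshaledSize() => Marshal.SizeOf(typeof(MyEmptyStruct));

    public static void Main()
    {
        Console.WriteLine(ManagedSize());
        Console.WriteLine(MarshaledSize());
    }
}
```

If the claim above holds, neither value is 0.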

why using sizeof in this context is considered unsafe

To be clear, using sizeof on primitive value types doesn't require unsafe since .NET 2. But for structs there is a definite possibility that sizeof() might be used to address memory directly, adding it to an IntPtr for example, with the considerable risk that using sizeof() was the wrong choice and should have been Marshal.SizeOf() instead. I would guess that the practicality of using sizeof() on structs is so low, given that a struct should always be small, and the odds of hacking IntPtrs the wrong way are so high, that they left it unsafe.
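
To make the risk of picking the wrong one concrete, here is a short sketch comparing the two on built-in types, where sizeof needs no unsafe context. The `SizeComparison` class is a hypothetical name; the values shown are those of the default marshaling rules, where `char` is marshaled as a one-byte ANSI character and `bool` as a four-byte Win32 BOOL.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SizeComparison
{
    public static void Main()
    {
        // Managed size: a compile-time constant for built-in types.
        Console.WriteLine(sizeof(char));                 // 2
        // Marshaled size: what the interop marshaller writes to unmanaged memory.
        Console.WriteLine(Marshal.SizeOf(typeof(char))); // 1

        Console.WriteLine(sizeof(bool));                 // 1
        Console.WriteLine(Marshal.SizeOf(typeof(bool))); // 4
    }
}
```

Offsetting an IntPtr into marshaled data by the managed size of either type would land on the wrong byte.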

Hans Passant
  • 1
    Thank you for the detailed answer, however I still have a couple of questions. `the result is the total number of bytes in a variable of that type, including any padding. => It is not ...` so the C# spec is actually wrong? `That's very rarely the case. Local variables are also aligned in memory` Sure but the sizeof operator isn't defined to return the aligned/padded size of types, the `Marshal.SizeOf()` is meant for that. – dna Jun 14 '13 at 13:18
  • 1
    Also `You didn't. The [StructLayout] attribute is only truly effective when the structure value is marshaled.` the MSDN states the contrary : `...LayoutKind.Explicit ... This affects both managed and unmanaged layout, for both blittable and non-blittable types` http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.structlayoutattribute(v=vs.110).aspx – dna Jun 14 '13 at 13:19
  • You are asking too many questions to answer. I specifically addressed the corner case of unions. – Hans Passant Jun 14 '13 at 13:24
  • Please accept my apologies if I have too many remarks regarding your answer, but I think that a couple of points need to be clarified. – dna Jun 14 '13 at 13:32
  • Code which causes an `IntPtr` to be actually dereferenced is going to be unverifiable. Adding `sizeof(someStruct)` isn't going to make it any more unverifiable than adding `12`. Code which adds `sizeof(someStruct)` to an `IntPtr` but never actually causes it to be dereferenced is going to be no more unsafe than code which adds any other number to any other variable. – supercat Jun 14 '13 at 16:05