11

Suppose we have two structs:

typedef struct Struct1
{
    short a_short;
    int id;
} Struct1;

typedef struct Struct2
{
    short a_short;
    int id;
    short another_short;
} Struct2;

Is it safe to cast from Struct2 * to Struct1 *? What does the ANSI spec say about this? I know that some compilers have an option to reorder struct fields to optimize memory usage, which might render the two structs incompatible. Is there any way to be sure this code will be valid, regardless of compiler flags?

Thank you!

Thomas Dickey
Waneck
  • 4
    *Reordering* the members is not allowed by the standard AFAIK. I believe inserting different amounts of padding would be allowed though. –  Jan 02 '12 at 15:53
  • @delnan Oh so then that struct 'packing' will only disable alignment? Thanks, I didn't know that! – Waneck Jan 02 '12 at 16:00

6 Answers

5

It is safe, as far as I know.

But it's far better, if possible, to do:

typedef struct {
    Struct1 struct1;
    short another_short;
} Struct2;

Then you've even told the compiler that Struct2 starts with an instance of Struct1, and since a pointer to a struct, suitably converted, always points at its first member, you're safe to treat a Struct2 * as a Struct1 *.
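A minimal sketch of the embedding approach, assuming the layout from the question. The helper names struct1_id and as_struct1 are hypothetical and only there for illustration:

```c
#include <stddef.h>

typedef struct Struct1 {
    short a_short;
    int id;
} Struct1;

/* Struct2 embeds Struct1 as its first member instead of repeating its fields. */
typedef struct Struct2 {
    Struct1 struct1;
    short another_short;
} Struct2;

/* Hypothetical helper: works on any object that starts with a Struct1. */
int struct1_id(const Struct1 *s)
{
    return s->id;
}

/* Hypothetical helper: views a Struct2 as its leading Struct1.
   Taking the member's address needs no cast; (const Struct1 *)s2
   yields the same address because struct1 is the first member. */
const Struct1 *as_struct1(const Struct2 *s2)
{
    return &s2->struct1;
}
```

Calling struct1_id(as_struct1(&s2)) on a Struct2 initialized as { { 1, 2 }, 3 } should give 2. Preferring &s2->struct1 over a cast keeps the compiler's type checking intact.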

unwind
  • Well, if there is the slightest chance that one day `offsetof(Struct1, a_short)` will be found to NOT be equal to `offsetof(Struct2, a_short)`, then there is an equal chance that one day `offsetof(Struct2, struct1)` will be found not equal to zero. (Which would mean that `&struct2 != (Struct2*)&struct2.struct1`). – Mike Nakis Jan 02 '12 at 15:56
  • Indeed, this way is much better! :) Thank you! – Waneck Jan 02 '12 at 15:56
  • 1
    If struct1 and struct2 both placed the "int" first and "int" required 32-bit alignment, then both structure types could be 8 bytes, but your alternative form of Struct2 would require 12 bytes. If compilers honor the Common Initial Sequence rule, either form should be valid (and the 8-byte form would be more efficient), but even when invoked in C89 mode, gcc no longer upholds C89's guarantees except when the `-fno-strict-aliasing` flag is used. – supercat Apr 12 '17 at 22:17
4

No, the standard doesn't allow this; accessing the elements of a Struct2 object through a Struct1 pointer is undefined behavior. Struct1 and Struct2 are not compatible types (as defined in 6.2.7) and may be padded differently, and accessing them via the wrong pointer also violates aliasing rules.

The only way something like this is guaranteed to work is when Struct1 is included in Struct2 as its initial member (6.7.2.1.15 in the standard), as in unwind's answer.

dpi
  • It is a good point. However, the language spec provides some fairly related course of reasoning when it talks about *trap representations*. It says that objects of struct types (as a whole) are never considered trap representations, even if their fields currently contain trap representations. This means that the language makes a clear distinction between accessing the whole struct object and accessing a particular field. – AnT stands with Russia Jun 13 '18 at 22:29
  • 1
    This makes one wonder if `((Struct1 *) struct2_ptr)->a_short` is really undefined. `*(Struct1 *) struct2_ptr` is definitely undefined (strict-aliasing violation). But whether `((Struct1 *) struct2_ptr)->a_short` is OK does not look clear to me from the current wording in the standard. – AnT stands with Russia Jun 13 '18 at 22:30
  • The standard is also unclear on whether `p->x` accesses `*p` or whether it only accesses the `x`; unfortunately, this point is crucial to applying the strict aliasing rule. Common compilers seem to treat it as accessing `*p` for that purpose. – M.M Jun 13 '18 at 22:31
  • Accesses to an aggregate member of non-character type fall in the category of behaviors which the authors of the Standard didn't mandate because they expected compilers would support them when practical. *Note that even straightforward access to an aggregate member using normal member-access sequence falls into this same category*. – supercat Jun 13 '18 at 22:34
  • @M.M: Actually, the Standard is pretty clear. The type of lvalue `p->x` is the type of the member. The Standard *should* recognize that in addition to being read or written, an lvalue may *also* be used to derive another lvalue, and recognize that operations on a "freshly derived" lvalue are operations on the parent, but it doesn't actually say any such thing, leaving such recognition as a Quality of Implementation issue in cases where it would lead to behaviors being defined that otherwise wouldn't be, but not allowing optimizations in cases where it would do the reverse. – supercat Jun 13 '18 at 22:39
  • @supercat: Again, there's no argument about the type of `p->x` and what `p->x` *ultimately* accesses. However, it is still not clear whether the process of evaluating `p->x` involves a conceptual intermediate step that accesses the whole `*p`. `p->x` is `(*p).x`. Is the `*p` part by itself already a strict-aliasing violation or not? – AnT stands with Russia Jun 13 '18 at 23:33
  • @AnT: Recognizing the act of deriving a pointer/lvalue of one type from one of another, along with the concept of a pointer/lvalue being "freshly derived", would eliminate the need for the worse-than-useless "effective type rule" and "character-type exception", thus allowing more optimizations while eliminating 90% of the need for `-fno-strict-aliasing`. Unfortunately, the Standard doesn't recognize any such concept, and the authors of gcc and clang are too heavily invested on the idea that they should only need to recognize derivation when they feel like it, rather than when... – supercat Jun 14 '18 at 14:19
4

The language specification contains the following guarantee:

6.5.2.3 Structure and union members
6 One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

This only applies to type-punning through unions. However, this essentially guarantees that the initial portions of these struct types will have identical memory layout, including padding.

The above does not necessarily allow one to do the same by casting between unrelated pointer types. Doing so might constitute a violation of the aliasing rules:

6.5 Expressions
7 An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

The only question here is whether accessing

((Struct1 *) struct2_ptr)->a_short

constitutes access to the whole Struct2 object (in which case it is a violation of 6.5/7 and it is undefined), or merely access to a short object (in which case it might be perfectly defined).

In general, it might be a good idea to stick to the following rule: type-punning is allowed through unions but not through pointers. Don't do it through pointers, even if you are dealing with two struct types with a common initial subsequence of members.
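The union route can be sketched as follows, using the two structs from the question. The union type AnyStruct and the helper any_id are hypothetical names made up for this illustration:

```c
typedef struct Struct1 {
    short a_short;
    int id;
} Struct1;

typedef struct Struct2 {
    short a_short;
    int id;
    short another_short;
} Struct2;

/* Hypothetical union holding either struct. Its declaration must be
   visible wherever the common initial sequence is inspected. */
typedef union AnyStruct {
    Struct1 s1;
    Struct2 s2;
} AnyStruct;

/* Reads id through the Struct1 view even when the union currently
   holds a Struct2: a_short and id form the common initial sequence
   that 6.5.2.3p6 talks about. */
int any_id(const AnyStruct *u)
{
    return u->s1.id;
}
```

After storing a Struct2 such as { 1, 2, 3 } into u.s2, any_id(&u) should return 2. The cost of this design is that code has to pass the union around rather than a bare Struct2 *.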

AnT stands with Russia
  • Under those rules, even `struct s {int x;} foo; ... foo.x=1;` would invoke UB because the rules nowhere allow an aggregate to be accessed via member-type lvalue. Quality compilers should behave sensibly in such cases whether or not the Standard requires it, but the same is also true in cases where code uses a freshly-derived pointer to a union member. The fact that the Standard allows compilers to do silly things in such cases does not mean that compilers that do so should not be recognized as being of inferior quality. – supercat Jun 13 '18 at 22:41
  • @supercat: That's not what I meant. Access to member `x` itself should definitely not be interpreted as access to the whole `foo`. That would make no sense. My concern is about the initial part of `foo.x` expression. Does that `foo` before the `.` by itself constitute access to the entire `foo` object? I.e. does the application of `.` operator to its left operand formally constitute access to the whole left operand? – AnT stands with Russia Jun 13 '18 at 22:54
  • If so, then `(*(Struct1 *) struct2_ptr).a_short` is already undefined because of the `(*(Struct1 *) struct2_ptr).` part alone. Even before we get to `a_short` part. – AnT stands with Russia Jun 13 '18 at 23:00
  • An access to an aggregate member using a pointer or lvalue freshly derived from the whole *is* an access to the whole and--in the case of a union--other members as well. With two slight changes, 6.5p7 would eliminate 90% of the need for `-fno-strict-aliasing` as well as the counterproductive "effective-type rule" and "character-type exception". Simply say that lvalues used for access must have the proper types, *or be freshly derived from others that do*, and limit the restriction to objects that are accessed during a particular execution of a function or loop (including nested ones). – supercat Jun 14 '18 at 14:23
  • If one coins the verbs to "read/write-address a byte" as forming a pointer or lvalue which will be used at any time in future, without laundering, to access or address a byte, then a pointer or lvalue L would remain fresh for purposes of accessing each byte until the byte is accessed or addressed without using a pointer or lvalue derived from L, or until code enters a function or loop wherein that occurs. Recognizing derivation in those cases is sufficiently easy I doubt the authors of C89 ever imagined that compiler writers would vehemently refuse to do so. – supercat Jun 14 '18 at 14:28
4

Struct pointer types always have the same representation in C.

(C99, 6.2.5p27) "All pointers to structure types shall have the same representation and alignment requirements as each other."

And the members of a structure are always laid out in declaration order in C.

(C99, 6.7.2.1p5) "a structure is a type consisting of a sequence of members, whose storage is allocated in an ordered sequence"

ouah
  • 3
    This does not answer the question; even with these constraints it could still be an aliasing violation. However, under certain conditions, the C standard does explicitly allow what OP wants. – R.. GitHub STOP HELPING ICE Jan 02 '12 at 15:57
  • Thank you very much for these quotes from the ANSI spec. This for me makes it clear that this is safe! – Waneck Jan 02 '12 at 15:58
  • @R.. This is a good point. If the cast pointer is dereferenced, this can still violate C aliasing rules. If the implementation takes advantage of strict aliasing rules, this could be considered unsafe. – ouah Jan 02 '12 at 16:50
  • Even if we ignore strict-aliasing requirements, these quotes are *not* enough to make it safe. Where is the quote that guarantees identical amount of padding? – AnT stands with Russia Jun 13 '18 at 22:09
  • @AnT: It would be rather difficult for a compiler to uphold the Common Initial Sequence guarantees if the padding could differ. – supercat Jun 13 '18 at 22:32
  • @supercat: True. But I'm merely pointing out that the "common initial sequence" quote is not present in the answer. – AnT stands with Russia Jun 13 '18 at 22:34
  • @AnT: In other words, the answer's main point (the layout will be the same) happens to be correct, but the answer doesn't cite enough of the Standard to prove that? – supercat Jun 13 '18 at 22:43
  • @supercat: Yes, that's what I meant. However, I'm not a downvoter. My comment is not really worth that degree of scrutiny. – AnT stands with Russia Jun 13 '18 at 22:58
3

It will most probably work. But you are quite right to ask how you can be sure this code will be valid. So: somewhere in your program (at startup, maybe) embed a bunch of assertions which make sure that offsetof(Struct1, a_short) is equal to offsetof(Struct2, a_short), and so on. Besides, some programmer other than you might one day modify one of these structures but not the other, so better safe than sorry.
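One way to write such checks, assuming a C11 compiler, is with static_assert, so a mismatch is caught at compile time rather than at startup. This is only a sketch using the structs from the question, and it verifies layout only; it says nothing about the aliasing rules:

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

typedef struct Struct1 {
    short a_short;
    int id;
} Struct1;

typedef struct Struct2 {
    short a_short;
    int id;
    short another_short;
} Struct2;

/* Compilation fails if the shared members ever drift apart. */
static_assert(offsetof(Struct1, a_short) == offsetof(Struct2, a_short),
              "a_short is at different offsets in Struct1 and Struct2");
static_assert(offsetof(Struct1, id) == offsetof(Struct2, id),
              "id is at different offsets in Struct1 and Struct2");
static_assert(sizeof(Struct1) <= sizeof(Struct2),
              "Struct2 is smaller than Struct1");
```

On a pre-C11 compiler the same comparisons can go into ordinary assert() calls that run at startup.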

Mike Nakis
-1

Yes, it is ok to do that!

A sample program is as follows.

#include <stdio.h>

typedef struct Struct1
{
    short a_short;
    int id;
} Struct1;

typedef struct Struct2
{
    short a_short;
    int id;
    short another_short;
} Struct2;

int main(void)
{
    Struct2 s2 = {1, 2, 3};

    /* An explicit cast is needed: Struct2 * does not convert to Struct1 * implicitly. */
    Struct1 *ptr = (Struct1 *)&s2;

    /* Going through void * also works, since void * converts to any object pointer type. */
    void *vp = &s2;
    Struct1 *s1ptr = (Struct1 *)vp;

    printf("%d, %d\n", ptr->a_short, ptr->id);
    printf("%d, %d\n", s1ptr->a_short, s1ptr->id);

    return 0;
}
Sangeeth Saravanaraj