16

I have a unsigned char pointer which contains a structure.Now I want to do the following

unsigned char buffer[24];

//code to fill the buffer with the relevant information.

int len = ntohs((record_t*)buffer->len);

where record_t structure contains a field called len. I am not able to do so and am getting the error.

error: request for member ‘len’ in something not a structure or union.

Then I tried:

int len = ntohs(((record_t*)buffer)->len);

so as to get the operator precedence right. That gave me:

warning: dereferencing type-punned pointer will break strict-aliasing rules.

then I declared

record_t *rec = null;

rec = (record_t*)

what am I doing wrong here?

Alexis Wilke
  • 19,179
  • 10
  • 84
  • 156
liv2hak
  • 14,472
  • 53
  • 157
  • 270

4 Answers4

21

According to the C and C++ standards, it is undefined behaviour to access a variable of a given type through a pointer to another type. Example:

int a;
float * p = (float*)&a;  // #1
float b = *p;            // #2

Here #2 causes undefined behaviour. The assignment at #1 is called "type punning". The term "aliasing" refers to the idea that several different pointer variables may be pointing at the same data -- in this case, p aliases the data a. Legal aliasing is a problem for optimization (which is one of the main reasons for Fortran's superior performance in certain situations), but what we have here is flat-out illegal aliasing.

Your situation is no different; you're accessing data at buffer through a pointer to a different type (i.e. a pointer that isn't unsigned char *). This is simply not allowed.

The upshot is: You should never have had data at buffer in the first place.

But how to solve it? Make sure you have a valid pointer! There is one exception to type punning, namely accessing data through a pointer to char, which is allowed. So we can write this:

record_t data;
record_t * p = &data;          // good pointer
char * buffer = (char*)&data;  // this is allowed!

return p->len;                 // access through correct pointer!

The crucial difference is that we store the real data in a variable of the correct type, and only after having allocated that variable do we treat the variable as an array of chars (which is allowed). The moral here is that the character array always comes second, and the real data type comes first.

Alexis Wilke
  • 19,179
  • 10
  • 84
  • 156
Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • +1 for not just telling OP what's wrong but showing *the right way* to do this! – R.. GitHub STOP HELPING ICE Oct 03 '11 at 03:21
  • No, no, you are exagerating and missing the point. Accessing an object through another type *may* be undefined behavior, if this results e.g in a trap representation or misalignment, but it mustn't necessarily. The point of aliasing is completely different. This is because the standard allows the compiler to assume that pointers of different type will not alias each other and that he may thus perform more agressif optimizations, which then can go wrong if in reallity these point to the same object. – Jens Gustedt Oct 03 '11 at 06:39
  • @JensGustedt: I don't think alignment plays a role in the rules. You can cast back and forth, that's OK (like `T * p = &x; S * q = (S*)p; T * r = (T*)q; T y = *r;`), but otherwise it's just undefined behaviour (which may behave as expected). *Aliasing* is a broad concept (e.g. consider the "vector adder" `add(float * a, float * b, float * c)` where it's useful to know if `a` or `b` alias `c`), but the "strict aliasing rule" says that pointers of different types can never alias one another. – Kerrek SB Oct 03 '11 at 10:16
5

You're getting that warning because you're breaking strict-aliasing by having two pointers of different types pointing to the same location.

One way to get around that is to use unions:

union{
    unsigned char buffer[24];
    record_t record_part;
};

//code to fill the buffer with the relavent information.

int len = ntohs(record_part.len);

EDIT:

Strictly speaking, this isn't much safer than your original code, but it doesn't violate strict-aliasing.

Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • 1
    Accessing a union out of order is undefined behaviour. You're not "getting around" anything (but a direct route to the grave). – Kerrek SB Oct 03 '11 at 00:34
  • 1
    @Kerrek SB: Can you explain what you mean by the "out-of-order"? Yes I know that the union approach is also undefined, but that's the case with the original code as well. – Mysticial Oct 03 '11 at 00:37
  • It's only allowed to read the same union member that was last written to. Reading a different member would almost certainly be necessary for your solution, though. There's no way to weasel around invalid type punning; you really *have* to start with a variable of the correct type (see my answer). – Kerrek SB Oct 03 '11 at 00:40
  • 1
    @Mystical, it should probably be: int len = ntohs(record_part.len); – Jim Rhodes Oct 03 '11 at 00:41
  • @Kerrek SB: Ah that's what you meant by "out-of-order". I know what it means, just not by that name. – Mysticial Oct 03 '11 at 00:42
  • @Jim Rhodes: Good catch. Corrected it. I was referencing it from the first version of the code which didn't have the `()`. – Mysticial Oct 03 '11 at 00:43
  • It's still wrong. `noths` doesn't take a `short *`, it takes a `short`. The `&` is wrong. – Chris Lutz Oct 03 '11 at 01:05
  • Well, from the language point of view, sing a union in this way is also a violation of strict aliasing. Unions can't be used for memory reinterpretation. But from GCC point of view it is legal: GCC preserves and supports this use of unions specifically for situations when you want to break language strict-aliasing rules. – AnT stands with Russia Oct 03 '11 at 01:15
  • @Kerrek SB: That's not exactly true. One can start with an `unsigned char[]` buffer and then build a valid `record_t` object by using `memcpy` (see Jim Rhodes answer). This approach is explicitly allowed by the language, as long as the original buffer was obtained by a `memcpy` from an object of `record_t` type. That's actually the right way to do it. Not unions. – AnT stands with Russia Oct 03 '11 at 01:19
  • 6
    @AndreyT - C99 TC3 makes it legal to perform type-punning with unions. (I can't cite it, but check http://stackoverflow.com/questions/7411233/can-we-use-va-arg-with-unions for some people who agree with me.) – Chris Lutz Oct 03 '11 at 01:22
  • @AndreyT: I don't think we're contradicting each other. The `memcpy` approach requires that you already have a variable of the correct type lying around somewhere into which you copy the data. That's fine. All I'm saying is that you can't get around declaring a variable of the desired type at some point. – Kerrek SB Oct 03 '11 at 10:25
  • @ChrisLutz: Does this only apply if one of the union members involved in the read/write is a `char` array, or can it be any type? – Kerrek SB Oct 03 '11 at 10:26
  • @Kerrek SB - I believe it's any type (everyone else used the phrase "arbitrary type-punning") but that the resulting values are undefined (as they'd essentially have to be). I'd have to read TC3 to be sure though. – Chris Lutz Oct 03 '11 at 16:49
3

You might try this:

unsigned char buffer[sizeof(record_t)];
record_t rec;
int len;

// code to fill in buffer goes here...

memcpy(&rec, buffer, sizeof(rec));
len = ntohs(rec.len);
Jim Rhodes
  • 5,021
  • 4
  • 25
  • 38
  • This is actually the correct way to do it (casts to `void *` are unnecessary though). The proper way to reinterpret data of one type as data of another type is to copy that data between objects by using `memcpy` (instead of direct pointer-based reinterpretation or union-based reinterpretation). – AnT stands with Russia Oct 03 '11 at 01:17
  • 1
    I agree this is correct (and I gave it a +1), but it would be better never to use `unsigned char[]` to begin with and read the data directly into a variable of the right type. – R.. GitHub STOP HELPING ICE Oct 03 '11 at 03:24
1

You probably have a warning level set that includes strict aliasing warnings (it used to not be default, but at one point gcc flipped the default). try -Wno-strict-aliasing or -fno-strict-aliasing -- then gcc should not generate the warnings

A reasonably good explanation (based on cursory glance) is What is the strict aliasing rule?

Community
  • 1
  • 1
Foo Bah
  • 25,660
  • 5
  • 55
  • 79
  • 2
    Warning or not, the OP's code is possibly undefined behaviour, and that shouldn't be ignored. – Kerrek SB Oct 03 '11 at 00:34
  • I'm debating whether to -1 this. Giving OP advice on how to disable warnings for dangerous UB and add options to let the code with UB work "as expected" seems non-constructive. Especially since the code also has other UB due to **alignment** issues which are not being caught and will not be solved by the options. It will happily work on x86 and then crash later on other archs... – R.. GitHub STOP HELPING ICE Oct 03 '11 at 03:23
  • @R. There are many idioms which explicitly use this behavior (especially in networking code), and for a long time gcc disabled the warning by default specifically so that networking code wouldn't yield a slew of warnings :) Given that we are talking about networking code (assumed based on the `ntohs` call), it makes sense to at least mention this fact. – Foo Bah Oct 03 '11 at 04:01
  • As I've pointed out, such code is almost surely wrong for reasons gcc cannot work around: misalignment. Reading into a char buffer then trying to interpret that buffer as another type is just fundamentally wrong, and it shows that the author of the code does not understand how to pass a void pointer to the actual object of the proper type to `read`/`recv` (or in the case of larger buffering, how to use `memcpy`...) – R.. GitHub STOP HELPING ICE Oct 03 '11 at 04:46
  • @FooBah: Are you sure that the code you saw isn't actually doing something different? For example, there are many similar-looking idioms that are in fact perfectly legal. For example, accessing common initial elements of different structs is fine. Writing to a character array is also fine if that array was cast from an existing variable of the desired type... the devil is very much in the details with these things. – Kerrek SB Oct 03 '11 at 10:22
  • @KerrekSB: The authors of the C89 may have intended that accessing common initial elements of structures is fine, but C99 added additional restrictions which gcc interprets in such a way as to make the Common Initial Sequence rule useless. – supercat Sep 09 '16 at 15:32