I've lost count, long ago, of the number of times I've done something like this in C:

```c
struct foo f;
struct foo* pf = &f;
char* pc = (char*) pf;
transmit(pc, sizeof(f));
```

Or perhaps:

```c
char* buffer[1024];
receive(buffer, 1024);
float values[256];
for(int ii = 0; ii < 256; ii++) {
    float* pf = (float*)(buffer + ii*4);
    values[ii] = *pf;
}
```

Or maybe:

```c
uint32_t ipAddress = ...;
uint8_t* p = (uint8_t*)&ipAddress;
uint8_t octets[4] = {p[0], p[1], p[2], p[3]};
printf("%d.%d.%d.%d\n", octets[0], octets[1], octets[2], octets[3]);
```

I've only just discovered that reinterpreting a piece of memory like this, by casting to another pointer type, invokes undefined behaviour. And yet the things the above examples are meant to do are absolutely necessary. What's the right way of doing them?

Tom
  • May I interest you in the `union` keyword? – Christoph Jun 27 '13 at 17:27
  • Personally, I think using a union is a much worse solution to these problems than the typecasts. More code to write, creating more weird types, and you *still* need to cast all over the place. – Carl Norum Jun 27 '13 at 17:28
  • @CarlNorum: if your target type isn't `char`, just casting the pointers leads to UB; the effective typing rules basically make C into a strongly typed language where the type information is bound to the memory locations themselves; however, the type system is extremely unsound as the compiler will happily try to access memory via wrongly-typed expressions, but might actually assume that these invariants hold on higher optimization levels (eg in case of `strict-aliasing`) – Christoph Jun 27 '13 at 17:33
  • What is an example of "undefined" behavior from your examples? –  Jun 27 '13 at 17:36
  • @Christoph, I'm not sure I follow - using a union has exactly the same problems, doesn't it? And just casting the pointers doesn't cause UB - only if there's an alignment problem does that happen. You certainly can't *dereference* the pointer, but that's a different step - analogous to accessing a different union member than the one most recently stored to, which is also undefined behaviour. – Carl Norum Jun 27 '13 at 17:38
  • @CarlNorum: type punning through unions is perfectly well-defined (as long as the member you read from is no larger than the one you last wrote to, and you don't create trap representations); there's even a footnote in C99 that tells you so; however, it was incorrectly listed as UB in the Annex, which has been corrected with C11 – Christoph Jun 27 '13 at 17:43
  • @Bob: search for problems caused by `-fstrict-aliasing`, eg http://stackoverflow.com/questions/2958633/gcc-strict-aliasing-and-horror-stories – Christoph Jun 27 '13 at 17:43
  • @Christoph: I would expect odd behavior if I specified strict. I've never been clear on why someone would want to take away the benefit of using pointers so usefully. But then I don't write application code much either. –  Jun 27 '13 at 17:51
  • Right, I see footnote 95 now; thanks for the clarification! – Carl Norum Jun 27 '13 at 17:51
  • Ah, I had thought that using a union this way was undefined in C - didn't realise that was only in C++. The internet seems to be generally confused on this point. – Tom Jun 27 '13 at 17:54
  • @Bob: it's a trade-off; eg always using unions will take care of alignment, which can be important on systems other than x86 (sometimes even there - afaik access to a `double` is only atomic if it is aligned); however, if alignment is not an issue, respecting effective typing can indeed be bothersome, eg if you implement a generic hash function; in my (totally unscientific) tests with SpookyHash, the necessary copying came with a performance penalty of ~10-20% (depending on the short or long form of the algorithm) – Christoph Jun 27 '13 at 17:59

1 Answer

Casting to char * (or unsigned char * or typedefs thereof) is a special case, and does not cause undefined behaviour.

From the C spec, 6.3.2.3 Pointers, paragraph 7:

> When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
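As a rough illustration of that rule, here is a minimal sketch that walks the bytes of a `uint32_t` through an `unsigned char *`. The helper name `get_octets` is hypothetical (it is not from the question), and the order in which the bytes come out depends on the host's endianness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copies the bytes of addr, in memory order, into
   octets. Any object may be inspected through a pointer to a character
   type, so this access is well-defined. The byte order of the result
   depends on the host's endianness. */
void get_octets(uint32_t addr, uint8_t octets[4])
{
    const unsigned char *p = (const unsigned char *)&addr;
    for (size_t i = 0; i < sizeof addr; i++) {
        octets[i] = p[i];
    }
}
```

Calling `get_octets(ipAddress, octets)` would fill `octets` with the same four bytes the third example in the question reads, just in whatever order the machine stores them.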

Your first & third examples are covered by this case. The second example is a bit hinky, but will probably work on most systems. What you really should be doing is either reading directly into values:

```c
float values[256];
receive(values, sizeof values); // assuming receive() takes a "void *" parameter
```

Or something like this (to avoid alignment problems):

```c
char buffer[1024];
receive(buffer, sizeof buffer);
float values[256];
for(int i = 0; i < 256; i++)
{
    char *pf = (char *)&values[i];
    memcpy(pf, buffer + i * sizeof(float), sizeof(float));
}
```

(Note I changed buffer to be a char array - I assume that was a typo in your question).
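The `memcpy` approach can also be wrapped in a small function. This is only a sketch: the name `unpack_floats` is hypothetical, and it assumes the bytes in the buffer already hold floats in the host's own representation and byte order (no endianness conversion is attempted).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies count floats out of a raw byte buffer.
   memcpy works byte by byte, so the buffer needs no particular
   alignment and no float pointer ever aliases the char data.
   Assumption: the bytes are floats in the host's representation. */
void unpack_floats(const unsigned char *buffer, float *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        memcpy(&values[i], buffer + i * sizeof(float), sizeof(float));
    }
}
```

Because the copy never dereferences a `float *` that points into the byte buffer, it should work even when the data starts at an odd offset within the buffer.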

Carl Norum
  • Can you provide a link? My K&R C book does not say as much... but then again it's not ISO or ANSI C. – Mgetz Jun 27 '13 at 16:57
  • Sure: [C11 (PDF link)](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf) or [C99 (PDF link)](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf), your pick. – Carl Norum Jun 27 '13 at 16:58
  • I'm new to C and was just wondering about `char *pf = (char *)&float[i];`. What exactly are you casting to `char *` here? (my compiler complains `parse error before 'float'` for that line..) – qwwqwwq Jun 27 '13 at 17:23
  • Whoops, typo. That should be `&values[i]`. Fixing. – Carl Norum Jun 27 '13 at 17:25
  • @qwwqwwq: It's a typo. Change `float` to `values`. – jxh Jun 27 '13 at 17:25