
Let's say I have a void *buffer of a known size from some external source (it could be the fread() C API[1], for example, or a mmap call).

What types of pointers can I validly cast this void * to, and subsequently read from?

If I know this data is made up of 16-bit values, is it ever allowed, for example, to cast the void * to uint16_t * and just read the values directly by dereferencing the pointer?

I am aware that of course there are possible endianness issues, but is it even legal to do this in the first place (e.g., what about alignment)?

If it is legal to cast the whole buffer this way, what about a portion of the buffer? E.g., if I know the first 64 bytes are char data and the next 10,000 bytes are uint16_t data?

[1] In the case of fread() assume the memory is allocated with malloc.

SODIMM

2 Answers


There are two possible issues:

  1. The cast may be subject to alignment restrictions.
  2. Reading or writing through the result of the cast is subject to the strict aliasing rule.

For part 1, it's implementation-defined whether a platform has alignment requirements. Consult the compiler documentation and it must say whether such restrictions exist. If they do, then it is undefined behaviour if the pointer you cast is not correctly aligned for the type pointed to by the target of the cast.
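As an illustration of part 1, a common (though not strictly portable) way to check alignment before casting is to convert the pointer to uintptr_t and test it against alignof. This is a minimal sketch: is_aligned_for_u16 is a hypothetical name, and it assumes the pointer-to-integer conversion preserves the address bits, which the standard leaves implementation-defined. It says nothing about part 2 (strict aliasing). Note that malloc returns memory suitably aligned for any type with a fundamental alignment requirement, so misalignment typically shows up only at offsets into the buffer.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if p looks suitably aligned for uint16_t.
 * Assumption: (uintptr_t)p reflects the real address, so its low bits
 * reveal alignment. True on mainstream platforms, not guaranteed by C.
 * This only addresses alignment, not the strict aliasing rule. */
static bool is_aligned_for_u16(const void *p)
{
    return ((uintptr_t)p % alignof(uint16_t)) == 0;
}
```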

For part 2, you need to understand the strict aliasing rule. See this thread for a Standard quote plus various forms of introduction.

From here on, my answer refers only to dynamically allocated space. A problem occurs if data is written through one type and then read through another type that is not allowed to alias the type used for the writing:

uint16_t *buf = malloc(50);
((char *)buf)[0] = 'a';
((char *)buf)[1] = 'b';
uint16_t x = *buf;  // undefined behaviour: bytes written as char, read as uint16_t

So to answer your question, you need to know how the data was written.

In the case of fread, the standard (C11 7.21.8.1/2) specifies that the bytes obtained from fgetc are stored in an array of unsigned char exactly overlaying the object, i.e. it writes as if by a series of assignments to unsigned char objects. So it would be undefined behaviour to fread into a malloc'd buffer and then read it via a uint16_t expression.
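One way to stay within that rule is to keep reading the fread'd bytes through unsigned char, the same type they were written as, and assemble each 16-bit value by hand. This also makes the endianness explicit and imposes no alignment requirement. A minimal sketch, assuming the data on disk is little-endian; read_le_u16 is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the index-th 16-bit value from bytes that
 * fread stored as unsigned char. All accesses go through unsigned char,
 * so strict aliasing is not involved, and `bytes` may have any alignment.
 * Assumption: the values were stored little-endian. */
static uint16_t read_le_u16(const unsigned char *bytes, size_t index)
{
    const unsigned char *p = bytes + 2 * index;
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

After something like fread(buf, 1, size, fp) into a malloc'd buf, read_le_u16(buf, i) yields the i-th value; the caller is responsible for checking that 2 * i + 1 is less than the number of bytes actually read.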

The mmap function is not part of the C Standard. So the standard doesn't cover what would happen if you read out of mmap'd space before writing into it. But I would say that if you write into such space and then read from the same address, then the strict aliasing rule would apply.


Some compilers have switches or pragmas to "disable strict aliasing" (GCC and Clang, for instance, accept -fno-strict-aliasing), meaning that they will compile the code as if all aliasing were permitted. If you want to use coding techniques that violate the strict aliasing rule, it would be a good idea to use such switches for that code.

M.M
  • Yeah, there seems to be a big hole there for all the "not C but heavily used from C" APIs like `mmap` and `read(2)` and friends. The rules about strict aliasing have to do with the type of the write, AFAIK, but it's weird to think that they would extend across process boundaries (indeed, the write may not even have occurred in C or on the current host, etc.). – SODIMM Apr 30 '17 at 06:17
  • @SODIMM in practice the compiler would have to treat writes from unknown sources as if they might be some particular type, but it can assume it was still one type, e.g. you will still run into trouble if you read the same mmap'd bytes as both int and float because the compiler knows it can only have been written as one or the other at best – M.M Apr 30 '17 at 07:56

By "legal", if you mean "can you do it", the answer is yes. Whether it works correctly depends on what you do.

If you are certain you are operating within the bounds of memory that belongs to you, you can cast void * to uint16_t * or anything else.

Such casts are frequently used in high-speed code for video, compression and so forth.

If zero-copy speed is not needed, a safer way is simply to declare a variable of that type on the stack and then copy the bytes into it with memcpy (or byte-by-byte assignment), which sidesteps alignment problems.
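The copy approach also covers the mixed layout from the question (64 bytes of character data followed by 16-bit values). A minimal sketch; copy_u16_payload and load_u16_at are hypothetical names, and the values come out in host byte order, so they still need swapping if the data's byte order differs:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies `count` 16-bit values that start
 * `header_len` bytes into `buffer` into properly typed storage.
 * memcpy places the bytes in real uint16_t objects, so the source
 * may have any alignment. Values are in host byte order. */
static void copy_u16_payload(uint16_t *dst, const void *buffer,
                             size_t header_len, size_t count)
{
    const unsigned char *src = (const unsigned char *)buffer + header_len;
    memcpy(dst, src, count * sizeof *dst);
}

/* Hypothetical helper: the one-value-at-a-time version, copying into
 * a local (stack) variable instead of dereferencing a cast pointer. */
static uint16_t load_u16_at(const void *buffer, size_t byte_offset)
{
    uint16_t value;
    memcpy(&value, (const unsigned char *)buffer + byte_offset, sizeof value);
    return value;
}
```

With optimization enabled, compilers commonly turn such a small fixed-size memcpy into a single load when the target allows it, so the copy is often cheaper than it looks.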

See these alignment macro patterns from the Linux kernel, which basically do this (if the value is already aligned, the compiler may optimize the copy away): align macro kernel

EdH