10

I could not fully understand the consequences of what I read here: Casting an int pointer to a char ptr and vice versa

In short, would this work?

void set4Bytes(unsigned char* buffer) {
  const uint32_t MASK = 0xffffffff;
  if ((uintmax_t)buffer % 4) {//misaligned
     for (int i = 0; i < 4; i++) {
       buffer[i] = 0xff;
     } 
  } else {//4-byte alignment
    *((uint32_t*) buffer) = MASK;
  }

}

Edit
There was a long discussion (in the comments, which have since been deleted) about which type the pointer should be cast to in order to check its alignment. That subject is now addressed here.

Antonio

6 Answers

11

As far as byte order is concerned, this conversion is safe as long as you fill all 4 bytes with the same value. If the bytes differ, it is not safe: storing an integer writes 4 bytes at once, but the order in which they land in memory depends on the endianness.
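
To make the byte-order point concrete, here is a minimal sketch in C. The helper names `store_native32` and `store_le32` are hypothetical and do not come from the question. The first copies the integer's in-memory representation, so the byte order follows the machine. The second writes each byte explicitly with shifts, so the order is the same everywhere.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the native representation of v into buf.
   The resulting byte order depends on the machine's endianness. */
void store_native32(unsigned char *buf, uint32_t v) {
    memcpy(buf, &v, sizeof v);
}

/* Hypothetical helper: writes v least-significant byte first,
   regardless of the machine's endianness (assumes 8-bit bytes). */
void store_le32(unsigned char *buf, uint32_t v) {
    buf[0] = (unsigned char)(v & 0xffu);
    buf[1] = (unsigned char)((v >> 8) & 0xffu);
    buf[2] = (unsigned char)((v >> 16) & 0xffu);
    buf[3] = (unsigned char)((v >> 24) & 0xffu);
}
```

With v = 0x11223344, store_le32 always produces the bytes 44 33 22 11, while store_native32 produces 44 33 22 11 on a little-endian machine and 11 22 33 44 on a big-endian one. With 0xffffffff both produce four 0xff bytes, which is why the case in the question is unaffected.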

Hemant Gangwar
2

No, it won't work in every case. Aside from endianness, which may or may not be an issue, you assume that the alignment of uint32_t is 4. But this quantity is implementation-defined (C11 Draft N1570 Section 6.2.8). You can use the _Alignof operator to get the alignment in a portable way.

Second, the effective type (ibid. Sec. 6.5) of the location pointed to by buffer may not be compatible with uint32_t (e.g. if buffer points to an unsigned char array). In that case you break the strict aliasing rules once you try reading through the array itself or through a pointer of a different type.

Assuming that the pointer actually points to an array of unsigned char, the following code will work:

#include <stddef.h>
#include <stdint.h>

typedef union { unsigned char chr[sizeof(uint32_t)]; uint32_t u32; } conv_t;

void set4Bytes(unsigned char* buffer) {
  const uint32_t MASK = 0xffffffffU;
  if ((uintptr_t)buffer % _Alignof(uint32_t)) {// misaligned
    for (size_t i = 0; i < sizeof(uint32_t); i++) {
      buffer[i] = 0xffU;
    } 
  } else { // correct alignment
    conv_t *cnv = (conv_t *) buffer; 
    cnv->u32 = MASK;
  }
}
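
A commonly used alternative, which is not part of the answer above, is to let memcpy do the store. This is only a sketch: it assumes buffer points to at least sizeof(uint32_t) writable bytes, and the name `set4BytesMemcpy` is made up here. Because memcpy copies bytes, it needs no particular alignment and never accesses the buffer through a uint32_t lvalue. Mainstream compilers typically turn a fixed-size memcpy like this into a single store where the target allows it, but that is an optimization, not a guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of set4Bytes: assumes buffer has room for
   sizeof(uint32_t) bytes. Works for any alignment of buffer. */
void set4BytesMemcpy(unsigned char *buffer) {
    const uint32_t MASK = 0xffffffffU;
    memcpy(buffer, &MASK, sizeof MASK);  /* byte-wise copy, no pointer cast */
}
```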
Roland W
  • Very interesting finding. Basically alignment is decoupled from sizeof... So for C++98 there's no portable solution, correct? – Antonio Sep 30 '14 at 14:24
  • I guess so, unless there is another way to portably get type alignments. Anyhow, the guess that object alignment is equal to object size seems to hold for most common architectures. – Roland W Sep 30 '14 at 14:32
1

This code might be of help to you. It shows a 32-bit number being built by assigning its contents a byte at a time, forcing misalignment. It compiles and works on my machine.

#include<stdint.h>
#include<stdio.h>
#include<inttypes.h>
#include<stdlib.h>

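int main () {
    /* Allocate two words so a 4-byte value fits at a misaligned offset. */
    uint32_t *data = (uint32_t*)malloc(sizeof(uint32_t)*2);
    if (data == NULL) return 1;
    char *buf = (char*)data;
    uintptr_t addr = (uintptr_t)buf;
    int i,j;
    i = !(addr%4) ? 1 : 0;   /* offset 1 if buf is 4-byte aligned, forcing misalignment */
    uint32_t x = (1<<6)-1;   /* 63 */
    /* Copy x into the buffer one byte at a time, in native byte order. */
    for( j=0;j<4;j++ ) buf[i+j] = ((char*)&x)[j];

    /* Misaligned read through a uint32_t pointer; not portable. */
    printf("%" PRIu32 "\n",*((uint32_t*) (addr+i)) );
    free(data);
    return 0;
}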

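As mentioned by @Learner, endianness must be taken into account whenever bytes are assigned individually. The code above copies the bytes of x in their native order, so the round trip itself does not depend on byte order, but the code is still not portable: the misaligned read can fault on architectures that require aligned access.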
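
The last statement of the program dereferences a uint32_t pointer that was deliberately misaligned. A sketch of reading the same bytes back without that dereference is to memcpy them into a local variable. `load_u32` is a hypothetical helper name, and the caller is assumed to supply at least sizeof(uint32_t) readable bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t stored in native byte order
   at p, which may have any alignment. */
uint32_t load_u32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

In the program above, `load_u32(buf + i)` could stand in for `*((uint32_t*) (addr+i))`.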

Note that my compiler throws the error "cast from ‘char*’ to ‘unsigned int’ loses precision [-fpermissive]" when trying to cast a char* to an unsigned int, as done in the original post. This post explains that uintptr_t should be used instead.

plafratt
  • This causes undefined behaviour by violating the strict aliasing rule. An object of type `char` may not have its value accessed by an lvalue of type `uint32_t`. In C++11 the relevant section is 3.10/10. – M.M Sep 25 '14 at 22:38
1

In addition to the endian issue, which has already been mentioned here:

CHAR_BIT - the number of bits per char - should also be considered.

It is 8 on most platforms, where for (int i=0; i<4; i++) should work fine.

A safer way of doing it would be for (int i=0; i<sizeof(uint32_t); i++).

Alternatively, you can include <limits.h> and use for (int i=0; i<32/CHAR_BIT; i++).
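
A minimal sketch of a loop that does not hard-code 4, assuming a C11 compiler for static_assert. The function name `fillU32Bytes` is hypothetical. Each byte is filled with UCHAR_MAX rather than 0xff so that all bits are set even where a byte is wider than 8 bits.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, UCHAR_MAX */
#include <stddef.h>
#include <stdint.h>

/* uint32_t has exactly 32 value bits and no padding, so its size in
   bytes is 32 / CHAR_BIT wherever the type exists. */
static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");

/* Hypothetical helper: sets every bit of the first sizeof(uint32_t)
   bytes of buffer, whatever the width of a byte. */
void fillU32Bytes(unsigned char *buffer) {
    for (size_t i = 0; i < sizeof(uint32_t); i++) {
        buffer[i] = UCHAR_MAX;
    }
}
```

Using size_t for the loop index also avoids the signed/unsigned comparison that `int i` against `sizeof` would cause.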

barak manos
  • Wow! Number of bits per char/byte can be != 8, that's scary! I had thought to put sizeof instead of 4, but I really thought that size of uint32_t had to be 4 in any case! – Antonio Oct 02 '14 at 12:40
  • @Antonio: the only `sizeof` that is guaranteed by the standard is `sizeof(char)`, which is guaranteed to be 1. If `CHAR_BIT` is 16, then `sizeof(uint32_t)` would be 2. – barak manos Oct 02 '14 at 12:42
  • Yes, thanks, now it's clear! (But still scary :) ). It would also change the way the other loop is implemented, and the value I would have to put in each "byte". I think if we switch to a system where bytes don't have 8 bits, we'll have much bigger problems than this :) – Antonio Oct 02 '14 at 12:43
0

Use reinterpret_cast<>() if you want to ensure the underlying data does not "change shape".

As Learner has mentioned, when you store data in machine memory, endianness becomes a factor. If you know the data is stored correctly in memory (with the correct endianness) and you specifically want to examine its layout as an alternate representation, then you would use reinterpret_cast<>() to view that memory as a specific type without modifying the original storage.

Below, I've modified your example to use reinterpret_cast<>():

void set4Bytes(unsigned char* buffer) {
  const uint32_t MASK = 0xffffffff;
  if (*reinterpret_cast<unsigned int *>(buffer) % 4) {//misaligned
     for (int i = 0; i < 4; i++) {
       buffer[i] = 0xff;
     } 
  } else {//4-byte alignment
    *reinterpret_cast<unsigned int *>(buffer) = MASK;
  }
}

It should also be noted that your function appears to set the buffer (32 bits, i.e. 4 bytes, of contiguous memory) to 0xFFFFFFFF regardless of which branch it takes.

Zak
  • Uhm, I think you have misinterpreted the point of the if statement, which is checking if the pointer is 4-bytes aligned (if I understand correctly, you are checking the content of the buffer) – Antonio Sep 25 '14 at 20:11
  • @Antonio The original code has changed from an `unsigned int` to `uintptr_t`; so not exactly. Your new code uses a static casting (C syntax) from pointer to pointer, so now you are doing the same thing as my code only using different syntax. – Zak Sep 25 '14 at 22:40
  • @Antonio It is true I don't understand what you are trying to achieve, because both sides of the if statement change the value *at* `buffer` and not the value *of* `buffer`? If you called `set4bytes()` with the result of `set4bytes()`, would it not take the same branch? – Zak Sep 25 '14 at 22:44
  • The question is marked as C as well as C++ and there is no reinterpret_cast in C – CashCow Sep 30 '14 at 16:33
  • @CashCow In C, all cast are functionally equivalent to either `static_cast<>()` or `reinterpret_cast<>()`. The equivalent C syntax would be `*((unsigned int *)buffer)` – Zak Oct 02 '14 at 23:01
-1

Your code works on any architecture that is 32-bit or wider. There is no issue with byte ordering, since all your source bytes are 0xFF.

On x86 and x64 machines, the extra work needed to handle a possibly unaligned access to RAM is done by the CPU and is transparent to the programmer (since the Pentium II), at some performance cost per access. So, if you are only setting the first four bytes of a buffer a few times, you can simplify your function:

void set4Bytes(unsigned char* buffer) {
  const uint32_t MASK = 0xffffffff;
  *((uint32_t *)buffer) = MASK;
}

Some readings:

  1. A Linux kernel doc about UNALIGNED MEMORY ACCESSES
  2. Intel Architecture Optimization Manual, section 3.4
  3. Windows Data Alignment on IPF, x86, and x64
  4. A Practical 'Aligned vs. unaligned memory access', by Alexander Sandler