26

I want to store a 4-byte int in a char array... such that the first 4 locations of the char array are the 4 bytes of the int.

Then, I want to pull the int back out of the array...

Also, bonus points if someone can give me code for doing this in a loop... IE writing like 8 ints into a 32 byte array.

int har = 0x01010101;
char a[4];
int har2;

// write har into char such that:
// a[0] == 0x01, a[1] == 0x01, a[2] == 0x01, a[3] == 0x01 etc.....

// then, pull the bytes out of the array such that:
// har2 == har

Thanks guys!

EDIT: Assume int are 4 bytes...

EDIT2: Please don't worry about endianness... I will be handling endianness myself. I just want different ways to achieve the above in C/C++. Thanks

EDIT3: If you can't tell, I'm trying to write a serialization class on the low level... so I'm looking for different strategies to serialize some common data types.

Rookie Programmer Aravind
DigitalZebra
  • 8
    Maybe you should do your own homework... And then, if you have any doubts, you can post your code here and we will try to help you then. If you don't try to do it yourself, you are not going to learn anything. – jpmelos Oct 06 '09 at 00:00
  • 1
    If you were writing C, you would know better than to initialize a variable with a value. – jkeys Oct 06 '09 at 00:57
  • 4
    ummmm what? The above is just to get the question across. – DigitalZebra Oct 06 '09 at 00:59
  • Are you only worried about ints, or do you need to do the same with non-POD types as well? – jalf Oct 06 '09 at 09:51
  • 1
    Actually I should only be dealing with POD types (I have a lot of terrain data that I'm sending across a network). Hopefully I won't be dealing with anything too complicated. – DigitalZebra Oct 06 '09 at 22:51

10 Answers

42

Unless you care about byte order and such, memcpy will do the trick:

memcpy(a, &har, sizeof(har));
...
memcpy(&har2, a, sizeof(har2));

Of course, there's no guarantee that sizeof(int)==4 on any particular implementation (and there are real-world implementations for which this is in fact false).

Writing a loop should be trivial from here.
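The loop mentioned above might look like the following sketch. The helper names `pack_ints` and `unpack_ints` are made up for illustration, and the bytes land in the host's native byte order, which matches the question's "don't care about endianness" assumption.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helpers (names are illustrative, not from the answer).
// Copy `count` ints into `buf` back to back, in native byte order.
// `buf` must hold at least count * sizeof(int) bytes.
void pack_ints(char* buf, const int* values, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(buf + i * sizeof(int), &values[i], sizeof(int));
    }
}

// Reverse of pack_ints: read `count` ints back out of `buf`.
void unpack_ints(const char* buf, int* values, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::memcpy(&values[i], buf + i * sizeof(int), sizeof(int));
    }
}
```

On a platform where `sizeof(int) == 4`, calling `pack_ints(a, values, 8)` with a `char a[32]` fills the whole buffer, and `unpack_ints(a, out, 8)` restores the eight values.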

Pavel Minaev
24

Not the most efficient way, but it is endian-safe:


int har = 0x01010101;
char a[4];
a[0] = har & 0xff;
a[1] = (har>>8)  & 0xff;
a[2] = (har>>16) & 0xff;
a[3] = (har>>24) & 0xff;
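To pull the value back out, the shifts can be reversed. This is a minimal sketch, assuming 8-bit bytes and a 32-bit value. It uses `std::uint32_t` and `unsigned char` rather than `int` and `char` (a substitution, not the answer's code) so that shifting and sign extension stay well defined. The function names are hypothetical.

```cpp
#include <cstdint>

// Store the low byte first (little-endian layout), regardless of host byte order.
void store_le32(unsigned char* a, std::uint32_t v) {
    a[0] = static_cast<unsigned char>(v & 0xff);
    a[1] = static_cast<unsigned char>((v >> 8) & 0xff);
    a[2] = static_cast<unsigned char>((v >> 16) & 0xff);
    a[3] = static_cast<unsigned char>((v >> 24) & 0xff);
}

// Rebuild the value from the same layout.
std::uint32_t load_le32(const unsigned char* a) {
    return static_cast<std::uint32_t>(a[0])
         | (static_cast<std::uint32_t>(a[1]) << 8)
         | (static_cast<std::uint32_t>(a[2]) << 16)
         | (static_cast<std::uint32_t>(a[3]) << 24);
}
```

Because the layout is fixed by the shifts rather than by the machine, a buffer written on one architecture can be read back on another.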
9
#include <stdio.h>

int main(void) {
    char a[sizeof(int)];
    *((int *) a) = 0x01010101;
    printf("%d\n", *((int *) a));
    return 0;
}

Keep in mind:

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

Sinan Ünür
  • 4
    The pointer can be converted, but that doesn't mean that it can be dereferenced. E.g. you can convert `int*` to `float*` (no U.B.), but as soon as you try to write anything via that `float*`, you hit U.B. Your example is fine because writing via `char*` is specifically allowed for PODs, and lifetime of POD starts as soon as memory is allocated for it, but this is worth clarifying. – Pavel Minaev Oct 06 '09 at 00:10
  • 2
    Actually, sorry, I'm wrong, and this example is still U.B. - specifically, there's no guarantee that `a` is correctly aligned for `int`. There is a guarantee when allocating arrays with `new`, that they will be correctly aligned for any object of the same size as array; but there's no such guarantee for auto or static variables, or member fields. E.g. consider local variable declarations: `char c; char a[4];` - there's a good chance that `a` will not be allocated on a 4-byte boundary, and on some architectures this will result in a crash when you try to write into that location via an `int*`. – Pavel Minaev Oct 06 '09 at 00:12
  • Pavel, could you clarify what you mean by POD and U.B.? Thanks – DigitalZebra Oct 06 '09 at 00:12
  • POD = Plain Old Data type and UB = Undefined Behavior. – Sinan Ünür Oct 06 '09 at 00:14
  • Accessing any data type using a char pointer is fine. However, assuming data pointed to by a char pointer is correctly aligned for some other data type results in undefined behavior. Anything can happen. – Sinan Ünür Oct 06 '09 at 00:15
  • 2
    POD = Plain Old Data. U.B. = Undefined Behavior. The meanings of those two terms are precisely defined in ISO C++ specification. U.B. basically means "anything at all can happen, with no limits". POD means more or less "one of C++ primitive types like int or float, any pointer type, any enum type, array of any POD type, or any struct/class/union consisting solely of fields of POD types, with no non-public members, no base classes, no explicit ctors or dtors, and no virtual members." – Pavel Minaev Oct 06 '09 at 00:15
  • It is safe to assume that pointer is correctly aligned if you allocate memory like this: `char* a = new char[sizeof(int)]`. The resulting block of memory is guaranteed to be aligned properly for any object that can fit into that block - including, obviously, an int. On a side note, it's worth looking at how much trickery `boost::optional` has to do to get the alignment right while avoiding heap allocation: http://www.boost.org/doc/libs/1_39_0/boost/optional/optional.hpp - have a look at `type_with_alignment` template... – Pavel Minaev Oct 06 '09 at 00:19
  • Thanks, I know what the terms mean I just wasn't sure on the acronyms :) – DigitalZebra Oct 06 '09 at 00:19
  • Are you sure dereferencing the casted pointer is UB? I'm pretty sure there's something about it just behaving as if it's pointing to 'an object with an unspecified value of type T'. Moreover, the note in 5.3.4:10 specifically mentions that char arrays are max aligned to allow "the common idiom of allocating character arrays into which objects of other types will later be placed". – jalf Oct 06 '09 at 10:18
  • @jalf I only have `n1124.pdf` (ISO/IEC 9899:TC2) and there is no section 5.3.4 in that document. I think you are referring to the C++ standard (deduced from http://www.boost.org/doc/libs/1_40_0/libs/pool/doc/implementation/alignment.html ). In any case, no I am not sure if the code above invokes UB although I cannot find anything in the C standard that guarantees that it does not – Sinan Ünür Oct 06 '09 at 14:49
  • @PavelMinaev *The pointer can be converted...* The next line in the Standard contradicts your statement: *If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.* Let me use your example of int and float. If they don't have the same alignment, then according to the Standard the conversion causes UB. This is because the first rule allows the conversion, and the next rule restricts it. – 2501 Apr 20 '15 at 03:20
9

Note: In C++, reading a union member other than the one last written is undefined behavior. Assuming a platform where chars are 8 bits and ints are 4 bytes, a bit mask of 0xFF isolates a single byte, so

char arr[4];
int a = 5;

arr[3] = a & 0xff;
arr[2] = (a & 0xff00) >>8;
arr[1] = (a & 0xff0000) >>16;
arr[0] = (a & 0xff000000)>>24;

would make arr[0] hold the most significant byte and arr[3] hold the least.
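The same most-significant-byte-first layout extends to the "8 ints in a 32-byte array" case from the question. This is a sketch under the same assumptions (8-bit bytes, 32-bit values), using `std::uint32_t` and `unsigned char` instead of `int` and `char`. The function names are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Write `count` 32-bit values into `out`, most significant byte first.
// `out` must hold at least count * 4 bytes.
void store_be32_array(unsigned char* out, const std::uint32_t* values, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        for (int b = 0; b < 4; ++b) {
            // b == 0 picks the top byte (shift 24), b == 3 the bottom byte (shift 0).
            out[i * 4 + b] = static_cast<unsigned char>((values[i] >> (24 - 8 * b)) & 0xff);
        }
    }
}

// Read them back by shifting each byte into place.
void load_be32_array(const unsigned char* in, std::uint32_t* values, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t v = 0;
        for (int b = 0; b < 4; ++b) {
            v = (v << 8) | in[i * 4 + b];
        }
        values[i] = v;
    }
}
```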

Edit: To be clear about the trick, & is bitwise AND, whereas && is logical AND. Thanks to the commenters who pointed out the forgotten shift.

stonemetal
8

Don't use unions, Pavel clarifies:

It's U.B., because C++ prohibits accessing any union member other than the last one that was written to. In particular, the compiler is free to optimize away the assignment to int member out completely with the code above, since its value is not subsequently used (it only sees the subsequent read for the char[4] member, and has no obligation to provide any meaningful value there). In practice, g++ in particular is known for pulling such tricks, so this isn't just theory. On the other hand, using static_cast<void*> followed by static_cast<char*> is guaranteed to work.

– Pavel Minaev
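The cast sequence Pavel describes could be sketched as follows. The function name `copy_bytes_of_int` is hypothetical, and `unsigned char` is used in place of `char` (an assumption, to keep byte values non-negative). The bytes come out in the host's native order.

```cpp
#include <cstddef>

// Copy the object representation of `value` into `out` (sizeof(int) bytes),
// viewing the int through a character pointer obtained via void*.
void copy_bytes_of_int(const int& value, unsigned char* out) {
    const void* raw = static_cast<const void*>(&value);
    const unsigned char* bytes = static_cast<const unsigned char*>(raw);
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        out[i] = bytes[i];
    }
}
```

Reading an object through a character-type pointer is the aliasing case the language permits, which is why no union is needed here.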

GManNickG
  • 1
    It's U.B., because C++ prohibits accessing any union member other than the last one that was written to. In particular, the compiler is free to optimize away the assignment to `int` member out completely with the code above, since its value is not subsequently used (it only sees the subsequent read for the `char[4]` member, and has no obligation to provide any meaningful value there). In practice, g++ in particular is known for pulling such tricks, so this isn't just theory. On the other hand, using `static_cast<void*>` followed by `static_cast<char*>` is guaranteed to work. – Pavel Minaev Oct 06 '09 at 00:04
  • Thought so, I never clarified it, though. If you don't mind, I'll leave your comment as advice. – GManNickG Oct 06 '09 at 00:09
  • I don't mind, but it would be nice to fix those `static_cast`s :) – Pavel Minaev Oct 06 '09 at 00:16
8
#include <stdio.h>

int main() {
    typedef union foo {
        int x;
        char a[4];
    } foo;

    foo p;
    p.x = 0x01010101;
    printf("%x ", p.a[0]);
    printf("%x ", p.a[1]);
    printf("%x ", p.a[2]);
    printf("%x ", p.a[3]);

    return 0;
}

Bear in mind that a[0] holds the LSB and a[3] holds the MSB only on a little-endian machine.

Ashwin
  • Your comment about the LSB and MSB only holds true for little endian architectures. – 1800 INFORMATION Oct 06 '09 at 00:08
  • 5
    The read of `p.a` in this code invokes U.B., because it was not preceded by a write to `a`. Any conformant C++ implementation can legally optimize away the assignment to `p.x` completely, and some will do so. – Pavel Minaev Oct 06 '09 at 00:08
  • Umm, yes and no. The exact result is U.B., I guess, because it depends on platform architecture, but unions are the one legal way to alias different types and I would be quite surprised at a compiler that didn't totally understand that p.a had been written. In fact, unions are the *only* official way around type aliasing optimization in gnu implementations. – DigitalRoss Oct 06 '09 at 00:14
  • That's true, I guess, but unions are not the only way to solve this problem, and there are solutions that do not invoke UB so it is probably best to favour those. – 1800 INFORMATION Oct 06 '09 at 00:21
  • It is not legal to alias any two arbitrary types (unions or not - just don't do this, period), but it is perfectly legal to alias any POD type via a `char*`, and g++ supports that as well. The only caveat is that to be strictly conformant, you must `static_cast` to `char*` rather than `reinterpret_cast` or C-style cast (which means that you must first `static_cast` to `void*`) - though I haven't seen any implementation where that last bit actually makes any difference... – Pavel Minaev Oct 06 '09 at 00:21
  • Actually, just for the sake of completeness - it is legal to alias two POD structs in a union if they have a "common sequence" of fields (i.e. same types in same order) at the beginning, but then you can only alias those common fields... – Pavel Minaev Oct 06 '09 at 00:24
  • @DigitalRoss: Please look up what U.B. means. UB by definition does not "depend". If it depends on platform architecture then it is unspecified or implementation-specified, not UB. With UB, all bets are off, and, as Pavel says, the compiler could just optimize it away. I know GCC specifically allows the union trick, but that doesn't make it official. And it isn't the "only" way either. – jalf Oct 06 '09 at 09:54
4

You can also use placement new for this:

#include <new>  // required for placement new

void foo (int i) {
  char * c = new (&i) char[sizeof(i)];
}
Richard Corden
2

    #include <stdint.h>
    #include <stdlib.h>  /* malloc */

    int main(int argc, char* argv[]) {
        /* 8 ints in a loop */
        int i;
        int* intPtr;
        int intArr[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        char* charArr = malloc(32);

        for (i = 0; i < 8; i++) {
            intPtr = (int*) &(charArr[i * 4]);
          /*  ^            ^    ^        ^     */
          /* point at      |    |        |     */
          /*       cast as int* |        |     */
          /*               Address of    |     */
          /*            Location in char array */

            *intPtr = intArr[i]; /* write int at location pointed to */
        }

        /* Read ints out */
        for (i = 0; i < 8; i++) {
            intPtr = (int*) &(charArr[i * 4]);
            intArr[i] = *intPtr;
        }

        char* myArr = malloc(13);
        int myInt;
        uint8_t* p8;    /* unsigned 8-bit integer  */
        uint16_t* p16;  /* unsigned 16-bit integer */
        uint32_t* p32;  /* unsigned 32-bit integer */

        /* Using sizes other than 4-byte ints, */
        /* set all bits in myArr to 1          */
        p8 = (uint8_t*) &(myArr[0]);
        p16 = (uint16_t*) &(myArr[1]);
        p32 = (uint32_t*) &(myArr[5]);
        *p8 = 255;
        *p16 = 65535;
        *p32 = 4294967295;

        /* Get the values back out */
        p16 = (uint16_t*) &(myArr[1]);
        uint16_t my16 = *p16;

        /* Put the 16 bit int into a regular int */
        myInt = (int) my16;

    }
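Offsets 1 and 5 in the code above are not multiples of 2 or 4, so dereferencing a `uint16_t*` or `uint32_t*` there can fault on architectures that require aligned access. The following sketch keeps the same layout but uses `memcpy`, which has no alignment requirement. The helper names `write_fields` and `read_u16` are hypothetical, and values are stored in native byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: write a uint8_t at offset 0, a uint16_t at offset 1
// and a uint32_t at offset 5 of `buf` (the offsets used above).
// `buf` must hold at least 9 bytes.
void write_fields(char* buf, std::uint8_t a, std::uint16_t b, std::uint32_t c) {
    std::memcpy(buf + 0, &a, sizeof a);
    std::memcpy(buf + 1, &b, sizeof b);
    std::memcpy(buf + 5, &c, sizeof c);
}

// Read the 16-bit field back without forming a misaligned uint16_t pointer.
std::uint16_t read_u16(const char* buf) {
    std::uint16_t b;
    std::memcpy(&b, buf + 1, sizeof b);
    return b;
}
```

Compilers typically turn these fixed-size `memcpy` calls into plain loads and stores where the target allows it, so the safer form rarely costs anything.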

Schlameel
1
int i = 9;

// lexical_cast produces text, so the result goes into a std::string
std::string a = boost::lexical_cast<std::string>(i);

I found this to be the best way to convert an int into characters and vice versa.

An alternative to boost::lexical_cast is sprintf.

char temp[6];   // "hell" + one digit + terminating '\0'
temp[0] = 'h';
temp[1] = 'e';
temp[2] = 'l';
temp[3] = 'l';
temp[4] = '\0';
sprintf(temp + 4, "%d", 9);
cout << temp;

The output would be: hell9

pari
0
union value {
   int i;
   char bytes[sizeof(int)];
};

value v;
v.i = 2;

char* bytes = v.bytes;
codie