14

I know that sizeof(int), for instance, gives the number of bytes used by a variable type. How do you get the values of the individual bytes used when you store a number in a variable of that type (e.g. int x = 125;)?

Tobey
  • Possible duplicate of [c get nth byte of integer](https://stackoverflow.com/questions/7787423/c-get-nth-byte-of-integer) – phuclv Jul 26 '18 at 07:35

5 Answers

26

You have to know the number of bits (often 8) in each "byte". Then you can extract each byte in turn by shifting the int right and ANDing the result with a mask. Suppose an int is 32 bits and a byte is 8 bits; then, to get the 4 bytes out of the_int:

  int a = (the_int >> 24) & 0xff;  // high-order (leftmost) byte: bits 24-31
  int b = (the_int >> 16) & 0xff;  // next byte, counting from left: bits 16-23
  int c = (the_int >>  8) & 0xff;  // next byte, bits 8-15
  int d = the_int         & 0xff;  // low-order byte: bits 0-7

And there you have it: each byte is in the low-order 8 bits of a, b, c, and d.

Lindydancer
Pete Wilson
  • You're making it harder than it is and using shifting to extract bytes is rather CPU intensive. It would be much simpler and faster to access each byte location direct in memory... – DipSwitch Dec 30 '11 at 14:18
  • @Dip -- my brutish way has the virtue that it's portable across CPUs, whereas others may not be. But as you wish, of course. – Pete Wilson Dec 30 '11 at 14:31
  • 1
    +1: I consider this far better than fiddling with memory locations. 1) It is endian-agnostic. 2) The CPU can perform operations in processor registers, so it is typically far more efficient. In fact, for some (8-bit) machines, this might not generate any code at all (if you have a decent compiler). – Lindydancer Dec 30 '11 at 14:40
  • Not entirely true, an int should be at least 16 bits (this is true on AVR 8-bit platforms for example) so if you would allocate an int and read each byte with your code you would read 2 ints instead of one. Furthermore, a per-byte read would always work since a byte read is always aligned. – DipSwitch Dec 30 '11 at 14:40
  • 2
    It would be a better example if the type of `a`, `b` etc. would have been `unsigned char`. In that case, the compiler could allocate the variables in the same processor registers that holds `the_int`, if they are of the right size. – Lindydancer Dec 30 '11 at 14:53
  • 2
    +1 this answer is the most correct. All the others are endian dependent. @dip it's not processor intensive on many platforms, and anyway, the compiler will transform this into whatever *is* the best implementation behind the scenes. – ams Dec 30 '11 at 14:59
  • This is also platform dependent... and also not entirely true: on an x86 platform, for example, it can only access a and b in the same register; c and d have to be stored in another register, since it can only access the high and low byte of the low word. This means a and b could be accessed directly via registers AL and AH, for example, but c and d can only be accessed if it copies register EAX to EBX and then shifts it 16 bits right, so it can access them from registers BL and BH. – DipSwitch Dec 30 '11 at 15:05
  • @ams the compiler optimization also counts for a union, since this is exactly why a union in C exists in the first place. – DipSwitch Dec 30 '11 at 15:08
  • @Lindydancer -- Yes, I absolutely agree. *Except* that I'm not entirely sure that unsigned char must always be 8 bits long. That is, I don't know that uchar == byte in every case. OT: if you can do the Lindy Hop and remain upright at the end of the song, I'm going to upvote every answer of yours that I see just on that account :-) – Pete Wilson Dec 30 '11 at 15:11
  • @dip unions do not exist for type-punning, although they are useful for that. And yes, this specific example is target dependent in that it assumes a 32-bit int and 8-bit bytes, but it *is* endian independent, which none of the other solutions are (including unions). Note that this is target independent if you assume that only the bottom 32-bits of an int are interesting to the algorithm. – ams Dec 30 '11 at 15:16
  • 1
    @PeteWilson: I would say that the case where a `char` isn't 8 bits is so rare that you would have to hand-craft everything anyway. I guess that one could write the code using the `CHAR_BIT` macro to be on the safe side, but I don't think it's worth the effort. Thanks for your offer -- I do dance the Lindy Hop and I do (normally) remain upright (even though I don't do aerials), however I would prefer it if you would upvote my answers based on the content rather than on my performance on the dance floor ;) – Lindydancer Dec 30 '11 at 15:27
  • @Lindydancer - well, whatever you say but *I* say that any +rep is good rep :-) And, just to beat this dead horse into dead-horse-hamburger, uchar a is exactly the same in memory as uint a (right?). Yeah CHAR_BIT is a bit overboard and, besides, using CHAR_BIT would make the code so very **CPU-intensive** (tm) DipSwitch. – Pete Wilson Dec 30 '11 at 15:44
  • 1
    @ams so I see, I learned something new. And for Pete: what I meant is that on a lot of CPUs shifts are done one bit at a time and cost more cycles (just like multiplying, dividing, modulo and branching) than simple bit manipulations, reads, writes, etc. But you're right, it is the best way to do this, I know now. Sorry for being a pain =) – DipSwitch Dec 30 '11 at 16:13
  • @DipSwitch -- pain? ***You? :-)*** – Pete Wilson Dec 30 '11 at 17:10
19

You can get the bytes by using some pointer arithmetic:

int x = 12578329; // 0xBFEE19
for (size_t i = 0; i < sizeof(x); ++i) {
  // Convert to unsigned char* because a char is 1 byte in size.
  // That is guaranteed by the standard.
  // Note that it is NOT required to be 8 bits in size.
  unsigned char byte = *((unsigned char *)&x + i);
  printf("Byte %zu = %u\n", i, (unsigned)byte);  // %zu is the specifier for size_t
}

On my machine (Intel x86-64), the output is:

Byte 0 = 25  // 0x19
Byte 1 = 238 // 0xEE
Byte 2 = 191 // 0xBF
Byte 3 = 0 // 0x00
compie
  • 1
    How is 4294967278 a byte? The default `char` type is probably signed on your system, and casting to `unsigned` produces large numbers. – Eser Aygün Dec 30 '11 at 14:15
  • Any reason to use manual pointer arithmetic instead of the much more readable array access? – Konrad Rudolph Dec 30 '11 at 14:19
  • @Konrad Rudolph because it was the first thing I came up with. –  Dec 30 '11 at 14:20
  • What happens if you use 'char' instead of 'unsigned char'? – Tobey Dec 30 '11 at 14:38
  • 1
    @toby the signedness of `char` is implementation defined. This means that with some compilers the internal representation looks different. In this code you don't want the sign bit. If you use `char` instead of `unsigned char`, it could include the sign bit and the casting won't work anymore. –  Dec 30 '11 at 14:39
  • This is endian-dependent and inefficient, as the generated code would typically write out the content of `x` to memory, in order to read it back in using byte accesses. – Lindydancer Dec 30 '11 at 14:53
6

You could use a union, but keep in mind that the byte order is processor dependent; this property is called endianness: http://en.wikipedia.org/wiki/Endianness

#include <stdio.h>
#include <stdint.h>

union my_int {
   int val;
   uint8_t bytes[sizeof(int)];
};

int main(void) {
   union my_int mi;
   size_t idx;   /* unsigned, to match the type of sizeof(int) */

   mi.val = 128;

   for (idx = 0; idx < sizeof(int); idx++)
        printf("byte %zu = %hhu\n", idx, mi.bytes[idx]);

   return 0;
}
DipSwitch
4

If you want to get that information, say for:

int value = -278;

(I chose that value because 125 isn't a very interesting case: its least significant byte is 125 and the other bytes are all 0!)

You first need a pointer to that value:

int* pointer = &value;

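You can now cast that to an 'unsigned char' pointer, which points to a single byte, and get the individual bytes by indexing. (With plain 'char', which may be signed, bytes of 0x80 and above would come out as negative values.)

for (size_t i = 0; i < sizeof(value); i++) {
    unsigned char thisbyte = *( ((unsigned char*) pointer) + i );
    // do whatever processing you want.
}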

Note that the order of bytes for ints and other data types depends on your system - look up 'big-endian' vs 'little-endian'.

Dan
  • Any reason to use manual pointer arithmetic instead of the much more readable array access? – Konrad Rudolph Dec 30 '11 at 14:19
  • Because it makes it obvious to those being educated that we're using a pointer? I don't consider pointer arithmetic to be wrong or unreadable. – Dan Dec 30 '11 at 14:21
  • 2
    It’s not wrong but surely you don’t argue that it’s as readable as array access …! Compare `*(x + i)` with `x[i]`. In fact, why does array access syntax exist at all, if we don’t use it when appropriate? – Konrad Rudolph Dec 30 '11 at 14:21
3

This should work:

int x = 125;
unsigned char *bytes = (unsigned char *) (&x);
unsigned char byte0 = bytes[0];
unsigned char byte1 = bytes[1];
...
unsigned char byteN = bytes[sizeof(int) - 1];

But be aware that the byte order of integers is platform dependent.

Eser Aygün