3

I am working with software embedded in minimal hardware that only supports ANSI C and has minimal versions of the standard IO libraries.

I have an Int variable, two bytes in size, but I need to divide it into 2 bytes separately to be able to transmit it, and then I can, reading the two bytes, reassemble the original Int.

I imagine some kind of binary split into bytes, like this:

int valor = 522;  // 0000 0010 0000 1010 (2-byte integer)
byte superior = byteSuperior(valor);  // 0000 0010
byte inferior = byteInferior(valor);  // 0000 1010
...
int valorRestaurado = bytesToInteger(superior, inferior); // 522

but I cannot find a simple way to split the integer by place value. It feels like it should be trivial (for example, with bit shifting), yet I cannot work it out.

Actually, any solution that divides the whole into 2 bytes and reassembles it serves me well.

Thank you very much in advance!

Thomas Dickey
Curious
  • 2
    Are you sending the data from one system to another that has different [endianness](https://en.wikipedia.org/wiki/Endianness)? If so, you can use [`htons()` and `ntohs()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/htons.html) if they're available on your systems. You use `htons()` to convert a two-byte `int` value to network byte order, then on receiving it you use `ntohs()` to convert it back to the host byte order for the host you received it on. – Andrew Henle Jul 27 '18 at 10:34
  • Don't let the long and complicated answers scare you. This is mostly a straightforward problem, and it is straightforward to write decent code that will work on your machine 100% of the time. Pick one of the answers involving `>>` and `& 0xff`, test it carefully as [my answer](https://stackoverflow.com/questions/51555676/how-to-divide-an-int-into-two-bytes-in-c/51558357#51558357) suggests, and you should be fine. – Steve Summit Jul 27 '18 at 13:04
  • @SteveSummit with emphasis on "*on your machine*" ;) Yes, it's kind of surprising something that sounds so simple is such a complex problem in C. I think it's good all these answers are there, you should know about these things, even if only for picking the implementation-defined method best for your (only) target system :) –  Jul 27 '18 at 15:27
  • @FelixPalmen I used to be one of the most portability-obsessed programmers I knew, but I guess I'm getting soft in my old age. And of course, there are good reasons why signeds and unsigneds of the same size tend to quietly and correctly interconvert on all the popular CPUs out there (and why signed integer overflow tends to wrap around as predictably as unsigned does). – Steve Summit Jul 27 '18 at 15:38

7 Answers

7

This isn't a "simple" task.

First of all, the data type for a byte in C is char. You probably want unsigned char here, because whether plain char is signed or unsigned is implementation-defined.

int is a signed type, which makes right-shifting it implementation-defined as well. As far as C is concerned, int must have at least 16 bits (which would be 2 bytes if char has 8 bits), but can have more. But as your question is written, you already know that int on your platform has 16 bits. Using this knowledge in your implementation means your code is specific to that platform and not portable.
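One way to make that platform assumption explicit is to let the code check it. The following is only a minimal sketch, not part of the original answer; the names `assume_8_bit_bytes` and `int_is_16_bit` are made up for illustration.

```c
#include <limits.h>

/* ANSI C has no static assertion, so use the negative-array-size trick:
   this typedef fails to compile unless a byte has exactly 8 bits. */
typedef char assume_8_bit_bytes[(CHAR_BIT == 8) ? 1 : -1];

/* Hypothetical helper: returns nonzero when int and unsigned have the
   16-bit ranges the question describes, zero otherwise. */
int int_is_16_bit(void)
{
    return INT_MAX == 32767 && UINT_MAX == 65535u;
}
```

Code that relies on a two-byte int could call `int_is_16_bit()` once at start-up and refuse to run if it returns zero.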

As I see it, you have two options:

  1. You can work on the value of your int using masking and bit-shifting, something like:

    int foo = 42;
    unsigned char lsb = (unsigned)foo & 0xff; // mask the lower 8 bits
    unsigned char msb = (unsigned)foo >> 8;   // shift the higher 8 bits
    

    This has the advantage that you're independent of the layout of your int in memory. For reconstruction, do something like:

    int rec = (int)(((unsigned)msb << 8) | lsb );
    

    Note that casting msb to unsigned here is necessary. Otherwise, it would be promoted to int (int can represent all values of an unsigned char), and shifting it left by 8 places could overflow. As you already stated your int has "two bytes", this would be very likely in your case.

    The final cast to int is implementation-defined as well, but will work on your "typical" platform with 16-bit int in 2's complement, if the compiler doesn't do something "strange". By checking first whether the unsigned is too large for an int (because the original int was negative), you could avoid this, e.g.

    unsigned tmp = ((unsigned)msb << 8 ) | lsb;
    int rec;
    if (tmp > INT_MAX)
    {
        tmp = ~tmp + 1; // 2's complement
        if (tmp > INT_MAX)
        {
            // only possible when implementation uses 2's complement
            // representation, and then only for INT_MIN
            rec = INT_MIN;
        }
        else
        {
            rec = tmp;
            rec = -rec;
        }
    }
    else
    {
        rec = tmp;
    }
    

    The 2's complement is fine here, because the rules for converting a negative int to unsigned are explicitly stated in the C standard.

  2. You can use the representation in memory, like:

    int foo = 42;
    unsigned char *rep = (unsigned char *)&foo;
    unsigned char first = rep[0];
    unsigned char second = rep[1];
    

    But beware: whether first is the MSB or the LSB depends on the endianness of your machine. Also, if your int contains padding bits (extremely unlikely in practice, but allowed by the C standard), you will read them as well. For reconstruction, do something like:

    int rec;
    unsigned char *recrep = (unsigned char *)&rec;
    recrep[0] = first;
    recrep[1] = second;
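
The endianness caveat for the memory approach can be handled by probing the byte order at run time. This is a hedged sketch rather than part of the original answer: `first_byte_is_lsb` and `memory_split` are hypothetical names, and the code assumes int is at least two bytes wide, has no padding bits, and uses plain little-endian or big-endian order (not a mixed order).

```c
#include <stddef.h>  /* size_t */

/* Returns nonzero if the first byte in memory holds the least
   significant bits (little-endian), zero otherwise. */
int first_byte_is_lsb(void)
{
    unsigned probe = 1u;
    return *(unsigned char *)&probe == 1;
}

/* Reads the two lowest-order bytes of foo from its memory
   representation and reports them in a fixed msb/lsb order. */
void memory_split(int foo, unsigned char *msb, unsigned char *lsb)
{
    const unsigned char *rep = (const unsigned char *)&foo;
    size_t n = sizeof foo;

    if (first_byte_is_lsb()) {
        *lsb = rep[0];
        *msb = rep[1];
    } else {
        *lsb = rep[n - 1];
        *msb = rep[n - 2];
    }
}
```

With this, `memory_split(522, &msb, &lsb)` yields the same two bytes on either byte order, at the cost of an extra run-time check that the shift-and-mask option does not need.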
    
  • 2
    I'm confident this is the most complete answer so far, so anyone downvoting might want to explain their doubts... –  Jul 27 '18 at 11:17
  • @EricPostpischil sure it does, it even did **before** you wrote your first comment. Anyways, a "partial" answer would still give no reason for downvotes. –  Jul 27 '18 at 11:25
  • Sorry, I do not know how I missed the reconstruction. However, the reconstruction in part 1 is incorrect, as the cast from `unsigned` to `int` is implementation-defined when the value cannot be represented in `int`. – Eric Postpischil Jul 27 '18 at 11:28
  • For all practical purposes, C is Turing complete, and the implementation-defined behavior of conversions can be worked around. Even if it could not be worked around, it could be documented in the answer. – Eric Postpischil Jul 27 '18 at 11:30
  • 1
    @EricPostpischil it can be worked around, but not when working on the value bits. And I mentioned "implementation defined" clearly in the introduction, I don't see a reason to sprinkle it around everywhere. –  Jul 27 '18 at 11:31
  • The introduction says the signedness of `char` is implementation-defined. That is not a statement that the code in the answer relies on implementation-defined behavior in the conversion from `unsigned` to `int`. – Eric Postpischil Jul 27 '18 at 11:33
  • 1
    @EricPostpischil the introduction says more about non-portable code. And this behavior really discourages people from writing any more answers. Anyways, I even added a full explanation here. –  Jul 27 '18 at 12:08
  • `rec = ~tmp + 1; ` is not representable in `int` when `tmp` is `INT_MIN`. An example of a working conversion is `int foo(unsigned x) { if (x <= INT_MAX) return x; else if (x == INT_MIN) return INT_MIN; else return - (int) -x; }`. Apple LLVM 9.1.0 (clang-902.0.39.2) completely optimizes this. It even completely optimizes `int foo(unsigned x) { if (x <= INT_MAX) return x; else { int y = 0; while (x++) --y; return y; } }`, although I would not recommend that as other compilers might not. – Eric Postpischil Jul 27 '18 at 12:19
  • My vote down (which I removed) was due to the fact that this answer provided code **which would break in some circumstances**. You could have avoided that vote down simply by stating the implementation-defined behavior as a prerequisite. You comment about “this behavior,” but I am happy to discourage bad answers by voting them down and explaining why. There is no need to defend one’s precious answers as a personal affront. Treat it as an inanimate object to which you have no personal connection. If it has defects, it has defects, and the remedy is to fix them, not to argue about them. – Eric Postpischil Jul 27 '18 at 12:23
  • It didn't have "*defects*" and I **do** take it personally when something (with quite some effort put in) is voted down, *especially* when not even a comment is left (which doesn't apply to **your** downvote). And about your other comment, in my code, `tmp` can't be `INT_MIN` in this line. –  Jul 27 '18 at 15:12
  • Well I see, you mean the "equivalent" to `INT_MIN` .. that's a problem indeed, strange corner case, but is fixed with a simple additional line. –  Jul 27 '18 at 15:21
  • By `tmp` being `INT_MIN`, I meant that it had the value corresponding to `INT_MIN`, specifically `(unsigned) INT_MIN`, which is 32768 in OP’s implementation. If `msb` is 128 and `lsb` is 0, then `tmp` is set to 32768, `~tmp` is 32767, `~tmp + 1` is 32768, and `rec = ~tmp + 1;` overflows because `int` cannot represent 32768. – Eric Postpischil Jul 27 '18 at 15:24
  • @EricPostpischil already understood (see earlier comment) and fixed by doing the 2's complement in `unsigned`. –  Jul 27 '18 at 15:25
  • 1
    In the new code: Again `tmp` as constructed from `msb` and `lsb` may be 32768. Then `tmp = ~tmp + 1;` sets `tmp` to 32768, and `rec = tmp;` attempts to assign to an `int` a value that `int` cannot represent. In this case, a conversion is performed, and the behavior is implementation-defined rather than undefined, per 6.3.1.3 3. OP has not stated what the implementation defines, so we cannot know whether this code would work or not. – Eric Postpischil Jul 27 '18 at 15:27
  • I'm tired. Sure, this doesn't change anything. That's why I chose to trust on a "normal" 2's-complement based implementation in my first example, anything else (without accessing the representation) is just weird and complicated. Optimizers aside, there must be a way without handling yet another special case? :o anyways, I'll write one. –  Jul 27 '18 at 15:30
  • As Eric commented elsewhere and in his answer: You can do all the shifting and splicing in straight unsigned, and `memcpy` to/from `int` at the edges. Personally, I think this borders on the insane, sacrificing real, practical efficiency for theoretical portability to machines that might not even exist. But to each his own. :-) – Steve Summit Jul 27 '18 at 15:47
  • @SteveSummit in all this madness, I'd call `memcpy()` to/from `unsigned` kind of *elegant* ... but of course, this, too, accesses representation :) –  Jul 27 '18 at 15:51
2

As you can see from the several answers so far, there are multiple approaches, and some perhaps surprising subtleties.

  1. "Mathematical" approach. You separate the bytes using shifting and masking (or, equivalently, division and remainder), and recombine them similarly. This is "option 1" in Felix Palmen's answer. This approach has the advantage that it is completely independent of "endianness" issues. It has the complication that it's subject to some sign-extension and implementation-definedness issues. It's safest if you use an unsigned type for both the composite int and the byte-separated parts of the equation. If you use signed types, you'll typically need extra casts and/or masks. (But with that said, this is the approach I prefer.)

  2. "Memory" approach. You use pointers, or a union, to directly access the bytes making up an int. This is "option 2" in Felix Palmen's answer. The very significant issue here is byte order, or "endianness". Also, depending on how you implement it, you may run afoul of the "strict aliasing" rule.

If you use the "mathematical" approach, make sure you test it on values that both do and don't have the high bit of the various bytes set. For example, for 16 bits, a complete set of tests might include the values 0x0101, 0x0180, 0x8001, and 0x8080. If you don't write the code correctly (if you implement it using signed types, or if you leave out some of the otherwise necessary masks), you will typically find extra 0xff's creeping into the reconstructed result, corrupting the transmission. (Also, you might want to think about writing a formal unit test, so that you can maximize the likelihood that the code will be re-tested, and any latent bugs detected, if/when it's ported to a machine which makes different implementation choices which affect it.)

If you do want to transmit signed values, you will have a few additional complications. In particular, if you reconstruct your 16-bit integer on a machine where type int is bigger than 16 bits, you may have to explicitly sign extend it to preserve its value. Again, comprehensive testing should ensure that you've adequately addressed these complications (at least on the platforms where you've tested your code so far :-) ).
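The explicit sign extension mentioned here can be sketched as follows. This is an illustration under stated assumptions, not code from the answer: `decode_s16` is a hypothetical name, bytes are assumed to be 8 bits, and the two received bytes are assumed to hold a 16-bit two's complement value. The result is returned as long so the arithmetic stays in range whatever the width of int.

```c
/* Rebuilds a signed 16-bit value from its two bytes. Works by value,
   so it does not depend on how wide int is on the receiving machine. */
long decode_s16(unsigned char msb, unsigned char lsb)
{
    unsigned long u = ((unsigned long)msb << 8) | lsb;  /* 0 .. 65535 */

    if (u >= 0x8000ul)
        return (long)u - 0x10000L;  /* high bit set: value is u - 65536 */
    return (long)u;
}
```

Subtracting 65536 when the high bit is set has the same effect as sign extension, but it is done with ordinary arithmetic instead of relying on implementation-defined conversions.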

Going back to the test values I suggested (0x0101, 0x0180, 0x8001, and 0x8080), if you're transmitting unsigned integers, these correspond to 257, 384, 32769, and 32896. If you're transmitting signed integers, they correspond to 257, 384, -32767, and -32640. And if on the other end you get values like -255 or 65281 (which correspond to hexadecimal 0xff01), or if you get 32896 when you expected -32640, it indicates that you need to go back and be more careful with your signed/unsigned usage, with your masking, and/or with your explicit sign extension.
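A small self-check over those four test values might look like the sketch below. The helper names `roundtrip_u16` and `count_roundtrip_failures` are invented for this example, and it covers only the unsigned case.

```c
/* Splits a 16-bit unsigned value with shifts and masks, then rebuilds it. */
unsigned roundtrip_u16(unsigned value)
{
    unsigned char msb = (unsigned char)((value >> 8) & 0xffu);
    unsigned char lsb = (unsigned char)(value & 0xffu);

    /* Cast before shifting so the shift is done in unsigned, not int. */
    return ((unsigned)msb << 8) | lsb;
}

/* Returns how many of the suggested test values fail to survive. */
int count_roundtrip_failures(void)
{
    static const unsigned tests[] = { 0x0101u, 0x0180u, 0x8001u, 0x8080u };
    int failures = 0;
    unsigned i;

    for (i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        if (roundtrip_u16(tests[i]) != tests[i])
            failures++;
    }
    return failures;
}
```

A result other than zero from `count_roundtrip_failures()` points at a missing mask or a signed intermediate somewhere in the round trip.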

Finally, if you use the "memory" approach, and if your sending and receiving code runs on machines of different byte orders, you'll find the bytes swapped. 0x0102 will turn into 0x0201. There are various ways to solve this, but it can be quite a nuisance. (This is why, as I said, I usually prefer the "mathematical" approach, so I can just sidestep the byte order problem.)

Steve Summit
  • Of course, there is also a hybrid approach. `memcpy` to `unsigned`, separate the bits, send them. For receiving, assemble an `unsigned` from bits, then `memcpy` into an `int`. – Eric Postpischil Jul 27 '18 at 14:20
1

I wouldn't even write functions to do this. Both operations are straightforward applications of C's bitwise operators:

int valor = 522;
unsigned char superior = (valor >> 8) & 0xff;
unsigned char inferior = valor & 0xff;

int valorRestaurado = (superior << 8) | inferior;

Although it looks straightforward, there are always a few subtleties when writing code like this, and it's easy to get it wrong. For example, since valor is signed, shifting it right using >> is implementation-defined, although typically what that means is that it might sign extend or not, which won't end up affecting the value of the byte that & 0xff selects and assigns to superior.

Also, if either superior or inferior is defined as a signed type, there can be problems during the reconstruction. If they're smaller than int (as of course they necessarily are), they'll be immediately sign-extended to int before the rest of the reconstruction happens, demolishing the result. (That's why I explicitly declared superior and inferior as type unsigned char in my example. If your byte type is a typedef for unsigned char, that would be fine, too.)

There's also an obscure overflow possibility lurking in the subexpression superior << 8 even when superior is unsigned, although it's unlikely to cause a problem in practice. (See Eric Postpischil's comments for additional explanation.)

Steve Summit
  • The value of `valor >> 8` is implementation-defined when `valor` is negative. Although `valor` is positive in the example shown, this code is not properly designed for general use. Additionally, given 16-bit `int`, `superior << 8` may overflow, in which case the behavior is not defined by the C standard. – Eric Postpischil Jul 27 '18 at 11:01
  • @EricPostpischil I would appreciate an explanation of how `superior << 8` could overflow. – Steve Summit Jul 27 '18 at 13:47
  • The question states that `int` is two bytes. Supposing 8-bit bytes, the maximum `int` value is 32767. In the code in this answer, `superior` is `unsigned char`. Per C 2011 (N1570) 6.5.7 3, the integer promotions are performed on operands of `<<`. Per 6.3.1.1 2, the integer promotions promote an `unsigned char` to `int`. The value of `superior` may range from 0 to 255. Suppose it is 128 (or any value from 128 to 255). Per 6.5.7 4, if 128 × 2^8 is not representable in `int`, the behavior of `128 << 8` is undefined. Since 128 × 2^8 is 32768, it is not representable in `int`. – Eric Postpischil Jul 27 '18 at 14:30
  • Also, per 6.5.7 3, the type of the result of `<<` is that of the promoted left operand, so it is `int`. Thus `superior << 8` attempts to shift an `unsigned char` into the high bits of an `int`. If the high bit of the `unsigned char` is set, this overflows the value of an `int`. Left-shift of a signed value is defined mathematically by the C standard, not as a bit operation, so it overflows rather than being defined to set the sign bit. – Eric Postpischil Jul 27 '18 at 14:33
  • @EricPostpischil Ah, right. Thanks. Perhaps I should have taken the words "mathematical approach" in my other answer to heart, and gone down the road `superior * 256 + inferior` instead. (With the thinking reversed, and `superior` forced to an explicitly *signed* type, of course.) – Steve Summit Jul 27 '18 at 14:49
  • `superior * 256` overflows too. It is necessary to do the arithmetic in a wider type, or to conditionalize it to use different expressions for different values, or to implement some other work-around. – Eric Postpischil Jul 27 '18 at 14:56
  • @EricPostpischil Hmm. I didn't think `superior * 256` could overflow -- but I didn't think `superior << 8` could, either, so I bet you're about to tell me how the signed multiplication can, too. :-) – Steve Summit Jul 27 '18 at 15:43
  • @SteveSummit with a 16bit `int`, it can if `superior` is larger than 127. –  Jul 27 '18 at 15:58
  • @FelixPalmen Sure. But if `superior` is *signed* and 8 bits (as it is in this subthread of the discussion), of course it can't be greater than 127. – Steve Summit Jul 27 '18 at 16:00
  • @SteveSummit: In the code you have in this question, `superior` is an `unsigned char`. When the integer promotions are applied, it becomes an `int`, but the value is unchanged. Thus a 128 or 255 in `superior` will become an `int` with a value of 128 or 255. Multiplying this by 256 in an implementation with 16-bit `int` results in overflow. – Eric Postpischil Jul 27 '18 at 17:34
  • @EricPostpischil In the comment that suggested multiplying by 256, I also indicated (in a parenthetical) that in that case we would have to go back to using explicitly signed values for things like `superior`, not unsigned. – Steve Summit Jul 27 '18 at 18:44
1

Given that an int is two bytes, and the number of bits per byte (CHAR_BIT) is eight, and two’s complement is used, an int named valor may be disassembled into endian-agnostic order with:

unsigned x;
memcpy(&x, &valor, sizeof x);
unsigned char Byte0 = x & 0xff;
unsigned char Byte1 = x >> 8;

and may be reassembled from unsigned char Byte0 and unsigned char Byte1 with:

unsigned x;
x = (unsigned) Byte1 << 8 | Byte0;
memcpy(&valor, &x, sizeof valor);

Notes:

  • int and unsigned have the same size and alignment per C 2011 (N1570) 6.2.5 6.
  • There are no padding bits for unsigned in this implementation, as C requires UINT_MAX to be at least 65535, so all 16 bits are needed for value representation.
  • int and unsigned have the same endianness per 6.2.6.2 2.
  • If the implementation is not two’s complement, values reassembled in the same implementation will restore the original values, but negative values will not be interoperable with implementations using different sign-bit semantics.
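
Since the asker's toolchain supports only ANSI C, where declarations must come before statements in a block, the two snippets can be wrapped in functions as in the sketch below. The names `split_int` and `join_int` are hypothetical, and the extra `& 0xffu` mask is an addition so that the code also behaves for non-negative values when tried on a machine with a wider int. Negative values round-trip only under the answer's stated assumptions (two-byte int, 8-bit bytes).

```c
#include <string.h>  /* memcpy */

/* Copies the bits of valor into an unsigned, then extracts two bytes. */
void split_int(int valor, unsigned char *byte0, unsigned char *byte1)
{
    unsigned x;

    memcpy(&x, &valor, sizeof x);
    *byte0 = (unsigned char)(x & 0xffu);         /* low 8 bits */
    *byte1 = (unsigned char)((x >> 8) & 0xffu);  /* next 8 bits */
}

/* Reassembles the unsigned from two bytes and copies its bits into an int. */
int join_int(unsigned char byte0, unsigned char byte1)
{
    unsigned x = ((unsigned)byte1 << 8) | byte0;
    int valor;

    memcpy(&valor, &x, sizeof valor);
    return valor;
}
```

For the question's example, `split_int(522, &b0, &b1)` gives `b0 == 10` and `b1 == 2`, and `join_int(b0, b1)` returns 522.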
Eric Postpischil
0

You can actually cast the address of the integer variable to a character pointer (unsigned char *, to be precise), read the value, and then increment the pointer to point to the next byte and read again. This conforms to the aliasing rules.

Sourav Ghosh
0

Simply define a union:

typedef union
{
   int           as_int;
   unsigned char as_byte[2];
} INT2BYTE;

INT2BYTE i2b;

Put the integer value in the i2b.as_int member and read the corresponding bytes from i2b.as_byte[0] and i2b.as_byte[1].

i486
  • which has the same implications (e.g. endianness, padding bits) as "manually" aliasing with `unsigned char` –  Jul 27 '18 at 11:24
  • 1
    @FelixPalmen Who says that both ends are with different endianness? – i486 Jul 27 '18 at 11:48
  • 1
    Well, did I? Who said this platform **doesn't** have padding bits? I just think some words of caution should be there when recommending to examine the representation. –  Jul 27 '18 at 11:52
  • My solution is a simple and direct answer to the question "How to divide an Int into two Bytes in C". No high portability or other pretensions. From the way the question is phrased, I guess this is what the OP needs. – i486 Jul 27 '18 at 12:58
  • Which is too bad. Based on the answers here, I wouldn't blame the OP for getting scared off of the shift-and-mask approach, and adopting a `char *` or `union`-based approach instead. But in the real world, I believe that `char *` and `union`-based techniques are *much* more likely to have actual bugs or portability problems than decently-written shift-and-mask code. – Steve Summit Jul 27 '18 at 15:13
  • @SteveSummit The `union` method is "another" method. When many answers already explain the shift method, what can I do: offer another method, or repeat the classic shift conversion again? And this is a `char` array, not a `char *`. – i486 Jul 27 '18 at 18:32
  • @i486 Don't worry, I wasn't saying there was anything wrong with your answer. When I said "too bad" I was lamenting the turn this question and all its answers has taken, giving the impression that shift-and-mask techniques are scary and dangerous and to be avoided. I don't believe they are, but the more words that get written about them here the scarier they seem, so I'm going to try to stop writing now. :-) (P.S. No, you didn't use a `char *`, but using char pointers is yet another way of getting at the bytes of an int.) – Steve Summit Jul 27 '18 at 18:51
-1

I am using short int instead of plain int, because on the PC ints are 4 bytes and on my target platform they are 2. I use unsigned to make it easier to debug.

The code compiles with GCC (and should with almost any other C compiler). If I'm not mistaken, it depends on whether the architecture is big-endian or little-endian, but that would be solved by reversing the line that reconstructs the integer:

#include <stdio.h>

int main(void) {
    // unsigned short int = 2 bytes on a 32-bit PC
    unsigned short int valor;
    unsigned short int reassembled;
    unsigned char data0 = 0;
    unsigned char data1 = 0;

    // sizeof yields a size_t, so cast it to match %u
    printf("An integer is %u bytes\n", (unsigned) sizeof(valor));

    printf("Enter a number: \n");
    // %hu is the conversion that matches unsigned short
    if (scanf("%hu", &valor) != 1)
        return 1;
    // Decomposes the int in 2 bytes
    data0 = (unsigned char) (valor & 0x00FF);
    data1 = (unsigned char) ((valor >> 8) & 0x00FF);
    // Just a bit of 'feedback'
    printf("Integer: %u \n", (unsigned) valor);
    printf("Hexa: %X \n", (unsigned) valor);
    printf("Byte 0: %d - %X \n", data0, data0);
    printf("Byte 1: %d - %X \n", data1, data1);
    // Reassembles the int from 2 bytes
    reassembled = (unsigned short int) (data1 << 8 | data0);
    // Show the rebuilt number
    printf("Reassembled Integer: %u \n", (unsigned) reassembled);
    printf("Reassembled Hexa: %X \n", (unsigned) reassembled);
    return 0;
}
Mochuelo
  • 1
    OP asks for code to separate the bytes of a signed `int`, but this answer shows code for `unsigned short` and does not explain how to use it for `int` or `short`. As adapting the code for signed types is prone to error due to C semantics regarding signed types, bit shifts, and overflows, this is a problem. For example, `data1 << 8` will overflow in C implementations with 16-bit `int` types if the high bit of `data1` is set. This is because the `unsigned char` `data1` will be promoted to `int`, so the shift will be done in the signed `int` type. – Eric Postpischil Jul 27 '18 at 11:07
  • OP requests a solution that reassembles the bytes, which I did – Mochuelo Jul 27 '18 at 11:12
  • The problem is not that this code does not provide a solution to reassemble bytes, but that it does not provide a solution to reassemble the bytes of a two-byte `int`, as the problem requests. Providing code for `unsigned short` when `int` is requested is not a solution. – Eric Postpischil Jul 27 '18 at 11:22
  • I do not see that he explicitly asked for an `int`, as you can read in the OP: "Actually, any solution that divides the whole into 2 bytes and reassembles it serves me well." I think it does not matter if it's an int or a short, to be honest – Mochuelo Jul 27 '18 at 11:30
  • “I have an **Int** variable, two bytes in size, but I need to divide it into 2 bytes separately to be able to transmit it, and then I can, reading the two bytes, reassemble the original **Int**.” [Emphasis added.] – Eric Postpischil Jul 27 '18 at 11:31
  • I will repeat it **again** just for you: "Actually, any solution that divides the whole into 2 bytes and reassembles it serves me well." _[Emphasis added.]_ – Mochuelo Jul 27 '18 at 12:33
  • I will repeat it **again** just for you: The question requests a solution for an **`int`**. – Eric Postpischil Jul 27 '18 at 12:51
  • This code is **wrong** because it overflows on OP’s target implementation: `data1 << 8`. – Eric Postpischil Jul 27 '18 at 12:52
  • You are **totally wrong**. I have just tested it and it works on my machine. – Mochuelo Jul 27 '18 at 12:57
  • Guys. Peace. You're not going to convince each other. One of you is talking about code that's good enough for today; one of you is talking about code that's guaranteed to work on any machine now or until the end of time. Both answers have their place; neither is totally right or totally wrong. – Steve Summit Jul 27 '18 at 13:07
  • The fact that it “works” on your machine is not evidence of any kind that it works on OP’s implementation. The rules for C are set by the C standard, not by how your machine works, and OP specifically asks for a solution for their machine, not yours. – Eric Postpischil Jul 27 '18 at 14:19