
Is there a portable and safe way to interpret the bit pattern of a boost::uint16_t as a boost::int16_t? I have a uint16_t which I know represents a signed 16-bit integer encoded as little-endian. I need to do some signed arithmetic on this value, so is there any way to convince the compiler that it already is a signed value?

If I am not mistaken, a static_cast<int16_t> would convert the value, perhaps changing its bit pattern.

Viktor Dahl
  • `reinterpret_cast< int16_t& >( uint16_value );` should work anywhere, even though it is implementation-defined. – K-ballo Sep 18 '11 at 20:52
  • How did the signed value get into the `uint`? Normally, the way to convert it to signed is to revert that operation. – jalf Sep 18 '11 at 21:23
  • @jalf The signed value comes from serialized data stored in a file. I use the unsigned type because I need to do extensive bitwise operations on 2-byte words. – Viktor Dahl Sep 18 '11 at 21:25

6 Answers


If you are looking for something other than a cast, then copy its memory representation into a boost::int16_t, since that's what it represents to begin with.

Edit: If you have to make it work on a big endian machine, simply copy the bytes backwards. Use std::copy and std::reverse.
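
A minimal sketch of that approach (illustrative only; it assumes the `uint16_t` holds the two file bytes verbatim and that the host uses two's complement — the `host_is_big_endian` flag is something you would supply yourself):

#include <algorithm>              // std::reverse
#include <cstring>                // std::memcpy
#include <boost/cstdint.hpp>

boost::int16_t from_le(boost::uint16_t raw, bool host_is_big_endian)
{
    unsigned char bytes[2];
    std::memcpy(bytes, &raw, 2);          // take the object representation
    if (host_is_big_endian)
        std::reverse(bytes, bytes + 2);   // little-endian file -> big-endian host
    boost::int16_t result;
    std::memcpy(&result, bytes, 2);       // reinterpret as signed, no pointer casts involved
    return result;
}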

K-ballo
  • You mean something in the manner of a `memcpy`? That sounds like it would work. – Viktor Dahl Sep 18 '11 at 20:49
  • @Viktor Dahl: Indeed, `memcpy` would do. Or the fancy `std::copy` is there as well. – K-ballo Sep 18 '11 at 20:50
  • As David Hammen and I stated in our answers, this memcpy could go wrong if the target system is big-endian. Other than that there is no difference in results between copying the memory and just casting it away; it's just that when you cast there is no actual code generated, but when you use memcpy it consumes some time. – Ali1S232 Sep 18 '11 at 21:27
  • @Gajet: Except that casting is undefined behavior and memcpying isn't. Also, most casts include code generation, only `reinterpret_cast` doesn't. – K-ballo Sep 18 '11 at 21:30
  • @K-Ballo: The cast we are offering is just the definition of `reinterpret_cast`. Remember, we are working with pointers to primitive types. – Ali1S232 Sep 18 '11 at 21:39
  • @K-ballo: To use `std::copy_backward` or `std::reverse` you are going to have to cast pointers to `uint16_t` to `char*` -- and that too is undefined behavior. Only now it is not UB that most compilers recognize and do the right thing with. – David Hammen Sep 18 '11 at 21:40
  • @Gajet: Yes, I know you are referring to `reinterpret_cast`, and I saw you use a C-style cast in your answer as well. I was just clarifying for the casual reader who, after reading your comment, might think that no actual code is generated for a cast. – K-ballo Sep 18 '11 at 21:40
  • @Gajet: Most compilers will optimize away the call to `memcpy` here. It will instead generate inline assembly to copy the two bytes. – David Hammen Sep 18 '11 at 21:42
  • @David Hammen: I'm pretty sure casting a pointer to a pointer to any of the char types is defined behavior. Let me go dig into the standard for a while... – K-ballo Sep 18 '11 at 21:43
  • `std::copy_backward` does not actually reverse anything. – Nemo Sep 18 '11 at 21:48
  • @K-ballo: This still has the portability problems related to representation of signed numbers. – David Hammen Sep 18 '11 at 22:01

Just use the static cast. Changing the bit pattern happens to be exactly what you want, if you happen to be on a platform that represents them differently.

reinterpret_cast, or any equivalent pointer cast, is undefined behavior (not implementation-defined). That means the compiler is free to do nasty things like cache the value in a register and miss the update. Besides, if you were on a platform where the bit patterns were different, then bypassing the conversion would leave you with garbage (just like pretending a float is an int and adding 1 to it).

More info is at Signed to unsigned conversion in C - is it always safe?, but the summary is that C, in a roundabout way, defines the static cast (the ordinary C cast, actually) as exactly what you get by treating the bits the same on x86 (which uses two's complement).
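
For example (a sketch only; `to_signed` is an illustrative name):

#include <boost/cstdint.hpp>

boost::int16_t to_signed(boost::uint16_t u)
{
    // On a two's complement implementation this keeps the bit pattern intact;
    // elsewhere the result for values above 0x7FFF is implementation-defined.
    return static_cast<boost::int16_t>(u);
}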

Don't play chicken with the compiler ("this always worked on this compiler, so surely they won't break everybody's code by changing it"). History has proven that assumption wrong.

Rhamphoryncus

Mask off all but the sign bit, store that in a signed int, then set the sign using the sign bit.

Carey Gregory
  • Only correct answer here so far. Note that you will probably have to handle `INT_MIN` as a special case, since the natural way to write this will end up doing -1 * 0 == 0 for that case. – Nemo Sep 18 '11 at 21:10
  • Not correct. This only works if the target machine is little endian. – David Hammen Sep 18 '11 at 21:11
  • @David: You are completely wrong, and also very confused about what "big-endian" actually means. You cannot tell just by shifting and masking integers whether a system is big-endian. (Nor should you try. Unless you are serializing data for transport, you never care whether the system is big- or little-endian.) – Nemo Sep 18 '11 at 21:13
  • How do you "mask off the sign bit" of an unsigned integer? It doesn't *have* a sign bit. And signed integers are not required to be represented in two's complement (or one's complement, or any other specific representation), making it impossible to talk about a signed integer's "sign bit" in a portable manner as well. – jalf Sep 18 '11 at 21:15
  • @jalf: By "sign bit" he means "high bit". The suggestion is to mask off the high bit (but remember it), copy the resulting number to an int16_t, then multiply by -1 if the high bit was set. This does not depend on signed integers being two's complement. – Nemo Sep 18 '11 at 21:18
  • Shame that information isn't in the answer, then. But it *does* depend on the representation of signed integers. You cannot, if you want your code to be portable, be sure that the highest bit is set iff the value is negative. – jalf Sep 18 '11 at 21:20
  • @jalf: The information most certainly is in the answer. He stated it's a signed value stored in an unsigned type. Therefore, it has a sign bit. Why is that unclear? – Carey Gregory Sep 18 '11 at 21:26
  • @jalf: The OP did not specify the representation of the "signed 16-bit integer encoded as little-endian", so as phrased the question has no answer. If you assume he means two's complement, then this answer is the only correct one so far. – Nemo Sep 18 '11 at 21:27
  • @Nemo: The OP most certainly did say just that: The `uint16_t` "*represents a signed 16-bit integer encoded as little-endian*". The OP has not been edited as of the time I posted this comment. That statement has been there from the very start. – David Hammen Sep 18 '11 at 21:29
  • @Nemo: little-endian has nothing to do with it. My point is that there is nothing in the standard that guarantees that the high bit of a signed integer will be set if and only if the value is negative. An implementation could use the lowest bit as a sign bit, or some other encoding. – jalf Sep 18 '11 at 21:29
  • @David: I meant he did not specify two's complement, which is a necessary piece of information. – Nemo Sep 18 '11 at 21:31
  • Carey, a signed value doesn't necessarily have a sign bit, and if it does, the sign bit isn't necessarily the high bit. But what I meant was that your answer lacks the information in @Nemo's second comment (specifying *how* to "set the sign using the sign bit", which is a pretty important detail) – jalf Sep 18 '11 at 21:32
  • @jalf: I understand that, but he's already engaged in type punning and there's no 100% portable way out of it. He's going to have to make some assumptions about the implementation of the data types. All he can do is minimize the portability issues. – Carey Gregory Sep 18 '11 at 21:50
  • jalf has a good point here as far as widespread portability is concerned. The standard allows signed integers to be implemented using two's complement, one's complement, or signed magnitude. It doesn't say anything about where the sign bit goes. It could be the bit smack dab in the middle of the 16-bit word, for example. Endianness is only part of the problem here. There is no truly portable answer to this question. – David Hammen Sep 18 '11 at 21:56
  • @David Hammen: Isn't that what I just said? – Carey Gregory Sep 18 '11 at 22:05

I guess *(boost::int16_t*)(&signedvalue) would work, unless your system architecture is not little-endian by default. Endianness will change the behavior, since after the above operation the CPU will treat the value as an architecture-specific boost::int16_t (meaning that if your architecture is big-endian it'll go wrong).

Ali1S232
  • a) what has this to do with endianness? b) this is undefined behaviour. – Kerrek SB Sep 18 '11 at 20:55
  • c) this only works if `sizeof(int) == sizeof(int16_t)` which hasn't been true on most common architectures for years. – Chris Lutz Sep 18 '11 at 21:02
  • @Kerrek: Strictly speaking this is UB. Practically speaking, it is not. That idiom is very widespread. A compiler vendor that invoked an `erase_the_hard_drive()` function in response would find itself losing customers right and left. So they do just what it says to do, and they do not optimize it away. – David Hammen Sep 18 '11 at 21:07
  • @Chris: I think he meant `*(int16_t*)(&unsigned_value)`. – David Hammen Sep 18 '11 at 21:07
  • @Chris, @Kerrek: edited my answer, David was right. It's not undefined behavior, it just may cause unexpected results if you are not sure about the target platform. – Ali1S232 Sep 18 '11 at 21:13
  • @Nemo: just read the question: he is sure the number stored in the variable is a `signed 16-bit integer encoded as little-endian`! It doesn't have anything to do with the system being little-endian or big-endian! – Ali1S232 Sep 18 '11 at 21:20
  • @Gajet: Yeah, I missed that part of the question. Removing my other comments. – Nemo Sep 18 '11 at 21:24

Edit
To avoid controversy over *(int16_t*)(&input_value), I changed the last statement in the code block to memcpy and added *(int16_t*)(&input_value) as an addendum. (It was the other way around.)

On a big endian machine you will need to do a byte swap and then interpret as a signed integer:

if (big_endian()) {  // big_endian(): a check you supply yourself; see the sketch below
  // Byte-swap the little-endian wire value into host byte order.
  input_value = (uint16_t)((input_value & 0xff00u) >> 8) |
                (uint16_t)((input_value & 0x00ffu) << 8);
}
int16_t signed_value;
std::memcpy(&signed_value, &input_value, sizeof(int16_t));
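
The `big_endian()` check above is not spelled out in the answer; one possible runtime sketch (illustrative only) is:

#include <stdint.h>

bool big_endian()
{
    const uint16_t probe = 0x0102;
    // If the first byte of the object representation is the most significant
    // byte (0x01), the host stores integers big-endian first.
    return *reinterpret_cast<const unsigned char*>(&probe) == 0x01;
}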

On most computers you can change the call to memcpy to signed_value = *(int16_t*)(&input_value);. This is, strictly speaking, undefined behavior. It is also an extremely widely used idiom. Almost all compilers do the "right thing" with this statement. But, as is always the case with extensions to the language, YMMV.

Nemo
David Hammen
  • @Nemo: The code inside the `if` is the standard implementation of `ntohs` and `htons` on a little endian machine. This code most certainly does do something: It swaps the bytes. The problem is that we can't use either `ntohs` and `htons` here because the incoming value is not network order. It is little endian instead. – David Hammen Sep 18 '11 at 21:15
  • @Nemo: that code above is changing endianness correctly, and it's needed if the system is big-endian. Also, the cast there is not undefined behavior, since we know `sizeof(int16_t) == sizeof(uint16_t)`! – Ali1S232 Sep 18 '11 at 21:16
  • @Nemo: I did note that the OP might have to use a `memcpy`. It is dubious that he would need to. Find a compiler that does not accept this idiom. Its use is far too widespread for compilers to do anything but accept it. – David Hammen Sep 18 '11 at 21:18
  • @Nemo: The value is **known** to represent "*a signed 16-bit integer encoded as little-endian*". In other words, a 16 bit signed integer sent over a network connection without bothering to convert to network order and received as a 16 bit unsigned integer. Lots of people ignore network order nowadays given the preponderance of little endian machines. Byte swapping most certainly is necessary when the recipient of that value is a big endian machine. – David Hammen Sep 18 '11 at 21:21
  • @David: Sorry, I missed that detail in the question. Removing my other comments. – Nemo Sep 18 '11 at 21:22
  • @Gajet: It doesn't matter that `sizeof(uint16_t) == sizeof(int16_t)`. That debated line is, strictly speaking, undefined behavior. An optimizer can, strictly speaking, do whatever it wants. Practically speaking, this idiom is so widespread that the behavior is very well defined. Think of it as a widely implemented extension to the language. Or use `memcpy` if it grates too much. – David Hammen Sep 18 '11 at 21:25
  • @David: it really does matter if their sizes don't match. I'm not sure why, but there is some memory padding that the compiler does on its own that may change the results if two objects don't have the same size or the same order. (I'm not talking about this specific question, just thinking of other objects.) – Ali1S232 Sep 18 '11 at 21:33
  • @David: Compilers _do_ optimize based on the assumption you will not overflow signed integers. Try `int foo(int x) { return (x+1) > x; }` and then call `foo(INT_MAX)`. Using GCC with optimization enabled, this function returns `1` unconditionally, because integer overflow is Undefined Behavior. So this concern is not just hypothetical. – Nemo Sep 18 '11 at 21:34
  • @David: I fixed a couple of problems in your answer. `memcpy` arguments were backwards. Also `us` is not a legal suffix for an integer constant in C++. – Nemo Sep 18 '11 at 21:47
  • @Nemo: What does overflow have to do with anything here? There is nothing here to overflow. – David Hammen Sep 18 '11 at 21:47
  • On a one's complement or sign-magnitude system, 0x8000 will overflow. (Or at least, not be representable on the target system...) – Nemo Sep 18 '11 at 21:59
  • @Nemo: There is no overflow here. On a one's complement system, 0x8000 is -32767. On a sign-magnitude it is negative zero. The problem is not overflow. It is that we don't know the representation of signed integers on the source and destination machines. – David Hammen Sep 18 '11 at 22:22

As a different tack, the best way to maximize (but not ensure) portability is to store those signed 16-bit integers as signed 16-bit integers in network order rather than as unsigned 16-bit integers in little-endian order. This puts the burden on the target machine of translating those 16-bit network-order signed integers into 16-bit signed integers in the form native to the target. Not every machine supports this capability, but most machines that can connect to a network do. After all, that file has to get to the target machine by some mechanism, so the odds are pretty good that it will understand network order.
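
A sketch of what the reader side might look like under that scheme (assuming a POSIX-style `ntohs` is available and the host uses two's complement; the function name is illustrative):

#include <arpa/inet.h>   // ntohs (Winsock offers the same function on Windows)
#include <cstring>
#include <stdint.h>

int16_t read_net16(const unsigned char* buf)
{
    uint16_t net;
    std::memcpy(&net, buf, 2);     // two raw bytes from the file or socket
    uint16_t host = ntohs(net);    // network (big-endian) order -> host order
    int16_t value;
    std::memcpy(&value, &host, 2); // reinterpret as signed
    return value;
}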

On the other hand, if you are zapping that binary file to some embedded machine via some proprietary serial interface, the answer to the portability question is the same answer you'll get when you tell your doctor "it hurts when I do this."

David Hammen