15

I have an 8-character string representing a hexadecimal number and I need to convert it to an int. This conversion has to preserve the bit pattern for strings "80000000" and higher, i.e., those numbers should come out negative. Unfortunately, the naive solution:

int hex_str_to_int(const string hexStr)
{    
    stringstream strm;
    strm << hex << hexStr;
    unsigned int val = 0;
    strm >> val;
    return static_cast<int>(val);
}

doesn't work for my compiler if val > INT_MAX (the returned value is 0). Changing the type of val to int also results in a 0 for the larger numbers. I've tried several different solutions from various answers here on SO and haven't been successful yet.

Here's what I do know:

  • I'm using HP's C++ compiler on OpenVMS (using, I believe, an Itanium processor).
  • sizeof(int) will be at least 4 on every architecture my code will run on.
  • Casting from a number > INT_MAX to int is implementation-defined. On my machine, it usually results in a 0 but interestingly casting from long to int results in INT_MAX when the value is too big.

This is surprisingly difficult to do correctly, or at least it has been for me. Does anyone know of a portable solution to this?

Update:

Changing static_cast to reinterpret_cast results in a compiler error. A comment prompted me to try a C-style cast: return (int)val in the code above, and it worked. On this machine. Will that still be safe on other architectures?

Michael Kristofik
  • 1
    Can't just use `(int)val`? However, "Changing the type of val to `int` also results in a 0...." means issue might be from `>>`? (I have no idea really, I don't use C++ ;-) –  Sep 29 '11 at 18:27
  • Signed integer overflow isn't implementation-defined, it's undefined. – derobert Sep 29 '11 at 18:29
  • @derobert, thanks, I wasn't sure. I knew it wasn't good. Updated the question accordingly. – Michael Kristofik Sep 29 '11 at 18:32
  • 1
    Converting from unsigned to signed is implementation defined if the unsigned number is not in the range of the signed type. – JohnPS Sep 29 '11 at 18:53
  • @derobert : This isn't signed overflow though, this is integral conversion, the result of which _is_ implementation-defined. – ildjarn Sep 29 '11 at 18:56

6 Answers

15

Quoting the C++03 standard, §4.7/3 (Integral Conversions):

If the destination type is signed, the value is unchanged if it can be represented in the destination type (and bit-field width); otherwise, the value is implementation-defined.

Because the result is implementation-defined, by definition it is impossible for there to be a truly portable solution.

ildjarn
13

While there are ways to do this using casts and conversions, most rely on undefined behavior that happens to be well-defined on some machines or with some compilers. Instead of relying on undefined behavior, copy the data:

int signed_val;
std::memcpy (&signed_val, &val, sizeof(int));
return signed_val;
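
For context, here is a minimal sketch of how the copy could fit into the question's whole function. The name hex_str_to_int_memcpy is hypothetical, and parsing with std::strtoul instead of the question's stringstream is a substitution made here for illustration, not something this answer proposed. It assumes a C++11 compiler (for nullptr) and that the unsigned value's bit pattern is the desired int representation:

```cpp
#include <cstdlib>   // std::strtoul
#include <cstring>   // std::memcpy
#include <string>

// Hypothetical helper: parse a hex string, then copy the bits into an int.
int hex_str_to_int_memcpy(const std::string& hexStr)
{
    // strtoul returns unsigned long, which is at least 32 bits wide.
    const unsigned long parsed = std::strtoul(hexStr.c_str(), nullptr, 16);

    // Unsigned-to-unsigned narrowing is well-defined (reduction modulo 2^N).
    const unsigned int val = static_cast<unsigned int>(parsed);

    // int and unsigned int have the same size, so the copy is exact.
    int signed_val;
    std::memcpy(&signed_val, &val, sizeof signed_val);
    return signed_val;
}
```

On a platform with 32-bit two's-complement ints, "FFFFFFFF" should come back as -1 and "80000000" as INT_MIN.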
David Hammen
  • 4
    Implementation-defined behavior, not undefined. – ildjarn Sep 29 '11 at 18:57
  • 2
    @ildjarn: One widely used approach is `return *(int*)(&val);` This isn't implementation-defined behavior. It is undefined behavior. – David Hammen Sep 29 '11 at 19:01
  • 1
    Ah, that's equivalent to a `reinterpret_cast`, which is indeed UB; I assumed you were referring to the `static_cast` in the OP's question, whose behavior is implementation-defined. – ildjarn Sep 29 '11 at 19:03
  • Why is it undefined behavior? – Friedrich Jul 18 '17 at 23:57
  • This is surprisingly fast with modern compilers. – Emile Cormier Feb 11 '20 at 01:06
  • C++20 will have `bit_cast` which effectively does the same thing: https://en.cppreference.com/w/cpp/numeric/bit_cast – Emile Cormier Feb 11 '20 at 03:17
  • 1
    @EmileCormier - It's fairly fast even at the lowest optimization level as the call to `memcpy` does not occur; I tested with multiple compilers. At anything but the lowest optimization level it's extremely fast because the working variable (`signed_val` in my answer) gets optimized away. I suspect these kinds of optimizations were in place well before I wrote the above answer 9+ years ago. The as-if rule was certainly in place even with the original version of the standard. That is the rule that enables the elimination of the call to `memcpy`. – David Hammen Feb 11 '20 at 07:21
  • @Friedrich - It's undefined because the standard very explicitly says so. Doing so violates C++'s strict aliasing rule, which is even stricter than C's strict aliasing rule. The C-style cast `return *(int*)(&val)` is undefined behavior even in C. – David Hammen Feb 11 '20 at 07:38
  • @DavidHammen It's only recently I've encountered a similar problem to the OP and discovered that compilers can optimize away memcpy. Thanks for pointing out that compilers knew that trick long ago. – Emile Cormier Feb 11 '20 at 17:40
5

You can negate an unsigned two's-complement number by taking the complement and adding one. So let's do that for negatives:

if (val < 0x80000000) // positive values need no conversion
  return val;
if (val == 0x80000000) // Negating the magnitude would overflow int, so special case this
  return INT_MIN; // from <climits>; the literal -0x80000000 has unsigned type, so returning it would still rely on the conversion
else
  return -(int)(~val + 1);

This assumes that your ints use a 32-bit two's-complement representation (or have a similar range). It does not rely on any undefined behavior related to signed integer overflow (note that unsigned integer overflow is well-defined, although it should not happen here either).

Note that if your ints are not 32-bit, things get more complex. You may need to use something like ~(~0U >> 1) instead of 0x80000000. Further, if your ints are not two's-complement, you may have overflow issues on certain values (for example, on a ones'-complement machine, -0x80000000 cannot be represented in a 32-bit signed integer). However, non-two's-complement machines are very rare today, so this is unlikely to be a problem.
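
As a rough illustration of the width-independent variant, the sketch below replaces the hard-coded 0x80000000 with std::numeric_limits. The function name to_signed_by_negation is hypothetical, and the code assumes the usual layout where UINT_MAX == 2 * INT_MAX + 1 and ints are two's-complement:

```cpp
#include <limits>

// Hypothetical helper: map an unsigned bit pattern to the int with the same
// two's-complement bits, without ever converting an out-of-range value.
int to_signed_by_negation(unsigned int val)
{
    const unsigned int int_max =
        static_cast<unsigned int>(std::numeric_limits<int>::max());

    if (val <= int_max)
        return static_cast<int>(val);   // fits, so the value is unchanged

    // Magnitude of the negative number, computed in unsigned arithmetic,
    // which wraps modulo 2^N and is therefore well-defined.
    const unsigned int magnitude = ~val + 1u;

    if (magnitude > int_max)            // only the sign bit was set
        return std::numeric_limits<int>::min();

    return -static_cast<int>(magnitude);
}
```

Every cast here is applied to a value that already fits in int, so nothing depends on the implementation-defined conversion.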

bdonlan
  • Yeah I'm pretty sure this code is likely to run in a 64-bit environment someday. Hard-coding bit patterns like that is probably not a good idea. This solution works on this machine though. – Michael Kristofik Sep 29 '11 at 18:47
  • Most 64-bit environments use 32-bit ints. In any case, though, you can use `~(~(unsigned yourinttype)0 >> 1)` to find the right value for other unsigned integer types (eg, `unsigned long long`) – bdonlan Sep 29 '11 at 18:53
4

C++20 provides std::bit_cast, which copies bits verbatim:

#include <bit>
#include <cassert>
#include <iostream>

int main()
{
    int i = -42;
    auto u = std::bit_cast<unsigned>(i);
    // Prints 4294967254 on two's complement platforms where int is 32 bits
    std::cout << u << "\n";

    auto roundtripped = std::bit_cast<int>(u);
    assert(roundtripped == i);
    std::cout << roundtripped << "\n"; // Prints -42

    return 0;
}

cppreference shows an example of how one can implement their own bit_cast in terms of memcpy (under Notes).
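
For compilers without C++20, the idea can be sketched roughly as follows. This is a simplified illustration written for this answer, not cppreference's code; the name bit_cast_sketch is hypothetical, and it assumes C++11 and a default-constructible destination type:

```cpp
#include <cstring>       // std::memcpy
#include <type_traits>   // std::is_trivially_copyable

// Hypothetical stand-in for std::bit_cast on pre-C++20 compilers.
template <class To, class From>
To bit_cast_sketch(const From& src)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable<From>::value,
                  "source type must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value,
                  "destination type must be trivially copyable");

    To dst;                                  // requires a default constructor
    std::memcpy(&dst, &src, sizeof(To));     // copy the object representation
    return dst;
}
```

Unlike std::bit_cast, this sketch is not constexpr. For the question's case it would be called as bit_cast_sketch<int>(val).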

While OpenVMS is not likely to gain C++20 support anytime soon, I hope this answer helps someone arriving at the same question via internet search.

Emile Cormier
4

Here's another solution that worked for me:

if (val <= INT_MAX) {
    return static_cast<int>(val);
}
else {
    int ret = static_cast<int>(val & ~INT_MIN);
    return ret | INT_MIN;
}

If I mask off the high bit, I avoid overflow when casting. I can then OR it back safely.
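
Wrapped up as a standalone function, the same idea might look like the sketch below. The name to_signed_by_masking is hypothetical, the explicit casts are added only to make the signed/unsigned mixing visible, and the final OR assumes a two's-complement int:

```cpp
#include <climits>   // INT_MAX, INT_MIN

// Hypothetical helper: clear the top bit before casting, then restore it.
int to_signed_by_masking(unsigned int val)
{
    if (val <= static_cast<unsigned int>(INT_MAX))
        return static_cast<int>(val);        // already representable

    // Keep only the bits below the sign bit; the result always fits in int.
    const unsigned int low_bits = val & static_cast<unsigned int>(INT_MAX);

    // Set the sign bit on the signed side, where no conversion is needed.
    return static_cast<int>(low_bits) | INT_MIN;
}
```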

Michael Kristofik
-2
unsigned int u = ~0U;
int s = *reinterpret_cast<int*>(&u); // -1

Conversely:

int s = -1;
unsigned int u = *reinterpret_cast<unsigned int*>(&s); // all ones
Papayaved