
I came across an issue in C++ while reading a text file of signed integers in hexadecimal form and parsing them into vectors. I used stream extraction (stream >> var), and negative numbers do not seem to be parsed correctly: the variable gets the value 0, and the stream's fail flag is set.

If I convert the string using the strtol() function, the results are as expected. Likewise, if I first extract into an unsigned integer and then cast the variable to a signed integer, the results are again correct and no stream error is reported.

I'm using gcc 6.3.0 on Debian 9.1 (x64), running on Xeon E5-2643 v3 system.

Did anyone else experience this issue? I would expect the conversion to work the same way as the strtol function, and not report any stream errors. Am I missing some stream settings / forgetting to call some function or set some flag here?

Any suggestions would be greatly appreciated.

Attached below is an example C++ program demonstrating this problem.

```cpp
#include <iostream>
#include <sstream>
#include <cstdio>
#include <cstdlib>
#include <cstdint>


int main()
{
  const char* minus_one = "0xffffffff";

  std::stringstream ss;
  ss << minus_one;

  std::cout << "input string    : " << ss.str() << "\n"; // outputs "0xffffffff"

  // C-style conversion
  int32_t cint;
  cint = strtol(ss.str().c_str(), NULL, 0);
  std::cout << "strtol conv     : " << cint <<  " (" << std::hex << cint << ")\n"; // outputs "-1 (ffffffff)"
  std::cout << std::dec;

  // C++-style conversion
  int32_t cppint;
  ss >> std::hex >> cppint;
  std::cout << std::dec << "ssextr conv     : " << cppint <<  " (" << std::hex << cppint << ")\n"; // outputs "0 (0)" <- ERROR
  std::cout << std::dec;
  if (ss.fail()) std::cout << "Error converting number.\n";

  // C++-style conversion with cast
  uint32_t cppuint;
  int32_t cppint2;
  ss.clear();
  ss.str(minus_one);
  ss >> std::hex >> cppuint;
  cppint2 = (int32_t)cppuint;
  std::cout << std::dec << "ssextr cast conv: " << cppint2 <<  " (" << std::hex << cppint2 << ")\n"; // outputs "-1 (ffffffff)" (no "0x" prefix without std::showbase)
  std::cout << std::dec;
  if (ss.fail()) std::cout << "Error converting number.\n";

  exit(EXIT_SUCCESS);
}
```
user3780807
  • There's no such thing as a negative hex number. It is possible to interpret the bit pattern of a hex value as a signed integer and get a negative value, but that's a different thing. The number you're trying to read, `4294967295`, does not fit in a 32-bit signed integer, so the conversion fails. If you use a `uint32_t` it works fine, and then you can interpret those bits however you like. – Retired Ninja Mar 06 '18 at 14:31
  • I assumed I made it clear that I'm trying to interpret the hexadecimal numbers as signed 32-bit integers, I didn't literally mean negative hexadecimal numbers. The number mentioned here (4294967295 or 0xffffffff in hex) fits in 32bits, so it should fit both signed and unsigned integers - as the example program demonstrates, when interpreted as a signed number, that is -1. And yes, I already know that I can read it into a uint32_t and then cast it to int32_t with correct results, but I shouldn't have to do that. – user3780807 Mar 06 '18 at 14:40
  • Would you expect to be able to read `4294967295` into an int32_t? Try it; it will fail too, because the value is too large to be represented in that type. – Retired Ninja Mar 06 '18 at 14:41
  • https://stackoverflow.com/questions/12125650/what-do-the-c-and-c-standards-say-about-bit-level-integer-representation-and-m – Retired Ninja Mar 06 '18 at 14:57
  • You made it clear that you would like it to behave a certain way, but it doesn't actually behave that way. If you try to read a value into a type that the language knows cannot represent that value, then there is a problem. The issue is that you want the problem to be ignored, and C++ doesn't ignore it. Use a larger int, or keep a cast as you've done. NOTE you are relying on overflowing a signed number, which is undefined behavior. – Chris Uzdavinis Mar 06 '18 at 15:40
  • I didn't 'like' it to behave a certain way; I just saw a discrepancy between the strtol() behavior (which I expected, mistakenly) and the C++ stream extraction behavior (since I'm more used to C than C++). Also, I'm not sure in what way I am relying on overflow in this case; can you elaborate? – user3780807 Mar 06 '18 at 17:34
  • There's no overflow here, so no undefined behaviour. Converting from an integer type to a signed integer type produces an **implementation-defined value** if it doesn't fit in the destination type, and GCC defines it as ["For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type"](https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html) (I expect most compilers do the same for 2s-complement architectures). – Jonathan Wakely Mar 06 '18 at 18:09

3 Answers

```cpp
int32_t cint;
cint = strtol(ss.str().c_str(), NULL, 0);
```

This reads the value 0xffffffff into a long, then converts that to int32_t. If long is wider than 32 bits, strtol succeeds and returns 0xffffffff, i.e. 4294967295, and converting that to int32_t produces -1. But that's not the same as reading a negative number from the string. (If long is 32 bits, it doesn't work as you expect: strtol returns LONG_MAX, which is 0x7fffffff, and that is what gets converted to int32_t.)

```cpp
int32_t cppint;
ss >> std::hex >> cppint;
```

This tries to read the value 0xffffffff into an int32_t, but that value doesn't fit in the type, so the read fails (just as strtol fails when long is 32 bits).

A closer equivalent to your strtol version would be:

```cpp
int32_t cppint;
long l;
if (ss >> std::hex >> l) {
  cppint = l;  // converts the long to int32_t, as the strtol version does
} else {
  // handle error ...
}
```

It's unreasonable to expect to be able to read the value 0xffffffff into a signed 32-bit integer. strtol and istreams do not read bit patterns, they read numbers, and the number 0xffffffff doesn't fit in a signed 32-bit integer.

Jonathan Wakely
  • Thank you, that explains my misconceptions. I believed that hexadecimal numbers would in fact be bit patterns of a sort, I didn't realize they are treated as unsigned. – user3780807 Mar 06 '18 at 17:28
  • The string `"0xffffffff"` is not treated as unsigned; it's treated as the number 4294967295, which is a large, positive value that doesn't fit in a signed 32-bit type. The string `"-0xffffffff"` would be treated as a negative value (but also wouldn't fit in a signed 32-bit value). The way to write `-1` in hex is `-0x1`, not `0xffffffff`. – Jonathan Wakely Mar 06 '18 at 17:42
  • Thanks, I think I understand now. – user3780807 Mar 06 '18 at 17:52

If the first hex digit is f, then C++ makes it a "large number": 0x7fffffff, the largest value an int32_t can hold, and sets the fail flag. It seems that C++ doesn't want to express it as a negative number. Like this:

```cpp
const char* minus_one = "0xf0000000";   //ssextr conv     : 2147483647 (7fffffff)
std::stringstream ss;
ss << minus_one;

// C++ style conversion
int32_t cppint;
ss >> std::hex >> cppint;
std::cout << std::dec << "ssextr conv     : " << cppint <<  " (" << std::hex << cppint << ")\n";
std::cout << std::dec;
if (ss.fail()) {
    std::cout << "Error converting number.\n";
}
```
alamoot

The problem is that hexadecimal notation is documented for unsigned integers. strtol is a C function and is apparently more tolerant of the hexadecimal representation of a negative integer: internally it reads the string as an unsigned value and then reinterprets it as a signed value. But even in C, such processing is unspecified for strtol, and for a conversion of an unsigned value that cannot be represented in a signed type, either the result is implementation-defined or an implementation-defined signal is raised (from draft N1570 for C11, 6.3.1.3 Signed and unsigned integers).

It is likely that it works that way in order not to break tons of legacy code, but C++ is a more recent language, and implementers have decided to be more strict for hexadecimal representation.

Serge Ballesta
  • Thanks, I didn't realize hex numbers are actually treated as unsigned. I guess I assumed they are just taken as bit representations. – user3780807 Mar 06 '18 at 17:30
  • This answer is misleading: the C++-style code using an istream is not being more strict, it's just doing something different. See my answer, but briefly, the `strtol` code reads the value into a `long`, which works if `long` is 64 bits, and then converts that to `int32_t`. That's not the same as reading into an `int32_t` like the istream version does. Reading into a type that can represent the value and converting is not the same as reading into a type that cannot represent the value! – Jonathan Wakely Mar 06 '18 at 17:39