
I am dealing with an edge case in a program that I want to be multi-platform. Here is an extract showing the problem:

#include <stdio.h>
#include <string.h>

void print_bits(size_t const size, void const * const ptr){
    unsigned char *b = (unsigned char*) ptr;
    unsigned char byte;
    int i, j;

    for (i=size-1;i>=0;i--)
    {
        for (j=7;j>=0;j--)
        {
            byte = (b[i] >> j) & 1;
            printf("%u", byte);
        }
    }
    puts("");
}

int main() {
    char* ascii = "0x80000000";
    int myint = strtol(ascii, NULL, 16);

    printf("%s to signed int is %d and bits are:\t", ascii, myint);
    print_bits(sizeof myint, &myint);

    return 0;
}

So when I compile with GCC on Linux I get this output:

0x80000000 to signed int is -2147483648 and bits are:   10000000000000000000000000000000

On a Windows, using MSVC and MinGW I get:

0x80000000 to signed int is 2147483647 and bits are:    01111111111111111111111111111111

I think GCC outputs the correct expected values. My question is: where does this difference come from, and how can I make sure I get the correct result on all compilers?

UPDATE

The reason behind this code is that I have to check whether the MSB (bit #31) of the hex value is 0 or 1. Then I have to get the unsigned integer value of the next 7 bits (#30 to #24); in case of 0x80000000 these 7 bits should result in 0:

int msb_is_set = myint & 1;
uint8_t next_7_bits;

next_7_bits = myint >> 24; // fine on GCC, outputs 0 for the next 7 bits
#ifdef WIN32 // if I do not do this, next_7_bits will be 127 on Windows instead of 0
if (msb_is_set)
    next_7_bits = myint >> 1;
#endif

P.S. This is on the same machine (i5 64bit)

Dumbo
    "MinGW" is gcc. – M.M Jun 05 '17 at 09:42
  • Can you explain what behaviour you expect on all platforms, and more importantly, why? `int` may be any size from 16-bit up, and `long` may be any size from 32-bit up, and some platforms may not use 2's complement – M.M Jun 05 '17 at 09:51
  • Also, will you be using input that contains a leading `-` sign? – M.M Jun 05 '17 at 09:54
  • NB. This code is a constraint violation due to calling `strtol` without a prototype in scope, your compilers should all diagnose this (if not then reconsider your compiler switches). (In C89 it was undefined behaviour with no diagnostic required) – M.M Jun 05 '17 at 09:57
  • Making common assumptions about the platforms; your MSVC output is correct according to the C Standard, and the "gcc on linux" is implementation-defined – M.M Jun 05 '17 at 10:02
  • My guess is that the MSVCRT implementation of `strtol` clamps out-of-range negative values to `LONG_MIN` and out-of-range positive values to `LONG_MAX`. – Ian Abbott Jun 05 '17 at 10:03
  • @IanAbbott that is the correct behaviour for strtol – M.M Jun 05 '17 at 10:04
  • @IanAbbott as the `glibc` implementation does, as well. But `long` on 64bit Linux has 64 bits. –  Jun 05 '17 at 10:04
  • @M.M Yes, so the difference comes from `long` being 32 bits on Windows and 64 bits on 64-bit Linux. – Ian Abbott Jun 05 '17 at 10:05
  • @M.M Check my update, please. – Dumbo Jun 05 '17 at 11:48
  • Re. the update, `myint & 1` is the LSB. The MSB is the sign-bit for signed integers. Please clarify which one you mean – M.M Jun 05 '17 at 12:06
  • also it is not clear to me what is going on in your conditional compilation -- reading bits 30-24 is the same for both cases; the code in your "#ifdef WIN32" actually shifts right by one, which doesn't have anything to do with your stated goal – M.M Jun 05 '17 at 12:10
  • @M.M My code works fine. If I am on Windows and I do not shift by 1 in that ifdef statement, I get `127` instead of `0` – Dumbo Jun 05 '17 at 12:20
  • @M.M You are right, the GCC ifdef was not necessary, but I still have to do the shift – Dumbo Jun 05 '17 at 12:22
  • Please answer about MSB. `& 1` tests the LSB, not the MSB. – M.M Jun 05 '17 at 12:32

4 Answers

4

You're dealing with different data models here.

Windows 64 uses LLP64, which means only long long and pointers are 64bit. As strtol converts to long, it converts to a 32bit value, and 0x80000000 in a 32bit signed integer is negative.

Linux 64 uses LP64, so long, long long and pointers are 64bit. I guess you see what's happening here now ;)


Thanks to the comments, I realize my initial answer was wrong. The different outcome indeed has to do with the differing data models on those platforms.

But: in the LP64 model, you have a conversion to a signed type (int) that cannot hold the value, which is implementation-defined. int is 32 bits on both platforms, and a 32-bit int just cannot hold 0x80000000. So the correct answer is: you shouldn't rely on the result of your code on 64-bit Linux. On 64-bit Windows, as long is only 32 bits, strtol() correctly returns LONG_MAX for 0x80000000, which happens to be just one smaller than your input.

  • There's no undefined behaviour. Overflow is when the result of an arithmetic operation would produce a value out of range, but a conversion is not an arithmetic operation – M.M Jun 05 '17 at 10:05
  • @M.M so it's at least implementation defined (it depends on the internal representation) -- which still forbids any assumptions on the outcome. But sure, to be correct, I'll change the wording, thanks! –  Jun 05 '17 at 10:08
  • @FelixPalmen I mean on GCC I could just do `myint >> 24` to get the 7 next bits into a `uint8_t` data type. in case of `0x80000000` it should be 0, but on windows it is 127 and I have to shift by 1 to get 0. – Dumbo Jun 05 '17 at 12:23
  • @SaeidYazdani it's now *very unclear* what you want to accomplish and it doesn't look like something *portable*. But for a straight-forward way to check the MSB of a string converted to `unsigned int`, see my other answer. –  Jun 05 '17 at 12:25
1
int myint = strtol(ascii, NULL, 16);

strtol is 'string to long', not string to int.

Also, you probably want 0x80000000 to be parsed as an unsigned long.

You may find that on (that version of) Linux, long is 64 bits, whereas on (that version of) Windows, long is 32 bits.

Neil
1

Don't do this:

#ifdef __GNUC__

because a compiler switch might change the way things work. Better to do something like:

In some header somewhere:

#ifdef __GNUC__
#define FEATURE_SHIFT_RIGHT_24
#endif
#ifdef _MSC_VER
#define FEATURE_SHIFT_RIGHT_1
#endif

Then in your main code:

#ifdef FEATURE_SHIFT_RIGHT_24
next_7_bits = myint >> 24;
#endif
#ifdef FEATURE_SHIFT_RIGHT_1
if (msb_is_set)
    next_7_bits = myint >> 1;
#endif

Your main code should use the feature macros, and the header should detect which implementation each compiler requires. This separates the code that does the work from the detection of compiler features; in the header you can do more complex detection.

e.g.

#if defined(__GNUC__) && (__GNUC__ > 4)

etc

Neil
1

This is about your update. Although I'm not sure what your intention is, let's first point out some mistakes:

#ifdef WIN32

The macro always defined when targeting win32 is _WIN32, not WIN32.

Then you have another #ifdef checking for GCC, but this will not do what you expect: GCC also exists on Win32 and uses the same data model as MSVC. IOW, you can have both defined, __GNUC__ and _WIN32.

You say you want to know whether the MSB is set. Then just make sure to convert your string to an unsigned int and directly check this bit:

#include <limits.h>
// [...]
unsigned int myint = strtoul(ascii, NULL, 16); // <- strtoul(), not strtol()!
unsigned int msb = 1U << (sizeof(unsigned int) * CHAR_BIT - 1);
if (myint & msb)
{
    // msb is set
}

Btw, see this answer for a really portable way to get the number of bits in an integer type. sizeof() * CHAR_BIT will fail on a platform with padding bits.

  • `unsigned long` would be a better choice to hold the result of `strtoul` – M.M Jun 05 '17 at 12:08
  • @M.M it would, but OP wants to know whether the MSB of an unsigned int is set, at least if I understand him correctly. –  Jun 05 '17 at 12:09
  • OK. I read it as asking about bit 31 specifically (int may not be 32-bit) – M.M Jun 05 '17 at 12:31
  • @M.M unfortunately it's unclear, but to get specifically bit 31 would be even simpler in code... –  Jun 05 '17 at 12:32