Unexpected behavior when printing 4-byte integer byte by byte

Question

I have this sample code for converting 32 bit integers to ip addresses.


#include <stdio.h>
int main()
{
 unsigned int c ;
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Integer value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

This code gives incorrect output for input 2249459722 . But when i replace

scanf("%d",&c) ;

by

scanf("%u",&c) ;

The output comes out to be correct.

P.S : I know about inet_ntop and inet_pton.
I expect answers other than suggesting those.

Rotten title. It doesn't tell the reader *anything* about the problem you are experiencing. — dmckee --- ex-moderator kitten, Jan 09 '10 at 23:21
`2249459722` is not required to fit in an `int` or an `unsigned int` for that matter. Use `unsigned long`, or if you have it available, `uint32_t`. The format for `scanf()` becomes `"%lu"`/`"%" SCNu32` in that case. — Alok Singhal, Jan 09 '10 at 23:30

Jonathan Leffler · Accepted Answer · 2010-01-10T00:34:00.243

You are coding 'sinfully' (making a number of mistakes which will hurt you sooner or later - mostly sooner). First off, you are assuming that the integer is of the correct endian-ness. On some machines, you will be wrong - either on Intel machines or on PowerPC or SPARC machines.

In general, you should show the actual results you get rather than just saying that you get the wrong result; you should also show the expected result. That helps people debug your expectations.

Here's my modified version of your code - instead of requesting input, it simply assumes the value you specified.

#include <stdio.h>
int main(void)
{
    unsigned int c = 2249459722;
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
    return(0);
}

When compiled on my Mac (Intel, little-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

When compiled on my Sun (SPARC, big-endian), the output is:

Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 134.20.8.10

(Using GCC 4.4.2 on the SPARC, I get a warning:

xx.c:4: warning: this decimal constant is unsigned only in ISO C90

Using GCC 4.2.1 on Mac - with lots of warnings enabled (gcc -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Werror) - I don't get that warning, which is interesting.) I can remove that by adding a U suffix to the integer constant.

Another way of looking at the problems is illustrated with the following code and the extremely fussy compiler settings shown above:

#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 0)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}

This fails to compile (because of the -Werror setting) with the message:

cc1: warnings being treated as errors
xx.c: In function ‘main’:
xx.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’

Remove the -Werror setting and it compiles, but then shows the next problem that you have - the one of not checking for error indications from functions that can fail:

Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Conversion failed for 2249459722

Basically, the sscanf() function reports that it failed to convert the string to a signed integer (because the value is too large to fit - see the warning from GCC 4.4.2), but your code was not checking for the error return from sscanf(), so you were using whatever value happened to be left in c at the time.

So, there are multiple problems with your code:

It assumes a particular architecture (little-endian rather than recognizing that big-endian also exists).
It doesn't compile cleanly when using a compiler with lots of warnings enabled - for good reason.
It doesn't check that functions that can fail actually succeeded.

Alok's Comment

Yes, the test on sscanf() is wrong. That's why you have code reviews, and also why it helps to post the code you are testing.

I'm now a bit puzzled - getting consistent behaviour that I can't immediately explain. With the obvious revision (testing on MacOS X 10.6.2, GCC 4.2.1, 32-bit and 64-bit compilations), I get one not very sane answer. When I rewrite more modularly, I get a sane answer.

+ cat yy.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722;

    printf("Direct operations:\n");
    print_value(c);

    printf("Indirect operations:\n");
    if (sscanf("2249559722", "%d", &c) != 1)
        printf("Conversion failed for %s\n", str);
    else
        print_value(c);
    return(0);
}


+ gcc -o yy.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

I do not have a good explanation for the value 170.142.21.134; but it is consistent on my machine, at the moment.

+ gcc -o yy.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes yy.c
yy.c: In function ‘main’:
yy.c:20: warning: format ‘%d’ expects type ‘int *’, but argument 3 has type ‘unsigned int *’


+ ./yy.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations:
Integer value:  2249559722
Integer value:  0x86158EAA
Dotted decimal: 170.142.21.134

Same value - even in 64-bit instead of 32-bit. Maybe the problem is that I'm trying to explain undefined behaviour, which is more or less by definition unexplainable (inexplicable).

+ cat xx.c
#include <stdio.h>

static void print_value(unsigned int c)
{
    unsigned char* cptr  = (unsigned char*)&c;
    printf("Integer value:  %10u\n", c);
    printf("Integer value:  0x%08X\n", c);
    printf("Dotted decimal: %u.%u.%u.%u \n", *cptr, *(cptr+1), *(cptr+2), *(cptr+3));
}

static void scan_value(const char *str, const char *fmt, const char *tag)
{
    unsigned int c;
    printf("Indirect operations (%s):\n", tag);
    fmt = "%d";
    if (sscanf(str, fmt, &c) != 1)
        printf("Conversion failed for %s (format %s \"%s\")\n", str, tag, fmt);
    else
        print_value(c);
}

int main(void)
{
    const char str[] = "2249459722";
    unsigned int c = 2249459722U;

    printf("Direct operations:\n");
    print_value(c);
    scan_value(str, "%d", "signed");
    scan_value(str, "%u", "unsigned");

    return(0);
}

Using the function argument like this means GCC cannot spot the bogus format any more.

+ gcc -o xx.32 -m32 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.32
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

The results are consistent here.

+ gcc -o xx.64 -m64 -std=c99 -pedantic -Wall -Wshadow -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes xx.c


+ ./xx.64
Direct operations:
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (signed):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134 
Indirect operations (unsigned):
Integer value:  2249459722
Integer value:  0x8614080A
Dotted decimal: 10.8.20.134

And these are the same as the 32-bit case. I'm officially bemused. The main observations remain accurate - be careful, heed compiler warnings (and elicit compiler warnings), and don't assume that "all the world runs on Intel chips" (it used to be "don't assume that all the world is a VAX", once upon a long time ago!).

Good point. I thought to mention that in my edit, but I didn't think it would be understood. — hobodave, Jan 09 '10 at 08:08
The cause for the "unsigned only in C90" warning is that the rules about what the type of an integer literal in C is between C90 and C99. In C90, the types "tried" are `int`, `long int` and `unsigned long int`. In C99, the types are `int`, `long int` and `long long int`. The constant `2249459722` is of type `unsigned long int` in C90 and `long long int` in C99 on your machine. Adding `U` makes it unsigned everywhere. — Alok Singhal, Jan 09 '10 at 23:59
Technically, the `printf()` functions can fail and should be checked too. However, it is unusual to do so in example code - whereas checking inputs such as the `sscanf()` (or `scanf()` in the original) is crucial, even in example code. — Jonathan Leffler, Jan 10 '10 at 00:00
Surely you meant `!= 1` and not `!= 0` in your `scanf()` call? — Alok Singhal, Jan 10 '10 at 00:03
@Jonathan: Can you tell me the size of `int`, `long int` and `long long int` on your Solaris machine? Just curious. — Alok Singhal, Jan 10 '10 at 00:40
@Alok: under 'gcc' and 'gcc -m32', it is 4/4/8; under 'gcc -m64', it is 4/8/8. I think that the makefile in the directory where I built the test program uses '-m64' by default. — Jonathan Leffler, Jan 10 '10 at 00:53
@Jonathan: You will hate me for it, but in your `yy.c`, your `sscanf` call scans `"2249559722"`, not `str`, which is `"2249459722"`. See the difference? 2249 **4** 59722 vs 2249 **5** 59722. Now? :-) — Alok Singhal, Jan 10 '10 at 04:47
@Alok: well spotted - and absolutely no hate involved! Indeed, it is a relief to know there's a rational reason for the issue. I'm not sure why the scanf() is not generating an error, though. I don't plan to amend my answer at the moment. And I did remember noting at some point that I was not using the string I'd created...but I didn't fix it. Oh well, ç'est la vie, comme on dit. — Jonathan Leffler, Jan 10 '10 at 07:52
`scanf()` doesn't generate an error in case of overflow. The behavior is undefined. So, `int c; sscanf("123456789012345678901234567890", "%d", &c);` will (may) return 1, and `c` may contain anything. — Alok Singhal, Jan 10 '10 at 16:24
@Alok: the dark corners of the C standard are indeed dark and twisty. Even the spec for strtol() et al is hard to use: if the 'subject string' value is too big, you get the appropriate min/max integer value (LONG_MIN for negative, LONG_MAX for positive) and errno set to ERANGE. I'd really rather have: `extern errno_t str_to_long(const char *str, char *end, int base, long *result);` with the return value being 0 for OK and ERANGE or other values on error (`errno_t` is defined in TR 24731-1 - see http://stackoverflow.com/questions/372980/ for more on that standard technical report). — Jonathan Leffler, Jan 10 '10 at 19:08

hobodave · Answer 2 · 2010-01-09T08:07:19.483

5

%d is for signed integers

%u is for unsigned integers

Edit:

Please modify your program as follows to see how your input is really interpreted:

#include <stdio.h>
int main()
{
 unsigned int c ; 
 unsigned char* cptr  = (unsigned char*)&c ;
 while(1)
 {
  scanf("%d",&c) ;
  printf("Signed value: %d\n",c);
  printf("Unsigned value: %u\n",c);
  printf("%u.%u.%u.%u \n",*cptr, *(cptr+1), *(cptr+2), *(cptr+3) );
 }
}

What happens when you supply a number greater than INT_MAX is the left-most bit is 1. This indicates that it is a signed integer with a negative value. The number is then interpreted as it's two's complement

edited Jan 09 '10 at 08:07

answered Jan 09 '10 at 07:55

hobodave

28,925
4
72
77

Also you might want to consider some research about 2's complement. – Erkan Haspulat Jan 09 '10 at 08:03
It could be interpreted as two's complement, or ones' complement, or whatever the underlying encoding is. Technically, `scanf()` could have done anything, since the behavior when the input doesn't fit in the data type is undefined. – Alok Singhal Jan 09 '10 at 23:49

score 1 · Answer 3 · answered Jan 09 '10 at 23:45

To answer your main question:

scanf("%d", &c);

scanf()'s behavior is undefined when the input being converted can't be represented to the data type. 2249459722 on your machine doesn't fit in an int, so scanf() can do anything, including storing garbage in c.

In C, int type is guaranteed to be able to store values in the range -32767 to +32767. An unsigned int is guaranteed values between 0 and 65535. So, as such, 2249459722 need not fit in even an unsigned int. unsigned long, however, can store values up to 4294967295 (2³²−1), so you should use unsigned long:

#include <stdio.h>
int main()
{
    unsigned long c ;
    unsigned char *cptr  = (unsigned char*)&c ;
    while(1)
    {
        if (scanf("%lu", &c) != 1) {
            fprintf(stderr, "error in scanf\n");
            return 0;
        }
        printf("Input value: %lu\n", c);
        printf("%u.%u.%u.%u\n", cptr[0], cptr[1], cptr[2], cptr[3]);
    }
    return 0;
}

If you have a C99 compiler, you can #include <inttypes.h> and then use uint32_t instead of unsigned long. The scanf() call becomes scanf("%" SCNu32, &c);

score 1 · Answer 4 · answered Jan 10 '10 at 17:27

1

The correct endianness-safe way to write this is

printf("Dotted decimal: %u.%u.%u.%u \n", (c >> 24) & 0xff, (c >> 16) & 0xff, (c >> 8) & 0xff, (c >> 0) & 0xff);

answered Jan 10 '10 at 17:27

starblue

55,348
14
97
151

Unexpected behavior when printing 4-byte integer byte by byte

4 Answers4

Alok's Comment

Edit:

Linked