Alphabet character prints wrong ISO code

Question

So, I have this simple code:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        char c;
        c = getchar();
        printf("%d",c);
        return 0;
    }

Now let's say c = 'α' (the letter a in the Greek alphabet). According to ISO 8859-7, the program should print 225, but instead it prints -31. Does anybody know what causes this mistake?

`char c;` is signed on your system. Just use `unsigned char c` — Jean-François Fabre, Feb 10 '18 at 21:09
`getchar` returns `int` for a reason. Why not just use `int c;`? — melpomene, Feb 10 '18 at 21:10
I know that. But unsigned char also works (and consumes less memory :)) — Jean-François Fabre, Feb 10 '18 at 21:12
@Jean-FrançoisFabre `unsigned char` does *not* work if you're interested in detecting `EOF`. — Steve Summit, Feb 10 '18 at 21:12
@SteveSummit of course. unsigned char is a bad idea in general. I tend to forget about this whole EOF thing. — Jean-François Fabre, Feb 10 '18 at 21:13

chux - Reinstate Monica · Answer 1 · 2018-02-10T21:43:23.087

getchar() and friends return an int with a value in the unsigned char range or EOF. EOF is a negative value. @melpomene

Use int.

    #include <stdio.h>

    int main() {
        // char c;
        int c;
        c = getchar();
        printf("%d\n",c);
        return 0;
    }

"Does anybody know what causes this mistake?"

getchar() returned a value of 225, yet the code assigned that to a char, which is signed on OP's platform with a range of -128 to 127. This invokes implementation-defined behavior.

"Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised." (C11dr §6.3.1.3 3)

A common implementation-defined behavior is to reduce the value by 256, giving -31. Other results are possible.

Jean-François Fabre · Answer 2 · 2018-02-10T21:39:21.613

Because char c; is signed on your system.

getchar returns an int, and the value 225 does not fit in your signed char, so it typically wraps around: 225 - 256 = -31

Just use unsigned char c; instead, or more simply int, which consumes slightly more memory (shouldn't be an issue) but is able to distinguish EOF from 255. Go for int: it's simple and no one will wonder about that.

edited Feb 10 '18 at 21:39

answered Feb 10 '18 at 21:11


"Consumes more memory"? Oh crap, 3 more bytes? That's almost 0.00000002% of the RAM on this machine. :-) – melpomene Feb 10 '18 at 21:25
depending on compilers you can get 8-byte ints and 0.00000004% of the RAM. That's how bloated programs start :) but you're right. – Jean-François Fabre Feb 10 '18 at 21:26

score 0 · Answer 3 · answered Feb 12 '18 at 01:11

This isn't an answer to the question, but I want to address (pun intended) the increased memory usage of an int vs. a char variable as discussed in other comments and answers, and a bit of code formatting will help with this.

When you're talking about memory use, the best policy is to be prepared for surprises.

A local variable like the one in the question will often occupy zero bytes of memory regardless of its size, if it is stored in a register instead of memory. However, it may take more code to convert the data width, as is the case here.

For comparison, here is the compiled code from VS2017 in x86 release mode, first with an int variable:

                     int c;
                     c = getchar();
FF 15 B0 20 40 00    call        dword ptr [__imp__getchar (04020B0h)]  
                     printf("%d",c);
50                   push        eax  
68 F8 20 40 00       push        offset string "%d" (04020F8h)  
E8 1F 00 00 00       call        printf (0401030h)  
83 C4 08             add         esp,8

And with a char:

                     char c;
                     c = getchar();
FF 15 B0 20 40 00    call        dword ptr [__imp__getchar (04020B0h)]  
                     printf("%d",c);
0F BE C0             movsx       eax,al  ;; Widen 'char' to 'int'
50                   push        eax  
68 F8 20 40 00       push        offset string "%d" (04020F8h)  
E8 1C 00 00 00       call        printf (0401030h)  
83 C4 08             add         esp,8

The generated code is identical, except that the char version has an extra three-byte instruction, the movsx eax, al to widen the char to an int before pushing it. So instead of saving memory, the char used three more bytes of code.

Of course in a simple test like this, you don't care how much code or data memory is used - in fact you may not ever do an optimized build at all, only a debug build.

And things may well change in a more complex piece of code. For example, if you have an array of char vs. int, obviously the array will take more memory for the int values, assuming it isn't a very small array that ends up in registers.

Even for data that does end up in registers, a shorter data type may help because it lets more data get packed into the registers (e.g. by using things like the bl and bh registers for byte values), so less data spills out into actual memory.

But you really don't know until you look at the size of the generated code and how it uses memory.

In any case, this is all a moot point for the code in the question, since it just isn't correct to store the return value of getchar() in a char or unsigned char. But it is interesting to look at what kind of code gets generated for the different data types.
