2

I have this fragment of code that reads arithmetic expressions like 1 + 2 * 3 into integers and characters:

#include <stdio.h>

int main() {
    int d, flag = 0;
    char c;

    while ((flag = scanf("%d", &d)) != EOF)
    {
        if (flag == 1) // if an integer was read successfully
        {
            // an integer has been read into variable 'd'
            printf("%d,", d);
        }
        else // otherwise the next input was not an integer
        {
            // so read a single character into variable 'c'
            c = getchar();
            printf("%c,", c);
        }
    }
    return 0;
}

Now this code works when compiled with MinGW on Windows and on macOS, but on Linux the characters + and - are not read correctly.

Sample run:
Input: 1 * 2 * 3
Output: 1,*,2,*,3,

Input: 1 + 2 - 3
Output: 1, ,2, ,3,

Input: 1 ++ 2 -- 3
Output: 1,+,2,-,3

Somehow the + and - characters are read as spaces, yet it works if we double them as ++ and --. Again, this only happens when compiled on Linux and on almost all online IDEs. Why are only the + and - characters affected? Could it be that they are recognised as positive and negative signs?

Biffen
welcomb
  • 1
    because scanf with %d delim will skip + and -, why not just use getchar – JoshKisb Mar 29 '18 at 07:43
  • `char` and `int` is the same thing, to be safer and to be more compatible, please read `char` as `int`. – Yves Mar 29 '18 at 07:47
  • for me same behaviour (strange) on mingw. – Jean-François Fabre Mar 29 '18 at 08:00
  • 2
    Because `%d` reads ".. any number of decimal digits (0-9), **optionally preceded by a sign (+ or -)**." (reference: http://www.cplusplus.com/reference/cstdio/scanf/) – Jongware Mar 29 '18 at 08:17
  • that doesn't explain the spaces. – Jean-François Fabre Mar 29 '18 at 08:21
  • 1
    As in all the other millions of questions about why scanf and getchar aren't working correctly, you need to discard the line feed character from stdin. https://stackoverflow.com/questions/35178520/how-to-read-parse-input-in-c-the-faq – Lundin Mar 29 '18 at 08:24
  • @Bob__ same here :) – Jean-François Fabre Mar 29 '18 at 08:43
  • From what I can tell, the OP is describing the effects of multiple compounding issues. First, +/-N should always be interpreted as an integer, and second, the `getchar()` call is probably gobbling some of the characters and possibly masking behavior. Too late at night for me to think all the way through, but the answers so far are weakly worded and probably not 100% correct. – jwdonahue Mar 29 '18 at 08:44

3 Answers

6

When you enter

1 + 2

The first number is scanned. When scanf tries to scan the second number, it starts by consuming + (a valid start for a number with a unary +), but the next character is [space], so the conversion fails.

[space] isn't consumed, but + is, even though the scan failed. That explains why a lone + or - is consumed but never seen: the following getchar reads the space instead.

With *, nothing is consumed, because a number cannot start with *.
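A minimal sketch for checking this on a given platform, assuming a writable temporary directory so that tmpfile() succeeds. The helper name next_char_after_scan is hypothetical. It attempts one %d conversion on a prepared input and reports which character a following getchar-style read would see. The result for a lone sign depends on the C library, so no output is claimed here.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes `input` to a temporary file, attempts one
 * "%d" conversion on it, and returns the character that the next read
 * would see (or EOF). *converted receives fscanf's return value.
 */
int next_char_after_scan(const char *input, int *converted)
{
    FILE *fp = tmpfile();
    int d = 0;
    int c;

    *converted = EOF;
    if (fp == NULL)
        return EOF;            /* could not create the temporary file */

    fputs(input, fp);
    rewind(fp);

    *converted = fscanf(fp, "%d", &d);   /* 1 on success, 0 on matching failure */
    c = fgetc(fp);                       /* what getchar() would read next */
    fclose(fp);
    return c;
}

/*
 * Usage:
 *   int n;
 *   int c = next_char_after_scan("+ 2", &n);
 *   // n == 0 here; c is ' ' if the library consumed the '+',
 *   // or '+' if the library put it back.
 */
```

Calling it with "* 2" should report '*' on any implementation, since a single character of pushback is enough to restore it.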

The fact that you see something different on MinGW is a mystery to me: I'm on MinGW and I get the "wrong" behaviour you described. My hypothesis is that the standard library you're using is "smarter" than other implementations and puts the + or - back when the conversion fails, so that getchar can read it afterwards.

I suggest compiling your code with -D__USE_MINGW_ANSI_STDIO=1 to make sure that gcc doesn't use the Microsoft scanf implementation; you should then get the "buggy" behaviour again. (I'm not sure there is a standard for parsing botched numbers, BTW.)

Your scanf approach is indeed doomed because:

  • of the corner cases you can encounter, like the one above
  • of the fact that if you enter 1 +2 (no space after the sign), the sign won't be read separately either, because it is part of the second number.

The best way here is to use a custom lexer reading char by char.
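One possible shape for such a lexer, as a sketch rather than a definitive implementation. The names Token, TokenKind and next_token are hypothetical, and the sketch assumes the expression has already been read into a string (for example with fgets). It never treats + or - as part of a number, so signs always come back as separate tokens, and it does not guard against integer overflow.

```c
#include <ctype.h>

typedef enum { TOK_INT, TOK_CHAR, TOK_END } TokenKind;

typedef struct {
    TokenKind kind;
    int value;    /* meaningful when kind == TOK_INT */
    char symbol;  /* meaningful when kind == TOK_CHAR */
} Token;

/*
 * Reads one token starting at *cursor and advances *cursor past it.
 * Whitespace is skipped; a run of digits becomes TOK_INT; any other
 * character (including '+' and '-') becomes a one-character TOK_CHAR.
 */
Token next_token(const char **cursor)
{
    Token tok = { TOK_END, 0, '\0' };
    const char *p = *cursor;

    while (isspace((unsigned char)*p))
        p++;

    if (*p == '\0') {
        /* end of input: tok.kind stays TOK_END */
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_INT;
        while (isdigit((unsigned char)*p))
            tok.value = tok.value * 10 + (*p++ - '0');  /* no overflow check */
    } else {
        tok.kind = TOK_CHAR;
        tok.symbol = *p++;
    }

    *cursor = p;
    return tok;
}

/*
 * Usage, mirroring the loop in the question:
 *   const char *p = line;            // line filled by fgets
 *   Token t;
 *   while ((t = next_token(&p)).kind != TOK_END) {
 *       if (t.kind == TOK_INT) printf("%d,", t.value);
 *       else                   printf("%c,", t.symbol);
 *   }
 */
```

Because the sign is never folded into the number, 1 +2 and 1 + 2 produce the same three tokens; deciding whether a - is unary or binary is left to whatever consumes the tokens.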

Jean-François Fabre
  • 2
    I don't know who downvoted, but this answer is correct and valid. When `+/-` are read with the `%d` *conversion specifier*, the conversion to integer begins, then a *matching failure* occurs and no more characters are read. I would like to see the reasoning behind the downvote if you have the integrity to leave it. – David C. Rankin Mar 29 '18 at 08:31
  • I'd also like to know why the downvote. Because I'm positive about the why. Thought of that for at least 30 minutes. I'm not into revenge downvotes or don't hold grudges, so talk freely. – Jean-François Fabre Mar 29 '18 at 08:33
  • I down-voted this answer because it's a mess and doesn't actually fully address the problem, as-in, works on one compiler but not the other. – jwdonahue Mar 29 '18 at 08:37
  • fair enough. What about now? – Jean-François Fabre Mar 29 '18 at 08:41
  • 1
    @jwdonahue thank you for leaving your reason. I appreciate the candor, especially when the technical crux of the answer is correct. Also, in such instances, it is always good policy to point out what you feel are deficiencies and give the author a chance to correct. People go out of their way to help here, and we should do the same when reviewing answers. Save the dings for those that are just wrong and can't be fixed. – David C. Rankin Mar 29 '18 at 08:42
  • it's not a compiler issue, it's a standard library issue. On MinGW, gcc uses the Windows standard library by default. And we know what Microsoft does about the standards.... but is there even a standard for the behaviour when reading "+"? – Jean-François Fabre Mar 29 '18 at 08:44
  • We may be stumbling into undefined behavior here. +digit could be interpreted as positive integer or random '+' in the string. This answer contains a partial duplicate of itself and needs to be cleaned up. I can't deal with C standards this late at night so I bid you all goodnight. – jwdonahue Mar 29 '18 at 08:48
  • Different interpretations of the standards is not uncommon for different library implementers: https://stackoverflow.com/q/24689378/4944425 . BTW, please edit the answer to remove the duplicated first part. – Bob__ Mar 29 '18 at 08:49
  • You might be right about MinGW, because I don't think I'm running a clean install. I'm not too concerned about spaces because we can enforce a space between tokens in the input. – welcomb Mar 30 '18 at 07:56
5

Jean-François Fabre correctly explained what actually happens, but my opinion is that it is simply unspecified by the standard what should happen in that case.

Draft n1570 for C11 says at 7.21.6.2 The fscanf function

12 The conversion specifiers and their meanings are:
d Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument shall be a pointer to signed integer.
...

and strtol is described in 7.22.1.4 The strtol, strtoll, strtoul, and strtoull functions (emphasis mine)

Description
2 The strtol, strtoll, strtoul, and strtoull functions convert the initial portion of the string pointed to by nptr to long int, long long int, unsigned long int, and unsigned long long int representation, respectively. First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. Then, they attempt to convert the subject sequence to an integer, and return the result.

I could not find anywhere in the standard what exactly may resemble a decimal integer. It is clear that positive and negative numbers do, and that numbers prefixed with a plus sign (+) also do. But it is not specified whether a plus or minus sign (+ or -) on its own resembles a decimal integer or not.

If an implementation decides that it does, a %d specifier will consume a lone + or - sign; if it decides that it does not, the sign will be left in the stream.
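For comparison, here is a small sketch of what strtol itself reports for such input when it works on a string. The helper name strtol_consumed is hypothetical. It relies only on the documented endptr behaviour of strtol: when no conversion is performed, endptr is set to the start of the string. It says nothing about what a stream-based %d must do with a sign it has already read.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: converts the start of `s` as a base-10 integer and
 * returns how many characters strtol consumed, including any leading
 * whitespace and sign. A result of 0 means no conversion was performed.
 */
size_t strtol_consumed(const char *s, long *value)
{
    char *end;

    *value = strtol(s, &end, 10);
    return (size_t)(end - s);
}

/*
 * Usage:
 *   long v;
 *   strtol_consumed("+2", &v);   // consumes 2 characters, v == 2
 *   strtol_consumed("+ 2", &v);  // consumes 0 characters: a lone sign
 *                                // is not converted
 */
```

On a string, strtol can simply report that nothing was converted, whereas a stream reader has already pulled the sign out of the stream by the time it sees the following space.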

Serge Ballesta
-1

'+2' was scanned into 'd' by scanf("%d", &d). You can try '1-2' and see that the '-' is not displayed either.

kid1412hv