
ungetc() seems to fail on some characters. Here is a simple test program:

#include <stdio.h>

int main(void) {
    int c;

    printf("Type a letter and the enter key: ");

#define TRACE(x)  printf("%s -> %d\n", #x, x)
    TRACE(c = getc(stdin));
    TRACE(ungetc(c, stdin));
    TRACE(getc(stdin));

    TRACE(ungetc('\xFE', stdin));
    TRACE(getc(stdin));

    TRACE(ungetc('\xFF', stdin));
    TRACE(getc(stdin));

    return 0;
}

I run it on a Unix system and type a followed by Enter at the prompt.

The output is:

Type a letter and the enter key: a
c = getc(stdin) -> 97
ungetc(c, stdin) -> 97
getc(stdin) -> 97
ungetc('\xFE', stdin) -> 254
getc(stdin) -> 254
ungetc('\xFF', stdin) -> -1
getc(stdin) -> 10

I expected this:

Type a letter and the enter key: a
c = getc(stdin) -> 97
ungetc(c, stdin) -> 97
getc(stdin) -> 97
ungetc('\xFE', stdin) -> 254
getc(stdin) -> 254
ungetc('\xFF', stdin) -> 255
getc(stdin) -> 255

What is causing ungetc() to fail?

EDIT: to make things worse, I tested the same code on a different unix system, and it behaves as expected there. Is there some kind of undefined behavior?

chqrlie
  • You're doing `ungetc(EOF)`. change to `255` as argument – M.M Jun 14 '18 at 23:30
  • @M.M: good guess! let's see who catches it and writes up a complete answer... – chqrlie Jun 14 '18 at 23:31
  • @JonathanLeffler It's implementation-defined and the evidence suggests that it is on OP's system – M.M Jun 14 '18 at 23:35
  • `\xFF` here is implicitly converted to `int`, the first parameter type of `ungetc`. When that `char` value is converted to `int`, the result *might* be negative and *might* be equal to `EOF`, and it appears that's the case for this system. – aschepler Jun 14 '18 at 23:37
  • @aschepler: `'\xFF'` is not converted to `int`, character constants have type `int` in C. – chqrlie Jun 14 '18 at 23:38
  • @chqrlie Oops, right. – aschepler Jun 14 '18 at 23:40
  • So we have an unsigned char constant expression being converted to whatever type is native char and then it's sign extended to signed integer when it's passed to `ungetc`. Just use 0x00FF and see what happens. – jwdonahue Jun 14 '18 at 23:49
  • @JonathanLeffler And then paragraph 10 says "If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type `char` whose value is that of the single character or escape sequence is converted to type `int`." This is pretty strange. – aschepler Jun 14 '18 at 23:50
  • @JonathanLeffler: that's correct, `0xFF` has a value of `255` which is in the range of type `unsigned char`, so it is a single byte. Paragraph 10 further specifies that *If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.* – chqrlie Jun 14 '18 at 23:51
  • @jwdonahue there are no "unsigned char constant expression"s in C – M.M Jun 15 '18 at 00:04
  • @jwdonahue: if you pass `0xFF` to `ungetc()`, it behaves correctly. The pitfall here is that `'\xFF'` does not necessarily have the same value as `0xFF`. – chqrlie Jun 15 '18 at 00:11
  • @chqrlie, I thought that's essentially what I said. – jwdonahue Jun 15 '18 at 00:17
  • It looks like one needs to cast characters explicitly to `unsigned char` before supplying them to `isprint()`, `ungetc()`, etc., to avoid being bitten by integer promotion on architectures with signed `char` type. I need to remember this. – Nominal Animal Jun 15 '18 at 01:20

1 Answer


Working on the following assumptions:

  • You're on a system where plain char is signed.
  • '\xFF' is -1 on your system (the value of out-of-range character constants is implementation-defined, see below).
  • EOF is -1 on your system.

The call ungetc('\xFF', stdin); is the same as ungetc(EOF, stdin); whose behaviour is covered by C11 7.21.7.10/4:

If the value of c equals that of the macro EOF, the operation fails and the input stream is unchanged.


The input range for ungetc is the same as the output range of getchar, i.e. EOF which is negative, or a non-negative value representing a character (with negative characters being represented by their conversion to unsigned char). I presume you were going for ungetc(255, stdin);.


Regarding the value of '\xFF', see C11 6.4.4.4/10:

The value of an integer character constant [...] containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

Also, the values of the execution character set are implementation-defined (C11 5.2.1/1). You could check the compiler documentation to be sure, but the compiler behaviour suggests that 255 is not in the execution character set; and in fact the behaviour of a gcc version I tested suggests that it takes the range of char as the execution character set (not the range of unsigned char).

M.M
  • Or `ungetc((unsigned char)'\xFF', stdin);` – aschepler Jun 14 '18 at 23:35
  • Very good, except the character constant `'\xff'` does not seem to be out of range (it specifies a single byte) and C11 7.21.7.10 says: *The ungetc function pushes the character specified by c (converted to an unsigned char) back onto the input stream pointed to by stream.* This is so confusing. – chqrlie Jun 14 '18 at 23:36
  • @chqrlie try `printf("%d\n", '\xFF');` – M.M Jun 14 '18 at 23:38
  • @M.M: it will print `-1` or `255` depending on whether `char` is signed or unsigned by default for the target platform. This behavior is actually fully specified in C11 6.4.4.4 p10: *If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type `char` whose value is that of the single character or escape sequence is converted to type `int`.* – chqrlie Jun 14 '18 at 23:47
  • @chqrlie it's implementation-defined – M.M Jun 14 '18 at 23:48
  • @M.M: it is indeed implementation-defined, but it must be one or the other, don't you agree? – chqrlie Jun 14 '18 at 23:52
  • As a matter of fact, C11 6.4.4.4 has a non normative footnote for this very case: *13. EXAMPLE 2 Consider implementations that use two’s complement representation for integers and eight bits for objects that have type char. In an implementation in which type char has the same range of values as signed char, the integer character constant '\xFF' has the value −1; if type char has the same range of values as unsigned char, the character constant '\xFF' has the value +255.* – chqrlie Jun 14 '18 at 23:55
  • It could be anything so long as the implementation documents it (while conforming with the standard, that is); any system that didn't produce `-1` or `255` would not be very practical since there is a lot of existing code that relies on that behaviour. – M.M Jun 14 '18 at 23:55
  • The issue of `'\xFF'` could be a separate language-lawyer question. The footnote you quote also seems to assume that `-1` is actually in the execution character set on those systems, which I don't see a guarantee for in the standard (but perhaps it is implied by some other combination of requirements) – M.M Jun 15 '18 at 00:00
  • The *mapping* issue is indeed troublesome, especially when the character values are specified numerically in octal or hexadecimal. It still remains inconsistent and error prone that `ungetc()` can be passed all values of type `char` except one. – chqrlie Jun 15 '18 at 00:08
  • @chqrlie that's an unavoidable fact of having `EOF` be `-1`. If you use the function correctly (i.e. pass a non-negative value) then the inconsistency doesn't arise. Many functions in the standard library (including ctype) take input characters in this way – M.M Jun 15 '18 at 00:11
  • Maybe the standard could have mandated `EOF < SCHAR_MIN` but it's too late to fix that now – M.M Jun 15 '18 at 00:13
  • Unlike `ungetc`, the functions from `<ctype.h>` are explicitly undefined for negative `char` values different from `EOF`. Ironically, `ungetc` is defined for all `char` values different from `EOF`, which is not necessarily `-1`. Conclusion: always cast `char` values as `unsigned char` for both `ungetc()` and functions from `<ctype.h>`. – chqrlie Jun 15 '18 at 00:15
  • `'\xFF'` isn't a `char` value so that conclusion needs updating :) My advice would be to avoid using the `'\xNN'` form entirely; `0xNN` is simpler and well-defined. I would only use `\xFF` inside a string literal, and trust that the compiler isn't insane – M.M Jun 15 '18 at 00:17
  • @M.M: Demanding that `EOF < SCHAR_MIN` would have broken the standard. C89/C90 was fabulously successful because it did not break backwards compatibility gratuitously — and demanding `EOF < SCHAR_MIN` would have broken a lot of backwards compatibility. – Jonathan Leffler Jun 15 '18 at 00:21
  • @JonathanLeffler: At least there is an easy way to ensure that `EOF < CHAR_MIN` on many systems: configuring the compiler to treat `char` as unsigned by default, `-funsigned-char` for `gcc` and `clang`. This would break old software too, that assumes `char` to be signed, but this assumption is non portable anyway. – chqrlie Jun 15 '18 at 00:23
  • Also I am not sure if using `-funsigned-char` may break ABI compatibility, although in the past I have been able to toggle it and still successfully link against static libraries built the other way – M.M Jun 15 '18 at 00:26