
ungetc is only guaranteed to take one byte of pushback. On the other hand, I've tested it on Windows and Linux and it seems to work with two bytes.

Are there any platforms (e.g. any current Unix systems) on which it actually only takes one byte?

Andreas Wenzel
rwallace
  • Did you test with only glibc or did you also test with [klibc](http://en.wikipedia.org/wiki/Klibc), [dietlibc](http://en.wikipedia.org/wiki/Dietlibc), and [uClibc](http://en.wikipedia.org/wiki/Uclibc)? – sarnold Oct 18 '11 at 23:10
  • Only glibc. Would the answer be different for the others, do you know? – rwallace Oct 18 '11 at 23:19
  • I don't know, but because they aim to be simpler than glibc, I presume they'd only support a single character. I wonder how the BSDs and OS X handle pushback? – sarnold Oct 19 '11 at 00:31
  • Even if it worked when you tried it, that doesn't mean it always will -- what if it depends on the state of the stream's buffers? – Dmitri Oct 19 '11 at 03:54
  • Actually the simplest implementation is to support at least 2 characters of pushback, so that `scanf` can simply use `ungetc` for its pushback rather than requiring a separate mechanism. – R.. GitHub STOP HELPING ICE Oct 19 '11 at 04:43
  • Even scanf needs only one character to work. – Mark VY Jun 12 '15 at 09:20
  • Coming back to this years later, I now know that 1 is not quite enough for `scanf` to do a great job in all cases. In fact, even 2 is not enough. For reading integers, 1 is plenty. But suppose you want to read floating point numbers like `1.5e-9`. Now consider what happens when you get an input "number" like this: `1.5e-q`. Eventually scanf will read the `q` and think to itself "I thought this was a float in scientific notation, but it's not; I should stop here". It will un-get the `q` and "return" 1.5 to the caller. But the `e-` is gone forever, and ideally it should not be, I think. – Mark VY Jan 06 '23 at 22:10
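For comparison, the `1.5e-q` case can be run through `strtod`, which works on a string rather than a stream and therefore can report exactly where the valid number ended. A minimal sketch; the helper name `parse_prefix` is hypothetical:

```c
#include <stdlib.h>

/* Hypothetical helper: parse the longest valid floating-point prefix of s
   with strtod and report how many characters it consumed in *used. */
double parse_prefix(const char *s, size_t *used)
{
    char *end;
    double d = strtod(s, &end);
    *used = (size_t)(end - s);
    return d;
}
```

For `"1.5e-q"` this returns 1.5 with `used == 3`, leaving `e-q` for the caller: `strtod` can "back up" over the incomplete exponent because it has the whole string in memory. A stream reader limited to one character of pushback has no such option, which is the gap C11 footnote 285 describes between `strtod` and `fscanf`.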

3 Answers


The C99 standard (and the C89 standard before that) said unequivocally:

One character of pushback is guaranteed. If the ungetc function is called too many times on the same stream without an intervening read or file positioning operation on that stream, the operation may fail.

So, to be portable, you should not assume more than one character of pushback.

Having said that, on both Mac OS X 10.7.2 (Lion) and RHEL 5 (Linux, x86/64), I tried:

```c
#include <stdio.h>
int main(void)
{
    int i;
    for (i = 0; i < 4096; i++)
    {
        int c = i % 16 + 64;
        if (ungetc(c, stdin) != c)
        {
            fprintf(stderr, "Error at count = %d\n", i);
            return(1);
        }
    }
    printf("No error up to count = %d\n", i-1);
    return(0);
}
```

I got no error on either platform. By contrast, on Solaris 10 (SPARC), I got an error at 'count = 4'. Worse, on HP-UX 11.00 (PA-RISC) and HP-UX 11.23 (Itanium), I got an error at 'count = 1', meaning only the first `ungetc` succeeded; this belies the theory that 2 is safe. Similarly, AIX 6.0 gave an error at 'count = 1'.

Summary

  • Linux: large (no error in 4096 calls)
  • Mac OS X: large (no error in 4096 calls)
  • Solaris: 4
  • HP-UX: 1
  • AIX: 1

So, AIX and HP-UX only allow one character of pushback on an input file that has not had any data read on it. This is a nasty case; they might provide much more pushback capacity once some data has been read from the file (but a simple test on AIX adding a getchar() before the loop didn't change the pushback capacity).
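The follow-up experiment (reading one character before probing) can be packaged as a small probe. This is a sketch under stated assumptions: it uses a `tmpfile()` stream instead of `stdin`, so results may differ from the terminal or pipe case, and the names `pushback_capacity` and `probe_before_and_after_read` are hypothetical.

```c
#include <stdio.h>

/* Count consecutive successful ungetc() calls on fp, stopping at limit.
   The pushed-back characters are left pending; the caller discards them. */
int pushback_capacity(FILE *fp, int limit)
{
    int n;
    for (n = 0; n < limit; n++) {
        int c = n % 16 + 64;          /* cycle through '@'..'O' */
        if (ungetc(c, fp) != c)
            break;
    }
    return n;
}

/* Probe a temporary file twice: before any read, and after one getc().
   Returns 0 on success, -1 if the temporary file could not be created. */
int probe_before_and_after_read(int limit, int *before, int *after)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs("some data\n", fp);
    rewind(fp);
    *before = pushback_capacity(fp, limit);
    rewind(fp);                       /* discards all pushed-back characters */
    (void)getc(fp);                   /* read one byte first */
    *after = pushback_capacity(fp, limit);
    fclose(fp);
    return 0;
}
```

Calling it with a limit such as 4096 and comparing `before` with `after` shows whether a prior read changes the capacity on a given platform.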

Jonathan Leffler
  • Note that exceeding the pushback limit results in failure rather than UB - a pleasant surprise for programmers familiar with C's usual way of handling non-portable code. Thus in principle you could always *try* pushing back 2 characters and fall back to a more expensive solution (like `fseek`, if your file is seekable) if the second one returns failure. – R.. GitHub STOP HELPING ICE Oct 19 '11 at 14:48
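The "try two, then fall back to `fseek`" idea might look like this. It is a sketch that assumes a seekable binary stream (a relative `fseek` offset is only meaningful there) and that neither character is `EOF`; `unread_two` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: undo the last two getc() calls on fp, where c1 and
   c2 are the characters they returned, in read order (neither EOF).
   Assumes fp is a seekable binary stream. Returns 0 on success. */
int unread_two(FILE *fp, int c1, int c2)
{
    if (ungetc(c2, fp) != c2)          /* not even one slot available */
        return fseek(fp, -2L, SEEK_CUR);
    if (ungetc(c1, fp) == c1)          /* both characters fit */
        return 0;
    /* Second push failed: re-read c2 so nothing is pending and the
       position is back where it started, then seek back two bytes. */
    if (getc(fp) != c2)
        return -1;
    return fseek(fp, -2L, SEEK_CUR);
}
```

Re-reading the pushed-back character before seeking keeps the arithmetic simple: the file position is then exactly what it was on entry, so backing up two bytes lands on `c1`.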

There are some posts here suggesting that it makes sense to support 2 chars for the sake of scanf.

I don't think this is right: `scanf` only needs one, and this is indeed the reason for the limit. The original implementation (back in the mid-1970s) supported 100 characters of pushback, and the manual had a note saying that in the future only 1 might be supported, since that's all that `scanf` needs. See page 3 of the original manual (maybe not the very first version, but pretty old).

To see more vividly that `scanf` needs only one character, consider this sketch of its `%u` conversion:

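```c
#include <ctype.h>
#include <stdio.h>
unsigned read_unsigned(FILE *fp)
{
    int c;
    while (isspace(c = getc(fp))) {} // skip white space
    unsigned num = 0;
    while (isdigit(c)) {
        num = num * 10 + (c - '0');
        c = getc(fp);
    }
    ungetc(c, fp); // the one character of pushback needed
    return num;
}
```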

Only a single call to `ungetc()` is needed here. There is no reason for `scanf` to have a pushback character all to itself: it can share that character with the user.

Mark VY
  • The standard explicitly documents that `scanf()` only requires one character of pushback. See C11 [§7.21.6.2 The `fscanf()` function — footnote 285](http://port70.net/~nsz/c/c11/n1570.html#note285): _`fscanf` pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to `strtod`, `strtol`, etc., are unacceptable to `fscanf`._ – Jonathan Leffler Jan 06 '23 at 21:02

Implementations that support 2 characters of pushback probably do so in order that `scanf` can use `ungetc` for its pushback rather than requiring a second, nearly identical mechanism. What this means for you as the application programmer is that even if calling `ungetc` twice seems to work, it might not be reliable in all situations: for example, if the last operation on the stream was `fscanf` and it had to use pushback, you can probably only `ungetc` one character.
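A sketch of that situation: read a number with `fscanf`, which leaves the terminating non-digit unread, then push one more character on top. Going by the standard's wording (one character of pushback is guaranteed after an intervening read), that single `ungetc` should succeed; whether a second one would is platform-dependent. `scan_then_unget` is a hypothetical name.

```c
#include <stdio.h>

/* Read an unsigned number with fscanf, then push `extra` back on top of
   whatever fscanf left unread. Stores the next two characters the caller
   sees in *next1 and *next2. Returns 0 on success, -1 on failure. */
int scan_then_unget(FILE *fp, int extra, unsigned *num, int *next1, int *next2)
{
    if (fscanf(fp, "%u", num) != 1)
        return -1;
    if (ungetc(extra, fp) != extra)   /* one push is expected to succeed */
        return -1;
    *next1 = getc(fp);                /* the character we pushed */
    *next2 = getc(fp);                /* the delimiter fscanf left unread */
    return 0;
}
```

With input `42x` and `extra` set to `'#'`, the caller reads 42, then `'#'`, then `'x'`.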

In any case, it's nonportable to rely on having more than one character of `ungetc` pushback, so I would strongly advise against writing code that needs it.
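If a parser genuinely needs more than one character of lookahead, one portable option is to keep the extra pushback in the application rather than in stdio. A minimal sketch, assuming a fixed depth of 8; the `Reader` type and its functions are hypothetical, not a standard API:

```c
#include <stdio.h>

#define LOOKAHEAD_MAX 8   /* assumed depth; choose what the parser needs */

/* Hypothetical wrapper giving the application its own multi-character
   pushback, so it never relies on more than stdio's guaranteed one. */
typedef struct {
    FILE *fp;
    int buf[LOOKAHEAD_MAX];
    int count;            /* number of characters currently pushed back */
} Reader;

int reader_get(Reader *r)
{
    if (r->count > 0)
        return r->buf[--r->count];   /* most recently pushed comes first */
    return getc(r->fp);
}

int reader_unget(Reader *r, int c)
{
    if (c == EOF || r->count == LOOKAHEAD_MAX)
        return EOF;                  /* same failure signal as ungetc */
    r->buf[r->count++] = c;
    return c;
}
```

Because the buffer lives outside the `FILE`, `ftell` and `fseek` know nothing about it; a caller that repositions the stream should also reset `count` to 0.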

R.. GitHub STOP HELPING ICE
  • Note that `scanf()` only requires one character of pushback. See C11 [§7.21.6.2 The `fscanf()` function — footnote 285](http://port70.net/~nsz/c/c11/n1570.html#note285): _`fscanf` pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to `strtod`, `strtol`, etc., are unacceptable to `fscanf`._ – Jonathan Leffler Jan 06 '23 at 20:57
  • @JonathanLeffler: Your comment is correct but I'm unclear how it's supposed to relate to my answer. Yes `scanf` can only push back one character but after that you're still allowed to call `ungetc`, producing two characters of pushback total. – R.. GitHub STOP HELPING ICE Jan 07 '23 at 07:00
  • I found this note in the [C99 Rationale](https://open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf) section 7.19.6.2 "The `fscanf` function" (p154): _An implementation must not use the `ungetc` function to perform the necessary one-character pushback. In particular, since the unmatched text is left “unread,” the file position indicator as reported by the `ftell` function must be the position of the character remaining to be read._ […continued…] – Jonathan Leffler Jan 20 '23 at 17:46
  • […continuation…] _Furthermore, if the unread characters were themselves pushed back via `ungetc`, the pushback in `fscanf` could not affect the pushback stack in `ungetc`. A `scanf` call that matches N characters from a stream must leave the stream in the same state as if N consecutive `getc` calls had been made._ – Jonathan Leffler Jan 20 '23 at 17:47
  • @JonathanLeffler: I still don't understand how you think this is related to my answer. – R.. GitHub STOP HELPING ICE Jan 24 '23 at 09:00