20

The gets function was first deprecated in C99 and finally removed in C11. Yet there is no direct replacement for it in the C library.

fgets() is not a drop-in replacement because it does not strip the final '\n', which may be absent at the end of file. Many programmers get it wrong too.

There is a one-liner to remove the linefeed: buf[strcspn(buf, "\n")] = '\0';, but it is non-trivial and often calls for an explanation. It may be inefficient as well.
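
As an illustration of why the one-liner needs explaining, here is a minimal sketch that wraps it in a helper. The name `strip_newline` is hypothetical, not a library function. `strcspn` returns the index of the first `'\n'`, or the length of the string when there is none, so the assignment either overwrites the newline or rewrites the existing terminator.

```c
#include <string.h>

/* Hypothetical helper: cut s at its first '\n', if there is one.
 * Returns s so the call can be used inside an expression. */
char *strip_newline(char *s) {
    /* strcspn gives the index of the first '\n', or strlen(s) if none,
     * so the store below is always inside the string. */
    s[strcspn(s, "\n")] = '\0';
    return s;
}
```

The cost is one extra scan of the line after `fgets` has already copied it, which is the inefficiency mentioned above.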

This is counter-productive. Many beginners still use gets() because their teachers are lame or their tutorials obsolete.

Microsoft proposed gets_s() and many related functions, but gets_s() does not silently truncate overlong lines: an overlong line is a constraint violation, and the behavior in that case is not exactly simple.

Both BSD and the GNU libc have getline, standardized in POSIX, that allocates or reallocates a buffer via realloc...
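
For comparison, a minimal sketch of typical `getline` use, assuming a POSIX.1-2008 system. The helper name `longest_line` is hypothetical and only serves to give the loop something to compute. Note how much a beginner has to handle: a pointer to a pointer, a separate capacity variable, a signed return type, and a final `free`.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: length of the longest line in fp,
 * not counting the newline. */
size_t longest_line(FILE *fp) {
    char *line = NULL;   /* getline allocates when this is NULL */
    size_t cap = 0;      /* current capacity of line, managed by getline */
    size_t longest = 0;
    ssize_t len;

    /* getline returns the number of characters read, or -1 on
     * end of file or error; the newline is kept if it was read. */
    while ((len = getline(&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);          /* the buffer belongs to the caller */
    return longest;
}
```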

What is the best way to teach beginners about this mess?

chqrlie
  • 1
    POSIX `getline` is even more non-trivial than `fgets` . `scanf` with `%[` is another option, although it has its own pitfalls. `fgets` doesn't seem too bad to me, it has the advantage of being able to tell whether or not the line exceeded the buffer. – M.M Dec 01 '15 at 22:13
  • Use `fgets` and `fputs`, forbid `gets` and (for symmetry) `puts`, and just deal with `'\n'` and allocations? –  Dec 01 '15 at 22:14
  • `puts` is perfectly fine – M.M Dec 01 '15 at 22:14
  • 1
    @M.M Use `puts` and people will ask "but what's so bad about `gets` then?" Keep in mind that this question is *didactic* in nature, not *technical*. –  Dec 01 '15 at 22:15
  • 2
    That'd be the perfect opportunity to explain what is actually so bad about `gets` . Beginner C education has to include discussion of buffer overflows and the importance of not doing it. – M.M Dec 01 '15 at 22:16
  • 1
    @M.M: `scanf` is definitely not a decent replacement for `gets()`. The size limitation argument is off by one and must be specified explicitly in the format string, inelegance to its max! `scanf_s` is slightly better but unsupported in BSD and Linux, like `gets_s`... – chqrlie Dec 01 '15 at 22:20
  • The only way this isn't "primarily opinion-based" is if someone from the committee happens by and can quote part of a discussion from the minutes of a meeting or a thread. – Adrian McCarthy Dec 01 '15 at 22:53
  • 2
    OK, I believe committee members sometimes do read stackoverflow, and if they don't, evidence may be available as to why they did not provide a direct replacement or a recommended alternative. I'm also asking how to best teach beginners about this. – chqrlie Dec 01 '15 at 22:57
  • The best replacement for gets() is: `assert(1==0);` next question, please! – wildplasser Dec 01 '15 at 23:08
  • I saw `strtok(buf, "\n")` to remove the newline from `fgets` – bolov Dec 01 '15 at 23:09
  • @bolov: not a very good idea since `strtok()` is not reentrant. using `strcspn()` is a better solution. – chqrlie Dec 01 '15 at 23:15

3 Answers

6

The nature of the question is such that there is going to be speculation and opinion. But we can find some information in the C99 rationale and the C11 standard.

The C99 rationale, written when gets() was deprecated, gives the following reason for deprecating it:

Because gets does not check for buffer overrun, it is generally unsafe to use when its input is not under the programmer’s control. This has caused some to question whether it should appear in the Standard at all. The Committee decided that gets was useful and convenient in those special circumstances when the programmer does have adequate control over the input, and as longstanding existing practice, it needed a standard specification. In general, however, the preferred function is fgets (see §7.19.7.2).

I don't think gets_s() can be considered an alternative either, because gets_s() is an optional interface. C11 actually recommends fgets() over gets_s():

§K.3.5.4.1, C11 draft

The fgets function allows properly-written programs to safely process input lines too long to store in the result array. In general this requires that callers of fgets pay attention to the presence or absence of a new-line character in the result array. Consider using fgets (along with any needed processing based on new-line characters) instead of gets_s.

So that leaves us with fgets() as the only real replacement for gets() in ISO C. fgets() is equivalent to gets() except that it takes a buffer size and a stream, and it stores the newline if there is buffer space for it. So is it worth introducing a new interface that offers only a minor improvement over a longstanding and widely used one (fgets())? IMO, no.

Besides, a lot of real-world applications are not restricted to ISO C alone. So there is an opportunity to use extensions such as POSIX getline() as a replacement.

If it becomes necessary to write a solution within ISO C, then it's quite easy to write a wrapper around fgets() anyway, such as my_fgets(), that would remove the newline, if present.

Of course, teaching fgets() to newcomers involves explaining the potential newline issue. But IMO, it's not that hard to understand, and someone intending to learn C should be able to grasp it quickly. It (finding the last character and replacing it if it is character "X") could even be considered a good exercise for a beginner.
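
A minimal sketch of what such a wrapper might look like. Only the name my_fgets() comes from the text above; the signature and the decision to silently discard the rest of an overlong line are assumptions of this sketch, not something the standard or this answer prescribes.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of a my_fgets() wrapper: reads one line from fp into buf
 * (capacity size, which should be at least 2) and removes the newline.
 * Assumption of this sketch: characters that do not fit are discarded.
 * Returns buf, or NULL on end of file or read error. */
char *my_fgets(char *buf, int size, FILE *fp) {
    if (fgets(buf, size, fp) == NULL)
        return NULL;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';   /* complete line: drop the newline */
    } else {
        /* No newline stored: the line was too long or the input ended.
         * Skip up to and including the next newline. */
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            continue;
    }
    return buf;
}
```

A caller that needs to know about truncation would have to report it separately, for example through an extra output parameter.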

So in light of the above stated reasons, I would say there's no overwhelming necessity for a new function in ISO C as a true replacement for gets().

P.P
4

This question largely calls for speculation, short of a citation from committee minutes or something similar. As a general principle, though, the committee (WG14) avoids inventing new interfaces. It prefers to document existing practice and make it rigorous (things like snprintf, long long, the inttypes.h types, etc.), and it sometimes adopts from other standards or interface definitions outside of C (e.g. complex math from IEEE floating point, the atomic model from C++, etc.). gets has no such replacement to adopt, probably because fgets is generally considered superior (it's non-lossy when the file ends without a newline). If you really want a direct replacement, something like this works:

char buf[100];
scanf("%99[^\n]%*1[\n]", buf);

Of course it's klunky to use, especially when the buffer size is variable.
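
The single-call form stops at an empty line, because `%[` is a matching failure when it matches no characters. A possible two-call variant is sketched below, under some assumptions: the helper name `scan_line` is hypothetical, it uses `fscanf` on a `FILE *` instead of `scanf` so it can be exercised on any stream, and the buffer is cleared up front because a matching failure stores nothing.

```c
#include <stdio.h>

/* Hypothetical helper: read one line of at most 99 characters from fp.
 * Returns 1 if a line (possibly empty) was read, 0 at end of file. */
int scan_line(FILE *fp, char buf[100]) {
    buf[0] = '\0';  /* %[ leaves buf unwritten on a matching failure */

    int rc = fscanf(fp, "%99[^\n]", buf);
    if (rc == EOF)
        return 0;   /* no input at all: end of file or read error */

    /* Consume the newline if it is the next character. On an overlong
     * line this matches nothing and the rest stays in the stream. */
    fscanf(fp, "%*1[\n]");
    return 1;
}
```

An overlong line is therefore returned in pieces over several calls, which is one more thing the caller has to be aware of.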

R.. GitHub STOP HELPING ICE
  • 6
    *the committee (WG14) generally avoids inventing new interfaces* are you kidding? They invented a slew of multicharacter interfaces of dubious value. – chqrlie Dec 01 '15 at 22:21
  • @chqrlie: There are exceptions, but they're the exception rather than the norm. I'm not entirely clear on what particular example you're talking about. Maybe `uchar.h`? – R.. GitHub STOP HELPING ICE Dec 01 '15 at 22:22
  • 3
    This code fails for an empty line (i.e. bare `'\n'` on stream): `%[` is a *matching failure* if no characters are matched, so it will not proceed to the next specifier. – M.M Dec 01 '15 at 22:24
  • Indeed, it does seem to fail on a blank line. Any ideas for fixing it? Surely you can just catch the matching failure then `getchar` but it's no longer a single call. Vs `fgets` it avoids a second `O(n)` scan of the string, though. – R.. GitHub STOP HELPING ICE Dec 01 '15 at 22:25
  • @M.M: good catch! as suspected, `scanf` is not a decent replacement. – chqrlie Dec 01 '15 at 22:25
  • Yeah, `scanf("%99[^\n]", buf); scanf("%*1[\n]");` – M.M Dec 01 '15 at 22:26
  • 2
    @M.M: In that case you also need a separate operation to null-terminate the (unwritten) `buf`. – R.. GitHub STOP HELPING ICE Dec 01 '15 at 22:27
  • 1
    @R..: `uchar.h`, but also `wchar.h`, `wctype.h`... allowing `wchar_t` to be 16 bit, leading to unsurmountable shortcomings for real internationalization work, and then reversing this with `char32`, but keeping `char16` too... What a mess! – chqrlie Dec 01 '15 at 22:27
  • @M.M: how do you test for end of file then, `feof(stdin)`? (just teasing) – chqrlie Dec 01 '15 at 22:29
  • 2
    @chqrlie Worth pointing out that `wchar_t` can be 16 bit because Unicode was supposed to be 16 bit. – user253751 Dec 01 '15 at 23:07
  • @immibis: Unicode 1.0 was indeed published in 1991 and it soon became obvious that 16 bits would not suffice. UTF-8 was devised in 1993, but it took another 3 years for Unicode 2.0 to go beyond 16 bits, with the infamous surrogate scheme and the silly 0x10FFFF upper limit, precluding a nice round 20 bit range. We cannot change the past, but these choices still plague the C language and many others 20 years down the road. – chqrlie Dec 01 '15 at 23:35
  • 1
    @chqrlie: `wchar_t` is only permitted to be 16-bit if the implementation only supports UCS-2 or smaller. You cannot support all of unicode with 16-bit `wchar_t`. – R.. GitHub STOP HELPING ICE Dec 01 '15 at 23:44
  • @R..: Alas, platforms such as Microsoft Windows define `wchar_t` as 16 bits and use surrogate pairs for the non BMP code-points. – chqrlie Dec 01 '15 at 23:46
  • 2
    @chqrlie: That doesn't actually work. `mbrtowc` can only produce one `wchar_t`, and `wcrtomb` can only process one. The `char16_t` functions lift this API limitation so UTF-16 can actually be supported for them, but Windows is just buggy and non-BMP codepoints simply do not work with the C mb/wc API. (Of course they expect you to ignore the standard API and use WinAPI functions instead...) – R.. GitHub STOP HELPING ICE Dec 01 '15 at 23:56
  • @R..: that's the sad truth. The whole multibyte and wide character support specified in the standard seems too complex to use correctly for most programmers. Even the basic `<ctype.h>` functions such as `isalpha()` are misused most of the time. Conversely, proper support of locale specific oddities usually requires special casing anyway. – chqrlie Dec 02 '15 at 01:15
2

IMO, any replacement would need to pass the size as well as the char * destination, necessitating code changes that depend significantly on the case at hand. A one-size-fits-all solution was not deemed possible, as the size is often lost or not passed along by the time the code gets to gets(). Given that we had a 12-year warning (C99 to C11), I suspect the committee felt the problem would be gone by 2011.

Ha!

The Standard C committee should have made a replacement that also passed in the size of the destination, like the following (the name would likely cause a collision with existing code):

char *gets_replacement(char *s, size_t size);

I attempted an fgets()-based replacement that takes advantage of VLAs (optional in C11):

#include <stdio.h>
#include <string.h>

char *my_gets(char *dest, size_t size) {
  // +2: one for the '\n' and one to detect overrun
  char buf[size + 2];

  if (fgets(buf, sizeof buf, stdin) == NULL) {
    // improve error handling - see below comment
    if (size > 0) {
      *dest = '\0';  // clear the caller's buffer, not the local one
    }
    return NULL;
  }
  size_t len = strlen(buf);
  if (len > 0 && buf[len - 1] == '\n') {
    buf[--len] = '\0';
  }

  // If input would have overrun the original gets()
  if (len >= size) {
    // or call error handler
    if (size > 0) {
      *dest = '\0';
    }
    return NULL;
  }
  return memcpy(dest, buf, len + 1);
}
chux - Reinstate Monica
  • 2
    *Given that we had a 12-year warning (C99 to C11), I suspect the committee felt the problem would be gone by 2011.* From our experience here on stackoverflow, the problem is very sticky. Beginners still use `gets`. – chqrlie Dec 01 '15 at 23:06
  • 1
    Your proposed implementation has the same shortcoming as `fgets`: buf is indeterminate upon read error or overlong input. I think it would be better to set `*buf = '\0';` in this case, provided `size > 0`. `fgets` leaves the buffer untouched upon EOF, but this specified behavior is more error prone than useful. – chqrlie Dec 01 '15 at 23:09
  • @chqrlie Agree. The "Ha!" was to indicate my disagreement: such a line of reasoning, though hopeful in 1999, did not play out by 2011. – chux - Reinstate Monica Dec 01 '15 at 23:09
  • @chqrlie Agree to your idea. A pedantic solution would dig deeper into `ferror()` and `feof()` for when `fgets()` returns `NULL`. With `fgets()`, the buffer is left alone on `feof()`, yet indeterminate on `ferror()`. I suspect `gets()` worked the same. So `*buf = '\0'` may make sense on `ferror()` only. – chux - Reinstate Monica Dec 01 '15 at 23:16
  • @chqrlie In the 2nd case I was using the invisible swap(lines-of-code) routine. Fixed now. – chux - Reinstate Monica Dec 01 '15 at 23:18