
Let's get this out of the way first: I am fully aware of why the gets function is deprecated and how it can be used maliciously in uncontrolled environments. I am positive the function can be used safely in my setup.

"Alternative" functions won't do for my use case. The software I am writing is very performance-sensitive, so using fgets and then running one full pass over the string (either explicitly or through a helper function like strlen) to get rid of the newline at the end simply won't do. Writing my own gets function using getchar or something similar will also probably be much less efficient than a native implementation (not to mention very error-prone).

The Visual Studio compiler has a gets_s function which fits my needs; however, this function is nowhere to be found in gcc. Is there any compiler option/flag to get this function back, and/or some alternative function that gcc provides?

Thank you in advance.

Nu Nunu
  • I'm not sure why you would need a loop to remove a trailing newline. – mkrieger1 May 23 '21 at 21:27
  • You need a loop to determine the string's length, since `fgets` doesn't give this information to you (using `strlen` is still using a loop underneath). – Nu Nunu May 23 '21 at 21:29
  • Oh okay, thought it would return the number of characters read. – mkrieger1 May 23 '21 at 21:30
    It would probably be helpful if you showed some context of what you are trying to achieve. Possibly there is a way of doing this properly and efficiently without using `gets`. – mkrieger1 May 23 '21 at 21:32
  • @Fredrik you can test it out yourself. Make a file with about 100 million line-separated words. Then make a program that reads them using `gets`, and another one using `fgets` and removing the newline. On my machine (with a Ryzen 9 processor, compiled with VS and using its implementation of `gets_s`), the former runs in 1.02 secs and the latter in 1.27 secs (averaged over 100 runs). That's about a 25% hit to performance on the input, not at all negligible. – Nu Nunu May 23 '21 at 22:01
  • `-std=c89` maybe? – Marc Glisse May 23 '21 at 22:06
  • This depends on the library and linker as well as the compiler, so you should say more about the system you are using. On Linux with glibc and GNU ld, you can just insert your own declaration `char *gets(char *);` and compile as normal; you will get a linker warning but it's not fatal. – Nate Eldredge May 23 '21 at 22:07
    Most of the overhead of a `getchar()` loop is likely to be the locking and unlocking of the internal mutex on each call. If you aren't using threads, or do your own locking, you can consider `getchar_unlocked()` which some libraries provide; it is typically a macro or inline function that reads directly out of the FILE buffer, and can be unrolled / inlined / etc. – Nate Eldredge May 23 '21 at 22:21
    But a better direction to look for optimizations is to eliminate the many times that stdio functions copy data: once from the kernel into user space, again from the stdio buffer into your buffer. You can eliminate one of those by using `read()` directly and scanning for newlines on your own, and eliminate the other by switching to `mmap` or the like. – Nate Eldredge May 23 '21 at 22:21
    If you really want fast input, use `mmap`. See my answer: https://stackoverflow.com/questions/33616284/read-line-by-line-in-the-most-efficient-way-platform-specific/33620968#33620968 It has benchmarks. And, be sure to see the pastebin link in the comments – Craig Estey May 23 '21 at 22:23
  • You still haven't given us any kind of clue about the time-critical context your program is supposed to run in, but using a long-deprecated function is rarely the best solution. If neither `fgets()` nor rolling a custom portable version cuts it for you, just use the low-level API of whichever platform you are targeting (if it's more than one, then roll different versions for each one of them). – Harry K. May 23 '21 at 23:02
  • Shot in the dark, but: sometimes it makes a huge difference whether the locale is set to one that does or doesn't have to do extra UTF-8 validity checking. – Steve Summit Jul 08 '21 at 16:07
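The read()-based approach suggested in the comments above (skip stdio, call `read()` yourself and scan for newlines) might look roughly like the sketch below. It is a minimal illustration, not a drop-in `gets` replacement: the helper name `count_lines_fd` and the 64 KiB chunk size are assumptions, it only counts lines, and a real parser would also have to stitch together lines that straddle two chunks.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the lines readable from file descriptor fd,
   reading with read() and scanning each chunk in place with memchr().
   A final line without a trailing '\n' still counts.
   Returns -1 on a read error. */
long count_lines_fd(int fd) {
    static char buf[65536];   /* assumed chunk size */
    long lines = 0;
    int pending = 0;          /* 1 if the last byte seen was not '\n' */
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;        /* read error (EINTR is not retried in this sketch) */
        if (n == 0)
            break;            /* end of file */
        const char *p = buf, *end = buf + n;
        const char *q;
        while ((q = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            /* a line ends at q; lines split across two chunks would need
               to be reassembled by a real parser */
            lines++;
            p = q + 1;
        }
        pending = (p != end);
    }
    return lines + pending;
}
```

Whether this beats `fgets` depends on the platform; it removes the copy from the stdio buffer into the caller's buffer, which is the overhead the comment points at.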

3 Answers


Implementing your own safe gets() function using getchar_unlocked() is easy and reasonably efficient.

If your application is so performance-sensitive that you think fgets() and the scan to remove the newline are going to be the bottleneck, you should probably not use the stream functions at all and use lower-level read() system calls or memory-mapped files instead.
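For the memory-mapped option, a minimal POSIX sketch could map the whole file read-only and walk it with memchr(), so lines are examined in place without any copying. The helper name `count_lines_mmap` is hypothetical, the sketch assumes a regular file (mmap() does not work on pipes or terminals, so it cannot replace reading from an arbitrary stdin), and it only counts lines where a real program would process them.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count the lines of the regular file at path by
   mapping it read-only and scanning the mapping in place with memchr(),
   so no bytes are copied into a user buffer.
   Returns -1 if the file cannot be opened, stat'ed or mapped. */
long count_lines_mmap(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }
    size_t size = (size_t)st.st_size;
    char *data = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long lines = 0;
    const char *p = data, *end = data + size, *q;
    while ((q = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        /* [p, q) is one complete line, still inside the mapping */
        lines++;
        p = q + 1;
    }
    if (p != end)
        lines++;                        /* last line had no trailing '\n' */

    munmap(data, size);
    return lines;
}
```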

In any case, you should carefully benchmark your application and use profiling tools to determine where the time is spent.

Here is a simple implementation that returns the line length but truncates the line to whatever fits in the destination array buf of length n and returns EOF at end of file:

#include <stdio.h>   /* EOF; getchar_unlocked() is POSIX and also declared here */

int my_gets(char *buf, size_t n) {
    int c;
    size_t i = 0;
    while ((c = getchar_unlocked()) != EOF && c != '\n') {
        if (i < n) {
            buf[i] = c;
        }
        i++;
    }
    if (i < n) {
        buf[i] = '\0';
    } else
    if (n > 0) {
        buf[n - 1] = '\0';
    }
    if (c == EOF && i == 0) {
        return EOF;
    } else {
        return (int)i;
    }
}

If your goal is to parse a log file line by line and you use only this function to read from stdin, you can implement a custom buffering scheme with read or fread in a custom version of gets(). This would be portable and fast, but neither thread-safe nor elegant.

Here is an example that is 20% faster than fgets() on my system:

#include <stdio.h>    /* fread, stdin, EOF */
#include <string.h>   /* memchr, memcpy */

/* read a line from stdin
   strip extra characters and the newline
   return the number of characters before the newline, possibly >= n
   return EOF at end of file
 */
static char gets_buffer[65536];
static size_t gets_pos, gets_end;

int my_fast_gets(char *buf, size_t n) {
    size_t pos = 0;
    for (;;) {
        char *p = gets_buffer + gets_pos;
        size_t len = gets_end - gets_pos;
        char *q = memchr(p, '\n', len);
        if (q != NULL) {
            len = q - p;
        }
        if (pos + len < n) {
            memcpy(buf + pos, p, len);
            buf[pos + len] = '\0';
        } else
        if (pos < n) {
            memcpy(buf + pos, p, n - pos - 1);
            buf[n - 1] = '\0';
        }
        pos += len;
        gets_pos += len;
        if (q != NULL) {
            gets_pos += 1;
            return (int)pos;
        }
        gets_pos = 0;
        gets_end = fread(gets_buffer, 1, sizeof gets_buffer, stdin);
        if (gets_end == 0) {
            return pos == 0 ? EOF : (int)pos;
        }
    }
}
chqrlie
    It's very difficult to reach the speed of native functions with custom-written ones. Try this: make a file with 100 million line-separated strings. Then run a program that reads these strings with `fgets` and removes the terminating newline, and the same program reading them with `my_gets`. On my configuration, the former finishes in 1.62 seconds, while the latter finishes in 7.37 seconds (averaged over 10 runs), roughly 5 times slower than the `fgets` version (which, as previously mentioned in a different reply, is already about 25% slower than `gets`). Not exactly "reasonably efficient". – Nu Nunu May 23 '21 at 22:24
  • @NuNunu: can you document which compiler / libc / OS ? stream functions used to be much faster without multi-thread support. Is your application multi-threaded? – chqrlie May 23 '21 at 22:25
  • @NuNunu: `gets()` and `fgets()` can scan the stream buffer for the newline, that's how they get better performance. Depending on your target system, you could implement your custom function the same way. – chqrlie May 23 '21 at 22:32
  • @NuNunu: Also can you do without removing the trailing newline? Can you post the relevant code? – chqrlie May 23 '21 at 22:33
    @NuNunu: Did you enable optimizations? For me, using `gcc -O3`, I found `my_gets` is only about 10-20% slower than `gets`. – Nate Eldredge May 23 '21 at 22:47
  • @NuNunu: on my Mac, `gets()` is actually 25% slower than `fgets()` and the overhead of stripping the newline is negligible. `my_gets()` as implemented above is half as fast. Of course these timings depend on average line length. – chqrlie May 23 '21 at 23:01
  • @NuNunu: I posted a faster alternative you can use if all lines are read via this function. – chqrlie May 23 '21 at 23:27

I suppose you are running on Windows, so it's possible that none of this information is relevant. The tests below were done on a Linux Ubuntu laptop, not particularly leading edge. But, for what it's worth:

  1. If gets is in your standard library (it's in my standard library, fwiw), then you only need to declare it to use it. It doesn't matter that it has been removed from your standard library headers:

    char* gets(char* buf);
    

    You could declare gets_s yourself, too, if that's the one you want to use.

    This is entirely legal according to the C standard: "Provided that a library function can be declared without reference to any type defined in a header, it is also permissible to declare the function and use it without including its associated header." (§7.1.4 ¶2)

    See @dbush's answer for the linker option to avoid the deprecation message.

  2. That's not actually what I would do. I'd use the Posix-standard getline function. On a standard Linux install, you need a feature-test macro to see the declaration: #define _POSIX_C_SOURCE 200809L or #define _XOPEN_SOURCE 700. (You can use larger numbers if you want to.) getline avoids many of the issues with fgets (and, of course, gets) because it returns the length of the line read rather than copying its buffer argument to the return value. If your input handler needs (or can use) this information, it might save a few cycles to have it available. It certainly can be used to check for and remove the newline.

    On my little laptop, using a file of 100,000,000 words as you suggest, I got the following timings for the three alternatives I tested:

    gets (dangerous)     fgets (+strlen)      getline
    ----------------     ---------------      -------
         1.9958               2.3585           2.0350
    

    So it does show some overhead with respect to gets, but it's not (IMHO) very significant, and it's quite possible that the fact that you don't need strlen in your handler will recoup the small additional overhead.
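As a quick check on "not very significant", the timings in the table above can be turned into relative overheads against gets. This is plain arithmetic on the reported numbers; the question's 1.02 s / 1.27 s figures are included for comparison, although they were measured on a different machine with a different compiler and library:

```latex
\begin{aligned}
\text{fgets} + \text{strlen}: &\quad \frac{2.3585 - 1.9958}{1.9958} = \frac{0.3627}{1.9958} \approx 0.182 \quad (\approx 18\%) \\
\text{getline}: &\quad \frac{2.0350 - 1.9958}{1.9958} = \frac{0.0392}{1.9958} \approx 0.020 \quad (\approx 2\%) \\
\text{question (gets\_s vs. fgets)}: &\quad \frac{1.27 - 1.02}{1.02} = \frac{0.25}{1.02} \approx 0.245 \quad (\approx 25\%)
\end{aligned}
```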

Here are the loops I tested. (I call an external function called handle to avoid optimisation issues; in my test, all handle does is increment a global line count.)

gets (dangerous)

  char buf[80]; // Disaster in waiting
  for (;;) {
    char* in = gets(buf);
    if (in == NULL) break;
    handle(in);
  }

fgets (+strlen)

  char buf[80]; // Safe. But long lines are not handled correctly.
  for (;;) {
    char* in = fgets(buf, sizeof buf, stdin);
    if (in == NULL) break;
    size_t inlen = strlen(in);
    if (inlen && in[inlen - 1] == '\n')
      in[inlen - 1] = 0;
    handle(in);
  }

getline

  size_t buflen = 80;        // If I guessed wrong, the only cost is an extra malloc.
  char* in = malloc(buflen); // TODO: check for NULL return.
  for (;;) {
    ssize_t inlen = getline(&in, &buflen, stdin);
    if (inlen == -1) break;
    if (inlen && in[inlen - 1] == '\n')
      in[inlen - 1] = 0;
    handle(in);
  }
  free(in);
rici
  • Try `char *in=NULL; buflen=0;` before passing them to `getline()` and see if it improves its performance. However, I'm kinda stunned that in a loop for 100.000.000 words it performed better than `fgets()` (did you `free(in)` inside each iteration when reading all those words, as you should?) – Harry K. May 24 '21 at 00:35
    @HarryK. "As you should" according to whom? (No, I didn't. I use the same buffer every time, unless a long line is encountered in which case getline will realloc for me. This is an expected use case.) – rici May 24 '21 at 00:48
  • You are right @rici, my bad! Just checked the `getline()` docs, I was remembering wrong! So yeah, no `free()` needed in each iteration, just once after the loop. My apologies. Btw, still stunned it outperforms `fgets()`! Good to know! – Harry K. May 24 '21 at 00:52
  • @HarryK. It doesn't really outperform fgets. But it avoids the need to call strlen, which, as OP notes is needed to remove the newline terminator. And it also doesn't require you to make some arbitrary guess about how long a line can be. You can still guess, as I did here. If you guess right, it's as fast as gets. If you guess wrong, it degrades gracefully. All in all it's an example of a good API, unlike most stdio interfaces, and an illustration of why good library interfaces take several attempts to get right. So it's worth learning from (as well as using, of course). – rici May 24 '21 at 01:01
  • Yes, I used to use it too, but it's been a while. Btw I just had a glance at the source-code from the 1st google result (https://dev.w3.org/libwww/Library/src/vms/getline.c). No idea if it's the right one or the most up to date but it seems like starting with a static buffer of 256 bytes. **EDIT**: Nah, doesn't seem right.... it's probably some ancient variant. – Harry K. May 24 '21 at 01:09
  • `getline` isn't a drop in replacement for `fgets` because it does `realloc` on the buffer. I have a lot of `fgets` in my [personal] code, followed by one of `strlen/strchr/strcspn` for newline strip. But, most of my buffers are global/stack based. Is my usage idiomatic/common enough to merit an (e.g.) `ssize_t fgetsn(char *buf,size_t buflen,FILE *fi)` being added to POSIX/libc [so that newline is stripped in the manner of your `getline` usage]? – Craig Estey May 24 '21 at 01:11
  • @HarryK.: current: https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iogetline.c;h=debc8ef78b7bea7e3b891d848a2a009d97972956;hb=HEAD starts with the buffer you give it – rici May 24 '21 at 01:18
  • @craig: what do you do when the line is too long? In my example code, I just ignore the problem, which will lead to mystery bugs. What I like about getline is that it doesn't force me to figure out what to do in that case. I'm happy to accept the cost of a few mallocs (and there is no need to call malloc yourself, as harryk points out above). I don't know how common newline stripping is. It's certainly not common in my code. Frankly, I'd be happy if get{line,delim} were just added to stdio.h. The rest is easy. – rici May 24 '21 at 01:26
  • @rici, Thanks! I bookmarked it for later reference (4.30am here lol). The reason I suggested trying it with in=NULL; buflen=0; is because by doing so it allocates the buffer by itself (and probably makes a better guess?). That much I was remembering (lol) and the docs seem to confirm that. But I don't have a clear head to go over the code right now. – Harry K. May 24 '21 at 01:27
  • @rici: kudos to you for such a simple and efficient solution! You can further improve with this simplification: `if (inlen <= 0) break; if (in[inlen - 1] == '\n') in[inlen - 1] = 0;` Indeed `getline()` should not return `0` even if `*in == '\0'` because it reads at least 1 byte or returns `-1` at end of file or on error. – chqrlie May 24 '21 at 06:23
  • @chqrlie: I guess that's so. But are you 100% sure that `getline` doesn't return a negative number if the length of the line read exceeds `SSIZE_MAX`? I have this lingering doubt. (Obviously, only an interesting case on systems where `ssize_t` has around 32 bits, my code as written only works with negative `inlen` if it is the same size as a pointer.) – rici May 29 '21 at 23:04
    @rici: I agree that the API is unfortunate: `getline()` should return a `size_t` with a zero value for end of file, but this API cannot be changed, and it is indeed possible that only the value `-1` has a special meaning and other negative values are to be taken for their unsigned value. There is no need to test for `<= 0`: `getline()` should not return `0` anyway, so `if (inlen == -1) break; if (in[inlen - 1] == '\n') in[inlen - 1] = 0;` should be fine. – chqrlie May 29 '21 at 23:40
  • Alternately, both `-1` and `0` can be tested together with `if ((size_t)inlen + 1 <= 1) break; if (in[inlen - 1] == '\n') in[inlen - 1] = 0;` – chqrlie May 29 '21 at 23:44
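Putting the suggestions from this comment thread together (let getline() allocate by passing a NULL pointer and a zero size, test only for -1, use the returned length to strip the newline without strlen(), and free once after the loop), a minimal sketch might look like this. The helper name `count_lines_getline` is hypothetical, and it only counts lines at the point where a real program would call its handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read every line of fp with getline(), letting
   getline() allocate and grow the buffer (in = NULL, buflen = 0),
   strip the trailing newline using the returned length (no strlen()),
   and return the number of lines read. */
long count_lines_getline(FILE *fp) {
    char *in = NULL;
    size_t buflen = 0;
    long lines = 0;
    ssize_t inlen;
    while ((inlen = getline(&in, &buflen, fp)) != -1) {
        /* getline() reads at least one byte on success, so in[inlen - 1] is valid */
        if (in[inlen - 1] == '\n')
            in[inlen - 1] = '\0';
        /* a real program would hand `in` to its line handler here */
        lines++;
    }
    free(in);   /* a single free() after the loop; getline() reuses the buffer */
    return lines;
}
```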

You can use the -Wno-deprecated-declarations option to suppress warnings for all deprecated functions. If you want to disable the warning only in specific places, you can use pragmas:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
gets(str);
#pragma GCC diagnostic pop

Note that in both cases this only prevents the compiler from complaining. The linker will still give a warning.

dbush
  • Sadly, this does not work. Since the function has been removed from the standard completely, I am getting an error (that the function `gets` is undeclared), not a mere warning. – Nu Nunu May 23 '21 at 21:50
  • @NuNunu Are you able to compile with `-std=c99` or `-std=gnu99`? – dbush May 23 '21 at 22:21