0

I am using the SLM toolkit by CMU-Cambridge for some baseline language modeling on language data, but when I run one of the built executables, my system detects a buffer overflow when it tries to execute one of the commands.

Based on this StackOverflow question I noticed that __gets_chk+0x179 caused the problem, and I've found two occurrences of gets/fgets in the source code (evallm.c, also available in this GitHub project someone made) but I do not know how to fix them in a proper/secure way.

The relevant parts of the error message:

*** buffer overflow detected ***: /home/CMU-Cam_Toolkit_v2/bin/evallm terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__gets_chk+0x179)[0x7f613bc719e9]
Aborted

The broken code

# declaration of input string variable
char input_string[500];

# occurence 1
...
while (fgets (wlist_entry, sizeof (wlist_entry),context_cues_fp)) { ... } 
...

# occurence 2
...
while (!feof(stdin) && !told_to_quit) {
    printf("evallm : ");
    gets(input_string);
....

The error basically occurred when the input_string I gave to the evallm command was too long. Normally it is to be called from the command line and you can interactively pass arguments. However, I piped all arguments together with the command (as seen in the example of the docs) but apparently sometimes my argument names where taking too much bytes. When I changed the array length of input_string from 500 to 2000 the problem was solved (so I guess the error was due to occurence 2). But I really would like to fix it by replacing gets() by getline() since it seems to be the right way to go. Or is replacing it by fgets() also a solution? If so, what parameters should I use?

However, when trying to replace gets(), I always get compiling errors. I'm not a C-programmer (Python, Java) and I'm not familiar with the syntax of getline(), so I'm struggling to find the right parameters.

Community
  • 1
  • 1
mbauwens
  • 13
  • 2
  • 6
  • 1
    You can also use fgets, the only difference for gets is that it has length. – Sean83 Nov 17 '16 at 14:04
  • 2
    Unrelated to your `gets`/`fgets` problem, but you might be interested in reading [Why is “while ( !feof (file) )” always wrong?](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong). – Some programmer dude Nov 17 '16 at 14:06
  • @Someprogrammerdude thanks for the heads up! The code has been around for quite a while (1999) so I can imagine some deprecated or 'olde timey' programming concepts are still present. I'll look into it! – mbauwens Nov 17 '16 at 14:08
  • There is a clear distinction between "old time programming concepts" (you mean "style, btw) and "problematic"/unsafe code. One can write very well safe code in assembly language. – too honest for this site Nov 17 '16 at 14:10
  • And see [ask] and provide a [mcve]. You missed vital information. – too honest for this site Nov 17 '16 at 14:11
  • @Olaf I'm not saying the C-language is bad, but that this code is probably written in a style which has more modern alternatives since it hasn't been updated since 1999. I haven't read the `while ( !feof (file))` article yet so I don't know whether it's an issue of erroneous code or unclear/bad style. – mbauwens Nov 17 '16 at 14:21
  • @Olaf, also; what's the issue with my code example? The fact that it's incomplete and not reproducible? Thanks for the feedback! – mbauwens Nov 17 '16 at 14:22
  • 1
    @mbauwens: Please follow the links I provided. – too honest for this site Nov 17 '16 at 14:36

2 Answers2

2

In your particular case, you know that input_string is an array of 500 bytes. (Of course, you could replace that 500 with e.g. 2048)

I am paranoid, adept of defensive programming, and I would zero that buffer before any input, e.g.

memset(input_string, 0, sizeof(input_string));

So the buffer is cleared, even when fgets has failed. In most cases that is in principle useless. But you have corner cases and the evil is in the details.

So read documentation of fgets(3) and replace the gets call with

fgets(input_string, sizeof(input_string), stdin);

(you actually should handle corner cases, e.g. failure of fgets and input line longer than input_string ....)

Of course, you may want to zero the terminating newline. For that, add

int input_len = strlen(input_string);
if (input_len>0) input_string[input_len-1] = '\0`;

(as commented, you might clear the input_string less often, e.g. at start and on fgets failure)

Notice that getline(3) is POSIX specific and is managing a heap-allocated buffer. Read about C dynamic memory allocation. If you are unfamiliar with C programming, that might be tricky to you. BTW, you could even consider using the Linux specific readline(3)

The main point is your familiarity with C programming.

NB: in C, # does not start a comment, but a preprocessor directive.

Basile Starynkevitch
  • 223,805
  • 18
  • 296
  • 547
  • 2
    The easier way is to just zero out the first entry **iff** `fgets` fails. `memset` for each call can be a significant performance issue. – too honest for this site Nov 17 '16 at 14:13
  • On the other hand, if it is human-typed input, you practically don't care. You would care if stdin was redirected (e.g. in a pipeline) – Basile Starynkevitch Nov 17 '16 at 14:14
  • Alwas be prepared. It does not look like it takes console input only. It would not cause much more effort, just checking the result, which you have to anyway. – too honest for this site Nov 17 '16 at 14:35
  • If code uses `fgets(input_string, sizeof(input_string), stdin); int input_len = strlen(input_string); ... ` instead of `gets(input_string);`, also suggest `char input_string[500];` --> `char input_string[500+1];` to consistently handle valid extreme test cases. – chux - Reinstate Monica Nov 17 '16 at 16:13
0

You replace gets with fgets.

It's almost that simple, the difference (besides the arguments) is that with fgets there might be a newline at the end of the buffer. (Note I say it might be there.)

I recommend this fgets reference.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • 1
    ...and you use the presence / absence of that newline to check if you've gotten the whole line, or ran into the size limit. – DevSolar Nov 17 '16 at 14:06
  • 1
    OP specifically mentioned `getline`, which in my opinion is safer than `fgets`. When recommending `fgets`, this answer implicitly recommends against `getline`. Is this intended? If yes, you must explain why. – anatolyg Nov 17 '16 at 14:09
  • When I looked up the buffer overflow issues with `gets`, the most common answer was to replace it with `getline`. Other than that I have no real preference for the one or the other. Just a safe alternative which keeps the script from aborting. – mbauwens Nov 17 '16 at 14:16
  • 1
    @anatolyg I recommend `fgets` because it's a standard C function. The `getline` function is a POSIX function and is not available anywhere. But otherwise I agree, `getline` might be a better choice if portability is not an issue. – Some programmer dude Nov 17 '16 at 14:18
  • 1
    @anatolyg `getline()` has a feature that at first looks good: auto allocating a buffer to the size of input. Yet with various nefarious users attempting to break code, this trades the buffer overrun problem of `gets()` with allowing the (evil) user to overwhelm memory resources with a _long_ line. Employing a generous sane upper bound on user input length is reasonable. Unfortunately I see no C stand-alone function yet as well handling that. – chux - Reinstate Monica Nov 17 '16 at 17:04