GNU manual

This quote is from the GNU manual

Warning: If the input data has a null character, you can’t tell. So don’t use fgets unless you know the data cannot contain a null. Don’t use it to read files edited by the user because, if the user inserts a null character, you should either handle it properly or print a clear error message. We recommend using getline instead of fgets.

As I usually do, I spent time searching before asking a question, and I did find a similar question on Stack Overflow from five years ago: Why is the fgets function deprecated?

Although GNU recommends getline over fgets, I noticed that getline in stdio.h accepts a line of any size, calling realloc as needed. Suppose I try to set the size to 10 chars:

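#define _POSIX_C_SOURCE 200809L  /* expose getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buffer;
    size_t bufsize = 10;
    ssize_t characters;  /* getline returns ssize_t: -1 on EOF or error */

    buffer = malloc(bufsize);
    if (buffer == NULL)
    {
        perror("Unable to allocate buffer");
        exit(1);
    }

    printf("Type something: ");
    characters = getline(&buffer, &bufsize, stdin);
    if (characters == -1)
    {
        fprintf(stderr, "No input was read.\n");
        free(buffer);
        exit(1);
    }
    printf("%zd characters were read.\n", characters);
    printf("You typed: '%s'\n", buffer);
    free(buffer);  /* getline may have reallocated it; the caller still owns it */
    return 0;
}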

In the code above, type any string longer than 10 chars, and getline will read it and give you the right output.

There is no need to even malloc, as I did in the code above, because getline does it for you. Here I set the buffer size to 0, and getline calls malloc and realloc for me as needed.

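#define _POSIX_C_SOURCE 200809L  /* expose getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *buffer = NULL;  /* must be NULL (or malloc'd) before calling getline */
    size_t bufsize = 0;
    ssize_t characters;   /* getline returns ssize_t: -1 on EOF or error */

    printf("Type something: ");
    characters = getline(&buffer, &bufsize, stdin);
    if (characters == -1)
    {
        fprintf(stderr, "No input was read.\n");
        free(buffer);
        return 1;
    }
    printf("%zd characters were read.\n", characters);
    printf("You typed: '%s'\n", buffer);
    free(buffer);
    return 0;
}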

If you run this code, you can again enter a string of any size, and it works, even though I set the buffer size to 0.

I've been looking at safe coding practices from the CERT guidelines (www.securecoding.cert.org).

I was thinking of switching from fgets to getline, but I cannot figure out how to limit the input in getline. I think a malicious attacker could use a loop to send an unlimited amount of data and use up all the RAM available in the heap.

Is there a way of limiting the input size that getline uses, or does getline have some limit within the function?

Jonathan Leffler
Manny_Mar
  • An attacker can only send an unlimited amount of data by using an unlimited amount of network bandwidth, so it's not much of a threat. Anyway, getline will return failure if it cannot realloc a big enough buffer; you can limit the heap size if you're really worried about it. – rici Jan 21 '19 at 03:49
  • I agree, but they can use up all my available memory in the heap. I don't have that much ram. – Manny_Mar Jan 21 '19 at 03:55
  • There is no limit other than that which is encountered through exhausting resources (memory). It is a problem; there isn't a solution other than 'code it yourself' (that I know of). You can (and should) free the memory. You can learn how much memory is allocated from the `bufsize` variable you pass. Note that even if you read on `/dev/null`, the chances are `getline()` will allocate memory for you if you pass in a zero buffer size. One issue is how any modified variant should report 'reached limit before encountering newline (delimiter for `getdelim()`) or EOF'. Another value like `-2`? – Jonathan Leffler Jan 21 '19 at 06:15
  • @jonathan: [setrlimit](http://man7.org/linux/man-pages/man2/setrlimit.2.html) can be used to limit a process's (virtual) memory usage. I honestly don't think it's a problem in this use case, but it's certainly possible to restrict vmem growth. Getline returns a negative value and sets `errno` to `ENOMEM` if `realloc` fails. – rici Jan 21 '19 at 06:33
  • rici I did not know about `setrlimit`, but i think fgets is good enough, I don't think users will input '\n' into there input, and if they do, the string will be truncated. When i was learning C, I used K&R `getline` all the time, never had a problem with it. It does read NULL and it returns the string length, which you can use to ignore any NULL if they exist in your text. – Manny_Mar Jan 21 '19 at 06:45
  • Using `setrlimit()` is a blunderbuss — but I guess it can do the job. I'd like a function like `fgets()` to return the count of the characters read, so you can detect embedded null bytes. The K&R book has a `getline()` — actually, 4 variant implementations all with a wholly different interface from the POSIX `getline()` — that returns the number of bytes read. That is a useful design; it incorporates the best of both `fgets()` and `getline()`. – Jonathan Leffler Jan 21 '19 at 06:46
  • @Manny_Mar: 'there input' --> 'their input'; and you probably meant `'\0'` rather than `'\n'` since users will type newlines. You can edit your own comments if you're quick. – Jonathan Leffler Jan 21 '19 at 06:47
  • @Jonathan Leffler You're right. I just looked at the K&R getline code, and it is a `'\n'`, not a `'\0'`. I should have looked over the code before posting. I cannot edit the comment above; there must be a limit on how many times it can be edited. – Manny_Mar Jan 21 '19 at 06:54
  • @Manny_Mar — You can edit a comment for up to five minutes; after that, it is frozen. Hence the "if you're quick" part of my comment. There's also the 'copy, delete, and add new comment' technique — or 'copy, add new comment, delete old comment'. It works. – Jonathan Leffler Jan 21 '19 at 06:57
  • I wanted to look at the K&R code before commenting. I thought it was `'\0'`, but when I looked, it was `'\n'`. It took me a little time to look. – Manny_Mar Jan 21 '19 at 07:00
  • @jonathan: the point of `setrlimit` is precisely to avoid excess vmem usage; if you are concerned about that, then it's likely that `getline` will not be the only possible vmem hog and setting the limit globally will be appropriate. Otherwise, I honestly don't see the point of artificially limiting the length of an input line. And I say that as someone who has not infrequently had to find awkward workarounds for programs which impose such limits. I agree that the fgets prototype sucks. – rici Jan 21 '19 at 07:40
  • `fgets` is OK; I don't think I will get too many `NULL` when reading input from users. The first K&R version of getline will read `NULL`, though, and has a fixed size that you set manually. It only stops if it hits `EOF` or a newline. It uses this test: `for (i=0; i< lim -1 && (c=getchar()) != EOF && c != '\n'; ++i)`. It adds the `'\n'` at the end of the string and returns the size of the string, including the `'\n'`. – Manny_Mar Jan 21 '19 at 07:46
  • Normally, when you write console utilities, you're writing them for other people to run on *their* machines, so it's not your possibly limited resources which are being overcommitted. I don't see a use case for either getline or fgets in a server, because it's pretty rare to use a FILE* for socket communication, and there are excellent reasons to avoid any potentially blocking call on a socket regardless of resourcing issues. MISRA suggests that FILE* shouldn't normally be used in embedded systems. So where is the issue? – rici Jan 21 '19 at 07:48
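For anyone who wants the 'code it yourself' variant discussed in the comments, here is a minimal sketch, not a tested library routine. The name `bounded_getline` and the status codes `BL_EOF` and `BL_TOO_LONG` are hypothetical (nothing in POSIX or the C library defines them); the `-2` value follows the suggestion in the comments. It reads into a caller-supplied fixed buffer, never allocates, and returns the byte count so embedded null bytes stay detectable.

```c
#include <stdio.h>

/* Hypothetical status codes; not part of any standard. */
enum { BL_EOF = -1, BL_TOO_LONG = -2 };

/*
 * Read one line of at most cap - 1 bytes (newline included) into buf.
 * Returns the number of bytes stored, BL_EOF if nothing could be read
 * (or the arguments are unusable), or BL_TOO_LONG if the line did not
 * fit. In the BL_TOO_LONG case the rest of the line is discarded.
 */
long bounded_getline(char *buf, size_t cap, FILE *fp)
{
    size_t len = 0;
    int c;

    if (buf == NULL || cap == 0)
        return BL_EOF;

    while ((c = getc(fp)) != EOF) {
        if (len + 1 >= cap) {
            /* Limit reached: skip to the end of this line. */
            while (c != '\n' && (c = getc(fp)) != EOF)
                ;
            buf[0] = '\0';
            return BL_TOO_LONG;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[len] = '\0';
    return len == 0 ? BL_EOF : (long)len;
}
```

A caller would declare something like `char buf[256];` and loop on `bounded_getline(buf, sizeof buf, stdin)` until it returns `BL_EOF`, treating `BL_TOO_LONG` as a rejected line. Discarding the overlong remainder is one design choice; returning the truncated prefix instead would be equally defensible.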

2 Answers

Using fgets is not necessarily problematic. All the GNU manual tells you is that if there's a '\0' byte in the file, there will be one in your buffer too. You won't be able to tell whether the null terminator in your buffer marks the actual end of the data that was read or is just a null from within the file. This means you can read a 100-char file into a 200-char buffer and end up with what looks like a 50-char C string, if the file has a null at offset 50.
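To make that ambiguity concrete, here is a small sketch. The helper name `fgets_lengths` is made up for illustration, and it assumes a seekable stream (a regular file, not a pipe or terminal) so that `ftell` can report how many bytes fgets really consumed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line with fgets and report both the
 * length strlen() sees and the number of bytes fgets consumed from the
 * stream. Returns 0 on success, -1 on EOF or error.
 * Assumes fp is seekable, since it relies on ftell().
 */
int fgets_lengths(FILE *fp, char *buf, int cap,
                  size_t *visible, long *consumed)
{
    long start = ftell(fp);
    if (start < 0 || fgets(buf, cap, fp) == NULL)
        return -1;

    long end = ftell(fp);
    if (end < 0)
        return -1;

    *visible = strlen(buf);   /* stops at the first '\0' in the buffer */
    *consumed = end - start;  /* what was actually read from the file */
    return 0;
}
```

For a line such as the eight bytes `abc\0def\n`, `*visible` comes out as 3 while `*consumed` is 8; with fgets alone, the caller only ever sees the first number.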

The stdio.h getline in fact doesn't appear to have any sane length limitation, so fread might be a viable alternative.
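As a rough sketch of why fread helps here: it reads at most the number of bytes you ask for and returns how many it got, so null bytes in the data can be counted rather than silently ending the string. The helper name `read_block_count_nuls` is hypothetical, and note that fread works on blocks, not lines, so any line splitting would still be up to the caller.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read up to cap bytes with fread and count the
 * null bytes among them. Returns the number of bytes actually read.
 * The buffer is NOT null-terminated; use the returned length.
 */
size_t read_block_count_nuls(FILE *fp, char *buf, size_t cap,
                             size_t *nul_count)
{
    size_t got = fread(buf, 1, cap, fp);
    size_t nuls = 0;

    for (size_t i = 0; i < got; i++) {
        if (buf[i] == '\0')
            nuls++;
    }
    *nul_count = nuls;
    return got;
}
```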

Unlike C getline and C++ std::getline(), C++ std::istream::getline() is limited by its count parameter: it stores at most count - 1 characters.

Gamification

The GNU manual is just bad. Limiting the input length is usually the right thing to do, especially if input is untrusted, and fgets does this correctly. getline cannot be used safely in such a context.
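A minimal sketch of length-limited input with fgets, including detection of lines that did not fit. The function name `read_limited_line` and its return codes are hypothetical, and the newline check uses `strchr`, so it assumes the input has no embedded null bytes (the caveat the GNU manual warns about).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line of at most cap - 1 bytes with fgets.
 * Returns  1 if a whole line (or a final line without newline) was read,
 *          0 if the line was too long; buf holds the truncated start and
 *            the remainder of the line is discarded,
 *         -1 on EOF or read error.
 * Assumes the input contains no embedded '\0' bytes.
 */
int read_limited_line(char *buf, int cap, FILE *fp)
{
    if (fgets(buf, cap, fp) == NULL)
        return -1;

    if (strchr(buf, '\n') != NULL)
        return 1;                 /* complete line, newline included */

    /* No newline stored: either the last line of the input or overflow. */
    int c = getc(fp);
    if (c == EOF)
        return 1;                 /* final line without trailing newline */

    while (c != '\n' && c != EOF) /* drain the rest of the long line */
        c = getc(fp);
    return 0;
}
```

Draining the remainder keeps the next call aligned on a line boundary; a program that would rather reject the whole input could stop at the first 0 instead.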

R.. GitHub STOP HELPING ICE
  • The GNU manual is pointing out a genuine problem with `fgets()`; it is not 'fessing up to a different but also potentially severe problem in their preferred replacement, `getline()` — which is unusual because it is documented to return `-1` (rather than `EOF` — though the two are usually the same) when it encounters EOF (with no data read). If `fgets()` (or a variant) returned the number of characters read instead of returning a pointer to the start of the string — something the calling code already knows, dammit — then `fgets()` would have distinct merits. – Jonathan Leffler Jan 21 '19 at 06:10