The following code sets a maximum line size read from stdin. I'd rather not hard-code a specific line length, and have the flexibility to handle any buffer length. What are good strategies to allow processing of any size?

If these strategies are much more complex, is there a way to at least guarantee that getline will not overflow? Thanks.

 #include<stdlib.h>
 #include<stdio.h>
 #include<string.h>

 #define P 20

 int main()
 {
   size_t size = 1920;
   char *line;
   // record row; /* structure to store fields */
   char tokens[P][41];
   int p;
   char delims[] = ",";     /* ", |" */
   char *result = NULL;

   line = ( char * ) malloc( size + 1 );

   while( getline(&line, &size, stdin) != -1 )
   {
      /* chomp: strip the trailing newline only if one is present */
      line[strcspn(line, "\n")] = '\0';

      /* load char array */
      result = strtok( line , delims );
      p = 0;
      while( result != NULL && ( p < P ) ) 
      {
         strcpy( tokens[p++] , result );
         result = strtok( NULL, delims );
      }

      if (p != P)
      {
         fprintf(stderr,"Wrong number of input fields.\nFormat: ID,x1, ... ,x%d\n",P);
         exit(-1);
      }

      /* load record ( atol, atof etc... , skipped for brevity ) and work with record */

   }

   free(line);
   return 0;
 }
qwe890
  • I typically solve it using buffers instead of lines and then using POSIX calls, as I often need nonblocking calls. If I don't know the size in advance, I can grow the buffer using `realloc()` either linearly or exponentially, depending on circumstances. It's also possible to shrink it when the large size is no longer needed, also using `realloc()`. – Pavel Šimerda Sep 14 '14 at 22:37

2 Answers

You can have getline allocate memory for you (which is the whole point of using the non-standard getline function over the standard fgets function). From the getline manual page:

If *lineptr is NULL, then getline() will allocate a buffer for storing the line, which should be freed by the user program. (The value in *n is ignored.)

Alternatively, before calling getline(), *lineptr can contain a pointer to a malloc-allocated buffer *n bytes in size. If the buffer is not large enough to hold the line, getline() resizes it with realloc, updating *lineptr and *n as necessary.

So you can do:

line = NULL;
size = 0;
while (getline(&line, &size, stdin) != -1)   /* getline returns -1 at EOF or on error */
{
    // ... Do stuff with `line`...
}
free(line);

(Or leave your code as-is, since getline will resize your allocated buffer for you.)
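As a minimal sketch of the loop above, assuming a POSIX.1-2008 system: the return value of `getline` is the number of characters read, so it can be used to strip the newline without calling `strlen`. The helper name `longest_line` and its parameters are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline() on POSIX systems */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: reads every line from `in`, stores the number of
 * lines in *nlines, and returns the length of the longest line
 * (excluding the newline). getline() owns and grows the buffer. */
size_t longest_line(FILE *in, size_t *nlines)
{
    char *line = NULL;   /* NULL: let getline() allocate */
    size_t cap = 0;      /* current capacity, maintained by getline() */
    ssize_t len;
    size_t longest = 0;

    assert(in != NULL && nlines != NULL);
    *nlines = 0;

    while ((len = getline(&line, &cap, in)) != -1) {
        /* chomp only if a newline is actually there */
        if (len > 0 && line[len - 1] == '\n')
            line[--len] = '\0';
        if ((size_t)len > longest)
            longest = (size_t)len;
        ++*nlines;
    }

    free(line);          /* one free() covers every resize */
    return longest;
}
```

No fixed maximum appears anywhere: the only limit is available memory, and the caller frees the buffer once after the loop.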

jamesdlin
  • And if it is not `NULL` it will `realloc()`ate it if needed. – 5gon12eder Sep 14 '14 at 22:41
  • +1. However, please note that `getline` is a GNU extension (which later became part of the POSIX standard), not part of the C standard so far. – starrify Sep 14 '14 at 22:46
  • +1 It isn't difficult to implement `getline` if your system doesn't have it either, provided you know what you're doing. I'm sure there is an implementation online with a BSD-like license if you cannot use any GPL or LGPL code and you cannot or don't want to code it yourself. –  Sep 15 '14 at 02:17
  • If `getline` isn't available, I'd recommend using [Chuck Falconer's `ggets`](http://www.taenarum.com/software/ggets/). (The interface is more like `gets` than like `getline`, however.) – jamesdlin Jan 07 '15 at 04:06

Here's the code I've been using - Fgetstr(FILE*, const char*). It roughly doubles the buffer size for each realloc, and won't crash on a failed malloc/realloc. Called like: char *text = Fgetstr(stdin, "\n"); or whatever.

The library getdelim() function is similar, although mine might be much older. The manpage for getline and getdelim on my system doesn't detail what happens if malloc or realloc fails, and only mentions a possible EINVAL error (no ENOMEM). Hence, the behavior in the face of memory exhaustion may be undefined for getline/getdelim.

Also, as starrify points out, many systems don't have getline.
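For systems without getline, a portable fallback can be sketched with only standard `fgets` and `realloc`. This is an illustration, not the code from this answer: the name `read_line` and the initial capacity of 64 are assumptions. It doubles the buffer whenever a line does not fit and returns NULL on EOF or allocation failure.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd line without its newline,
 * or NULL at end of input or if memory runs out. Caller frees. */
char *read_line(FILE *in)
{
    size_t cap = 64;     /* assumed starting capacity */
    size_t len = 0;
    char *buf;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t room = cap - len;
        if (room > INT_MAX)          /* fgets takes an int size */
            room = INT_MAX;
        if (fgets(buf + len, (int)room, in) == NULL)
            break;                   /* EOF or read error */
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';     /* complete line: chomp and return */
            return buf;
        }
        if (len + 1 == cap) {        /* buffer full: double it */
            char *tmp;
            if (cap > (size_t)-1 / 2 || (tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }

    if (len == 0) {                  /* nothing read before EOF */
        free(buf);
        return NULL;
    }
    return buf;                      /* last line had no newline */
}
```

Because the failed `realloc` result goes into a temporary, the original buffer is never leaked or lost on memory exhaustion.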

#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>     /* malloc, free (replaces the non-standard <malloc.h>) */

#ifdef TEST
#define DEBUG
#endif

#ifdef DEBUG
#undef DEBUG
#define DEBUG(b) {b}
#else
#define DEBUG(b)  
#endif

#ifdef TEST
char *Fgetstr(FILE *ifp, const char *ends);   /* defined below */

int main (int argc, char **argv)
{
    char *text = (char*)0;
    char *ends = "\n";

    if(argc > 1) ends = argv[1];

    while(text = Fgetstr(stdin, ends))
    {
        puts(text);
        free(text);
    }

    return 0;
}
#endif

/*  return specifications -
 *
 *  terminators include : ends, \0, and EOF
 *
 *  root    EOF?    text?   ended?  stat    returned value
 *          -       -       -       ... 
 *  1       -       -       1       return  ""
 *          -       1       -       ... 
 *  2       -       1       1       return  "text"
 *  3       1       -       -       return  -null-      EOF-*accepted*
 *  4       1       -       1       return  ""          EOF-postponed
 *  5       1       1       -       return  "text"      EOF-postponed/fake-end
 *  6       1       1       1       return  "text"      EOF-postponed/true-end
 *
 *  on ENOMEM, return -null-
 *
 */

static char *Fgetstr_R(FILE *ifp, const char *ends, unsigned int offset)
{
    char *s = (char*)0;                     /* the crucial string to return */
    unsigned int bufmax = offset;           /* as large as so far */
    unsigned int bufidx = 0;                /* index within buffer */
    char buffer[bufmax + 1];                /* on-stack allocation required */
    int ended = 0;                          /* end character seen ? */
    int eof = 0;                            /* e-o-f seen ? */

    DEBUG(fprintf(stderr, "(%d", offset););

    while(bufidx <= bufmax)     /* pre-recurse - attempt to fill buffer */
    {
        int c = getc(ifp);

        if( (ended = ( !c || (ends && strchr(ends,c)))) || (eof = (EOF==c)) )  
            break;

        buffer[bufidx++] = (char)c;
    }

    /* note - the buffer *must* at least have room for the terminal \0 */

    if(ended || (eof && offset))                    /* root 1,2,4,6 5 */
    {
        unsigned int offset_max = offset + bufidx;
        DEBUG(fprintf(stderr, " malloc %d", offset_max + 1););
        if(s = (char*)malloc((offset_max + 1) * sizeof(char)))
            s[offset_max] = '\0';
        else
            s = (char*)0, perror("Fgetstr_R - malloc");
    }
    else
    {
        if(eof && !offset)  /* && !ended */     /* root 3 */
            s = (char*)0;
        else
            s = Fgetstr_R(ifp, ends, offset + bufidx);  /* recurse */
    }

    /* post-recurse */

    if(s)
        strncpy(&s[offset], &buffer[0], bufidx);  /* cnv. idx to count */

    DEBUG(fprintf(stderr, ")", offset););
    return s;
}

char *Fgetstr (FILE *ifp, const char *ends)
{
    register char *s = (char*)0;
    DEBUG(fprintf(stderr, "Fgetstr "););
    s = Fgetstr_R(ifp, ends, 0);
    DEBUG(fprintf(stderr, ".\n"););
    return s;
}
Alex North-Keys
  • 1) Although the result is in a `malloc` buffer, the intermediate buffer is in a local VLA. Hmmm - I would think if code needed a buffer of "without max buffer length", having similar sized intermediates as local VLAs may be a problem. Suspect that also is why the length is `unsigned` rather than `size_t`. I'd +1 were it not for heavy VLAs usage. 2) Noticed it stops on `'\0'` unlike `fgets()`. Probably just as well. – chux - Reinstate Monica Sep 14 '14 at 23:21
  • Yep, the VLA thing shows a GCC bias. The "unsigned" usage I think is because the original code is pre-1990 and size_t wasn't quite everywhere yet. With the VLAs, the stack limit, which can easily be much smaller than heap (I've written code that used 1K stacks per thread), could end up being an issue, so it probably would make more sense to stick with realloc and its numerous extra copies to make sure only the heap size is involved as a limiter. It'd probably be nice to have a fast one (using stack) and a heap-focused version of this kind of routine. – Alex North-Keys Sep 15 '14 at 05:10
  • This code may fail on any processor where sizeof(size_t) or sizeof(ptr_t) != sizeof(int), which is a lot of embedded processors and many that are commonly being used in the IoT space. The VLAs are interesting, but they can gobble stack space like mad. Finally, the behavior of realloc() on malloc() failure has been defined at least since 7th Edition (I just looked at the 7th Edition source code), so a better solution would be iteratively expanding the buffer using realloc() and avoiding the recursion entirely. However, excessive realloc() fragments the heap, so ... – Julie in Austin Jan 27 '15 at 16:20
  • It's true that this code should use "unsigned long int", or rather "size_t", when computing offset_max. There is another bug where an fprintf has an "offset" parm that it doesn't use. The choice to use stack instead of heap has obvious issues when stack size is restricted, but the point was to explore using stack *instead* of a large number of realloc calls, with their potential to repeatedly move the data. Devices with small memory weren't part of the problem space at the time. – Alex North-Keys Jan 27 '15 at 20:48
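The heap-only alternative raised in these comments (growing the buffer iteratively with `realloc` instead of recursing through VLAs) might look like the sketch below. It is not the answer author's code: the name `fgetstr_iter` and the starting capacity are assumptions. It uses `size_t` throughout and aims to keep the same return rules as `Fgetstr` (NULL only on EOF with no text, or on allocation failure).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical iterative variant: reads up to any character in `ends`,
 * a NUL byte, or EOF. Returns a malloc'd string, or NULL on EOF with
 * no text or when memory is exhausted. Caller frees. */
char *fgetstr_iter(FILE *ifp, const char *ends)
{
    size_t cap = 32;     /* assumed starting capacity */
    size_t len = 0;
    char *s;
    int c;

    assert(ifp != NULL);
    s = malloc(cap);
    if (s == NULL)
        return NULL;

    while ((c = getc(ifp)) != EOF) {
        if (c == '\0' || (ends != NULL && strchr(ends, c) != NULL))
            break;                       /* terminator seen */
        if (len + 1 == cap) {            /* keep room for the final '\0' */
            char *tmp;
            if (cap > (size_t)-1 / 2 || (tmp = realloc(s, cap * 2)) == NULL) {
                free(s);
                return NULL;
            }
            s = tmp;
            cap *= 2;
        }
        s[len++] = (char)c;
    }

    if (c == EOF && len == 0) {          /* EOF accepted: nothing to return */
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

Doubling keeps the number of `realloc` calls logarithmic in the line length, which limits both the copying and the heap fragmentation mentioned above, and stack use stays constant.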