
I'm trying to write a function that reads a single line from a text file using fgets() and stores it in a dynamically allocated char* obtained from malloc(), but I am unsure how to use realloc(), since I do not know the length of the line and do not want to guess a magic number for the maximum size it could possibly be.

#include <stdio.h>
#include <stdlib.h>
#define INIT_SIZE 50

void get_line (char* filename)
{
    char* text;
    FILE* file = fopen(filename,"r");

    text = malloc(sizeof(char) * INIT_SIZE);

    fgets(text, INIT_SIZE, file);

    //How do I realloc memory here if the text array is full but fgets
    //has not reached an EOF or \n yet.

    printf("The text was %s\n", text);

    free(text);
    fclose(file);
}

int main(int argc, char *argv[]) {
    get_line(argv[1]);
}

I am planning on doing other things with the line of text, but for the sake of keeping this simple, I have just printed it and then freed the memory.

Also: the program is invoked with the filename as the first command-line argument.

mschuurmans
Sphero

3 Answers

The getline function is what you are looking for.

Use it like this:

char *line = NULL;
size_t n = 0;
getline(&line, &n, stdin);
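
For reading from a file rather than stdin, a fuller call might look like the sketch below. The helper name first_line is made up for illustration. It assumes a POSIX system where getline is declared in <stdio.h>, and it strips the trailing newline that getline keeps in the buffer.

```c
#define _POSIX_C_SOURCE 200809L  /* ask the headers to expose getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical helper: read the next line of fp with getline(), strip the
 * trailing newline, and return it in a malloc'd buffer.
 * Returns NULL on end-of-file or error. The caller frees the result. */
char *first_line(FILE *fp)
{
    char *line = NULL;   /* NULL lets getline allocate the buffer */
    size_t cap = 0;      /* capacity of that buffer, maintained by getline */
    ssize_t len = getline(&line, &cap, fp);

    if (len == -1) {
        free(line);      /* getline may have allocated even on failure */
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```

The caller owns the returned buffer and releases it with free() when done.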

If you really want to implement this function yourself, you can write something like this:

#include <stdlib.h>
#include <stdio.h>

char *get_line(void)
{
    int c;
    /* current size of the buffer */
    size_t size = 5;
    /* number of characters stored so far */
    size_t read_size = 0;
    /* first allocation; its result must be checked */
    char *line = malloc(size);
    if (!line)
    {
        perror("malloc");
        return NULL;
    }

    line[0] = '\0';

    c = fgetc(stdin);
    while (c != EOF && c != '\n')
    {
        line[read_size] = c;
        ++read_size;
        if (read_size == size)
        {
            /* grow through a temporary so the old block is not lost on failure */
            size += 5;
            char *test = realloc(line, size);
            if (!test)
            {
                perror("realloc");
                /* the buffer is full and has no room for the terminator */
                free(line);
                return NULL;
            }
            line = test;
        }
        c = fgetc(stdin);
    }
    line[read_size] = '\0';
    return line;
}
Mathieu
  • Note that `getline` is a POSIX-only function, it's not portable. – Some programmer dude Oct 25 '18 at 09:01
  • @Someprogrammerdude "POSIX" and "not portable" are somewhat of a contradiction in terms. That's only true if by "not portable" you effectively mean "available almost everywhere, but not available on Windows". – Andrew Henle Oct 25 '18 at 09:29
  • `line = realloc(line, size);` never `realloc` the pointer itself, instead use a temporary pointer, e.g. `void *tmp = realloc (line, size);` – David C. Rankin Oct 25 '18 at 09:39
  • @DavidC.Rankin I agree, but note the `exit()` call if the `realloc` failed – Mathieu Oct 25 '18 at 09:40
  • Yep, that's the only way to handle a failure if it occurs (or you are leaking...). Good approach. Using the temporary route allows preserving at least what is stored to the point of failure. – David C. Rankin Oct 25 '18 at 09:41
  • 1) `if (NULL == line)` is coded selectively, after `realloc()`, but not after `malloc()`. 2) `getline()` includes `'\n'` in the allocated buffer, `get_line()` does not. 3) `get_line()` does not distinguish between an end-of-file, input error or a line of just `"\n"`. – chux - Reinstate Monica Oct 25 '18 at 15:22

One possible solution is to use two buffers: a temporary one that you pass to fgets, and one that you reallocate and append the temporary buffer to.

Perhaps something like this:

char temp[INIT_SIZE];  // Temporary string for fgets call
char *text = NULL;     // The actual and full string
size_t length = 0;     // Current length of the full string, needed for reallocation

while (fgets(temp, sizeof temp, file) != NULL)
{
    // Reallocate
    char *t = realloc(text, length + strlen(temp) + 1);  // +1 for terminator
    if (t == NULL)
    {
        // TODO: Handle error
        break;
    }

    if (text == NULL)
    {
        // First allocation, make sure string is properly terminated for concatenation
        t[0] = '\0';
    }

    text = t;

    // Append the newly read string
    strcat(text, temp);

    // Get current length of the string
    length = strlen(text);

    // If the last character just read is a newline, we have the whole line
    if (length > 0 && text[length - 1] == '\n')
    {
        break;
    }
}

[Disclaimer: The code above is untested and may contain bugs]
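
Wrapped in a function, the two-buffer idea could be sketched as below. The function name read_line_chunks and the CHUNK_SIZE constant are illustrative assumptions, not part of the answer. The sketch tracks the length itself and appends with memcpy instead of strcat, which avoids rescanning the growing string on every chunk.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 16   /* assumed size; deliberately small to force growth */

/* Hypothetical helper: returns one line (newline included, if present) in a
 * malloc'd buffer. Returns NULL if nothing could be read or if memory ran
 * out. The caller frees the result. */
char *read_line_chunks(FILE *file)
{
    char temp[CHUNK_SIZE];   /* temporary buffer filled by fgets */
    char *text = NULL;       /* the full line, grown as needed */
    size_t length = 0;       /* characters stored in text so far */

    while (fgets(temp, sizeof temp, file) != NULL) {
        size_t chunk = strlen(temp);
        char *t = realloc(text, length + chunk + 1);  /* +1 for terminator */
        if (t == NULL) {
            free(text);      /* give up and release the partial line */
            return NULL;
        }
        text = t;
        memcpy(text + length, temp, chunk + 1);  /* copies the '\0' too */
        length += chunk;

        /* A trailing newline means the whole line has been read. */
        if (length > 0 && text[length - 1] == '\n')
            break;
    }
    return text;
}
```

Freeing the partial line on allocation failure is one possible policy; returning what was read so far would be another.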

Some programmer dude

With the declaration void get_line (char* filename), you can never use the line you read outside of the get_line function, because you neither return a pointer to the line nor pass the address of a pointer that would make the allocation and the data read visible back in the calling function.
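
As a rough sketch of a signature that does hand the line back, the version below returns the allocated buffer to the caller. The split into read_line and get_line is an assumption made for illustration. read_line lets fgets write straight into the unused tail of the buffer and doubles the buffer whenever it fills up before a newline arrives, which is the situation described in the question's code comment.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INIT_SIZE 50

/* Hypothetical helper: read one line from fp into a growing buffer.
 * Returns the line (newline included, if present), or NULL on end-of-file
 * with nothing read, on a read error, or on allocation failure. */
char *read_line(FILE *fp)
{
    size_t size = INIT_SIZE, len = 0;
    char *text = malloc(size);
    if (text == NULL)
        return NULL;

    /* fgets fills the space that is still free after the first len chars */
    while (fgets(text + len, (int)(size - len), fp) != NULL) {
        len += strlen(text + len);
        if (len > 0 && text[len - 1] == '\n')
            return text;                 /* the whole line has been read */
        if (size - len < 2) {            /* full: only the '\0' slot is left */
            char *tmp = realloc(text, size * 2);
            if (tmp == NULL) {
                free(text);
                return NULL;
            }
            text = tmp;
            size *= 2;
        }
    }
    if (len == 0 || ferror(fp)) {        /* nothing read, or a read error */
        free(text);
        return NULL;
    }
    return text;                         /* last line without a newline */
}

/* Returns the first line of the named file; the caller frees it. */
char *get_line(const char *filename)
{
    FILE *file = fopen(filename, "r");
    if (file == NULL)
        return NULL;
    char *text = read_line(file);
    fclose(file);
    return text;
}
```

In main, the result of get_line(argv[1]) can then be printed, processed further, and finally released with free().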

A good model (showing return type and useful parameters) for any function that reads an unknown number of characters into a single buffer is POSIX getline. You can implement your own using either fgetc or fgets and a fixed buffer. Efficiency favors fgets only to the extent that it minimizes the number of realloc calls needed. (both functions share the same low-level input buffer size, e.g. see gcc source IO_BUFSIZ constant -- which if I recall is now LIO_BUFSIZE after a recent name change, but basically boils down to an 8192 byte IO buffer on Linux and 512 bytes on windows)

So long as you dynamically allocate the original buffer (using malloc, calloc or realloc), you can read repeatedly into a fixed buffer with fgets, append the characters read to your allocated line, and check whether the final character is '\n' or whether EOF was reached to determine when you are done. Simply read a fixed buffer's worth of chars with fgets each iteration and realloc your line as you go, appending the new characters to the end.

When reallocating, always realloc using a temporary pointer. That way, if you run out of memory and realloc returns NULL (or fails for any other reason), you won't overwrite the pointer to your currently allocated block with NULL creating a memory leak.
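
The temporary-pointer pattern can be isolated in a few lines. This is a minimal sketch; the helper name grow_buffer is hypothetical and not part of any library.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes (new_size must be
 * nonzero). Returns 0 on success. On failure returns -1 and leaves *buf
 * pointing at the old block, which is still valid and still owned by
 * the caller. */
int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;   /* *buf was not overwritten, so nothing is leaked */
    *buf = tmp;      /* commit only after realloc has succeeded */
    return 0;
}
```

Writing `buf = realloc(buf, new_size);` directly would replace the only pointer to the old block with NULL on failure, so the block could no longer be freed.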

A flexible implementation could be done like the one below. It sizes the fixed buffer as a VLA, using the size the user passes in *n, or a default (SZINIT if it is defined, otherwise 120) if the user passes 0. It allocates storage for line (passed as a pointer to pointer to char), reallocating as required, and returns the number of characters read on success or -1 on failure, the same as POSIX getline does:

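#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/** fgetline, a getline replacement with fgets, using fixed buffer.
 *  fgetline reads from 'fp' up to and including a newline (or EOF),
 *  allocating for 'line' as required and using a read buffer of 'n'
 *  bytes. The newline itself is not stored in 'line'.
 *  on success, the number of characters in 'line' is returned, -1
 *  otherwise
 */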
ssize_t fgetline (char **line, size_t *n, FILE *fp)
{
    if (!line || !n || !fp) return -1;

#ifdef SZINIT
    size_t szinit = SZINIT > 0 ? SZINIT : 120;
#else
    size_t szinit = 120;
#endif

    size_t idx = 0,                 /* index for *line */
        maxc = *n ? *n : szinit,    /* fixed buffer size */
        eol = 0,                    /* end-of-line flag */
        nc = 0;                     /* number of characters read */
    char buf[maxc];     /* VLA to use a fixed buffer (or allocate ) */

    clearerr (fp);                  /* prepare fp for reading */
    while (fgets (buf, maxc, fp)) { /* continually read maxc chunks */
        nc = strlen (buf);          /* number of characters read */
        if (idx && *buf == '\n')    /* if index & '\n' 1st char */
            break;
        if (nc && (buf[nc - 1] == '\n')) {  /* test '\n' in buf */
            buf[--nc] = 0;          /* trim and set eol flag */
            eol = 1;
        }
        /* always realloc with a temporary pointer */
        void *tmp = realloc (*line, idx + nc + 1);
        if (!tmp)       /* on failure previous data remains in *line */
            return idx ? (ssize_t)idx : -1;
        *line = tmp;    /* assign realloced block to *line */
        memcpy (*line + idx, buf, nc + 1);  /* append buf to line */
        idx += nc;                  /* update index */
        if (eol)                    /* if '\n' (eol flag set) done */
            break;
    }
    /* if EOF alone, or stream error, return -1, else length of line */
    return (feof (fp) && !nc) || ferror (fp) ? -1 : (ssize_t)idx;
}

(note: since nc already holds the current number of characters in buf, memcpy can be used to append the contents of buf to *line without scanning for the terminating nul-character again) Look it over and let me know if you have further questions.

Essentially you can use it as a drop-in replacement for POSIX getline (though it will not be quite as efficient -- but it isn't bad either)

David C. Rankin
  • Corner: When a rare input error occurs, say on the 2nd loop of `while (fgets ...`, `fgetline()` returns a value other than -1 when -1 is expected. Perhaps a `clearerr()` before the loop and `feof (fp) && !nc` --> `(feof(fp) && !nc) || ferror(fp)`. – chux - Reinstate Monica Oct 25 '18 at 14:51
  • Hmm... Good call on `clearerr()` to ensure the stream is in a state where a read can be attempted. But I'm still in a quandary over the corner-case on say the 2nd loop input error. That would be we entered a stream error state that was not `EOF` such that `feof (fp)` tested false and no characters were read on the 1st trip through the loop, or characters were read and then a non-EOF stream error occurred. I'm still trying to put my finger on that one. I tested emptyfile, emptystring, emptystringnewline, etc... but still didn't find that corner. Regardless both are good additions. – David C. Rankin Oct 25 '18 at 18:35
  • The rare return due to input error could occur on any read. The Std C lib functions return `NULL/0/EOF` when an input error occurred even if some prior input was successful. Some of those functions (all?) also return likewise if the error flag is set prior. To follow that pattern, if the first `fgets()` successfully reads a buffer without a `'\n'`, but the 2nd `fgets()` results in an input error, the expected return would be -1. All this _input error_ handling is niche code and may be beyond OP at this time, yet glad to hear of your interest. – chux - Reinstate Monica Oct 25 '18 at 18:54
  • Oh yes. There is no better time spent than tormenting oneself with the nuances of user-input. Boil it down far enough and you have to start looking at the various libc sources just to figure out how, and if, it is handled there. – David C. Rankin Oct 25 '18 at 20:07