1

Problem

I am currently writing a small (and bad) grep-like program for Windows. In it I want to read files line by line and print out the ones which contain a key. For this to work I need a function which reads each line of a file. Since I am not on Linux I cannot use the getline function and have to implement it myself.

I have found an SO answer where such a function is implemented. I tried it out and it works fine for 'normal' text files. But the program crashes if I try to read a file with a line length of 13 000 characters.

MCVE

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

char * getline(FILE *f)
{
    size_t size = 0;
    size_t len  = 0;
    size_t last = 0;
    char *buf = NULL;

    do {
        size += BUFSIZ; /* BUFSIZ is defined as "the optimal read size for this platform" */
        buf = realloc(buf, size); /* realloc(NULL,n) is the same as malloc(n) */            
        /* Actually do the read. Note that fgets puts a terminal '\0' on the
           end of the string, so we make sure we overwrite this */
        if (buf == NULL) return NULL;
        fgets(buf + last, size, f);
        len = strlen(buf);
        last = len - 1;
    } while (!feof(f) && buf[last] != '\n');
    return buf;
}

int main(int argc, char *argv[])
{
    FILE *file = fopen(argv[1], "r");
    if (file == NULL)
        return 1;

    while (!feof(file))
    {
        char *line = getline(file);
        if (line != NULL)
        {
            printf("%s", line);
            free(line);
        }
    }
    return 0;
}

This is the file I am using. It contains three short lines which get read just fine and a long one from one of my Qt projects. When reading this line the getline function reallocates 2 times to a size of 1024 and crashes at the 3rd time. I've put printf around the realloc to make sure it crashes there and it definitely does.

Question

Could anyone explain me why my program is crashing like that? I just spend hours with this and don't know what to do anymore.

jsmolka
  • 780
  • 6
  • 15
  • 1
    [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Eugene Sh. Jul 13 '18 at 21:50
  • 7
    You *add* `size + BUFSIZ` and allocate that, but then you *read* that same – increased! – `size`. In essence, you are reading more and more characters than you allocated in each turn. If you read only `BUFSIZE` then this should work. – Jongware Jul 13 '18 at 22:00
  • 1
    You don't check if `fgets()` fails... – Andrew Henle Jul 13 '18 at 22:00
  • 1
    @usr2564301 Thank you! I could swear I've already tried this because it didn't make much sense to me either. Feel free to post an answer and I'll accept it. Cheers. – jsmolka Jul 13 '18 at 22:05
  • regarding: `buf = realloc(buf, size);` when calling `realloc()`, always save the returned value into a 'temp' variable, check that 'temp' variable and only if not NULL then copy to the target variable. Otherwise, when `realloc()` fails, the pointer to the allocated memory is lost, resulting in a memory leak is the result – user3629249 Jul 15 '18 at 17:15

1 Answers1

2

In this fragment

    size += BUFSIZ;
    buf = realloc(buf, size);
    if (buf == NULL) return NULL;
    fgets(buf + last, size, f);

you add size + BUFSIZ and allocate that, but then you read that same – increased! – size. In essence, you are reading more and more characters than you allocated in each turn. The first time around, size = BUFSIZ and you read exactly size/BUFSIZ characters. If the line is longer than this (the last character is not \n), you increase the size of the memory (size += BUFSIZ) but you also read its (new) total size again – and you've already processes that last number of size bytes.

The allocated memory grows with BUFSIZE per loop, but the amount of bytes to read increases with BUFSIZE – after one loop, it's BUFSIZE, after two loops 2*BUFSIZE, and so on, until something important gets overwritten and the program is terminated.

If you read only chunks of the exact size of BUFSIZE then this should work.

Note that your code expects the last line to end with an \n, which may not always be true. You can catch this with an additional test:

if (!fgets(buf + last, size, f))
    break;

so your code won't be trying to read past the end of the input file.

Jongware
  • 22,200
  • 8
  • 54
  • 100