0

I am struggling to get lines of text from a file and storing them into an array. I have used a debugger on my code and it seems to get the first couple lines of text but then there is a segmentation fault when it gets to the third line of text. I believe there is something wrong with my allocate_mem() function because the fault occurs when it is called during the 3rd iteration of the while loop in read_lines(). Help would be much appreciated!

My Code:

#include <stdio.h>
#include <stdlib.h>

char* read_one_line(FILE* fp, char* line);
char* allocate_mem(FILE* fp, char* line);

void print_lines(char** lines, int num_lines){
    int i;
    for(i = 0 ; i < num_lines; ++i){
        printf("%d. %s", i+1, lines[i]);
    }
}

void free_lines(char** lines, int num_lines){
    int i;

    for(i = 0 ; i < num_lines; ++i){
        free(lines[i]);
    }

    if(lines != NULL && num_lines > 0){
        free(lines);
    }
}

FILE* validate_input(int argc, char* argv[]){

    FILE* fp = NULL;

    if(argc < 2){
        printf("Not enough arguments entered.\nEnding program.\n");
        exit(0);
    }
    else if(argc > 2){
        printf("Too many arguments entered.\nEnding program.\n");
        exit(0);
    }

    fp = fopen(argv[1], "r");

    if(fp == NULL){
        printf("Unable to open file: %s\nEnding program.\n", argv[1]);
        exit(0);
    }
    return fp;
}

void read_lines(FILE* fp, char*** lines, int* num_lines) {
  *num_lines = 0;
  do {
    *num_lines += 1;
    *lines = realloc((*lines), (*num_lines) * sizeof(*lines));
    (*lines)[*num_lines - 1] = read_one_line(fp, (*lines)[*num_lines - 1]);
  } while((*lines)[*num_lines - 1] != NULL);
  free((*lines)[*num_lines - 1]);
  *num_lines -= 1;
}

char* read_one_line(FILE* fp, char* line) {
  line = NULL;
  int str_len = 0;

  while (1) {
        // resize buffer to hold next char, or zero termination
        char *tmp = realloc(line, str_len + 1);
        if (!tmp) {
          free(line);
          return NULL;
        }
        line = tmp;
        // Get next character from file
        int ch = fgetc(fp);
        if (ch == EOF) {
          free(line);
          return NULL;
        }
        else if (ch == '\n') {
          line[str_len] = '\n';
          line[str_len + 1] = 0;
          return line;
        }
        else {
             line[str_len] = ch;
             str_len++;
        }
    }
}

int main(int argc, char* argv[]){
    char** lines = NULL;
    int num_lines = 0;
    FILE* fp = validate_input(argc, argv);

    read_lines(fp, &lines, &num_lines);
    print_lines(lines, num_lines);
    free_lines(lines, num_lines);
    fclose(fp);

    return 0;
}
  • 1
    You make want to check the return value of `fgets` to verify that the buffer now contains a valid null-terminated string. – Andreas Wenzel Mar 12 '21 at 00:12
  • `while(read_one_line` That reads a line from the file and then throws it away. You should not have two calls to `read_one_line` per iteration. – kaylum Mar 12 '21 at 00:15
  • `allocate_mem()` allocates enough memory to contain the entire file. Why are you doing that for _every_ line in the file? – mhawke Mar 12 '21 at 00:31
  • 1
    The function `read_one_line` is extremely pointless. You have essentially just renamed `fgets`, changed to order of the arguments and the type of one argument from `int` to `long*`. Either add some real functionality to the function or completely remove it. It adds absolutely nothing. – klutt Mar 12 '21 at 00:55

1 Answers1

0

When allocating memory for a line of text, before reading the line, we have to do some magic with ftell, reading and discarding, fseek, memory allocation, and actual reading. This is not that hard to get right, but not all streams support seeking and we have to account for the file changing during operations. Failure do account for file changes may or may not be a security issue.

The easiest way is not to preallocate the correct size, but to but to start with a small string buffer, and resize the buffer as we read more characters.

/* 
   Read a line of text from `file` and store it in a freshly allocated 
   buffer

   Return new string or NULL on error or premature EOF
 */
char *alloc_and_read_line(FILE *file)
{
   char *str = NULL; 
   size_t str_len = 0

   while (1)
     {
        /* resize buffer to hold next char, or zero termination */
        char *tmp = realloc(str, str_len + 1);
        if (!tmp)
          {
             free(str);
             return NULL;
          }
        str = tmp;

        /* Get next character from file */
        int ch = fgetc(file);
        if (ch == EOF)
          {
             /* 
               IO error in `file`, or last line was not correctly
               terminated with newline
              */ 
             free(str);
             return NULL;
          }
        else if (ch == '\n')
          {
             /* End of line */
             str[str_len] = 0;
             return str;
          }
        else
          {
             str[str_len] = ch;
             str_len++;
          }
     }
}

Some people prefer to resize the buffer in larger steps than 1, or double the size when it needs to grow. The theory is that it should be faster, but it is not really an issue on modern realloc implementations (I have measured). It would only add more complexities and chances for bugs.

EDIT:

About your new segmentation error, *foo[bar] is the same as *(foo[bar]) not (*foo)[bar]. When it comes to operator precedence, the compiler will happily have a different opinion than what you wished for. The liberal use of parentheses doesn't hurt.

HAL9000
  • 2,138
  • 1
  • 9
  • 20
  • Do not agree with requiring the last line _must_ end with a `'\n'`, but that certainly is a way to handle things. C lib leaves that as _implementation-defined_. – chux - Reinstate Monica Mar 12 '21 at 01:22
  • @chux-ReinstateMonica, It is a common convention on UNIX that lines in text-files are terminated with `\n`, including the last one. You are right in that c-lib doesn't force that convention, but it is still a good idea to follow conventions. If you want to treat both `\n` and end-of-file as line-end, then on `ch==EOF` you have to check if `ferror(file)` is true to distinguish between IO-error and *true* end-of-file – HAL9000 Mar 12 '21 at 01:29
  • Another alternative would be on `ch==EOF` to return the string and let the caller use `feof(file)` and `ferror(file)` to decide what to do with a partial string. My point with the answer was to show the use of `realloc`, getting the details right is left to the reader. – HAL9000 Mar 12 '21 at 01:41
  • @HAL9000 Ok I used that code that you had just mentioned and it works good for the first two lines of my file but then there is a segmentation fault when it gets to the third line. I edited the original code in my question to show you what I have now. – Tyler Beckler Mar 12 '21 at 01:48
  • @HAL9000 Re: "Another alternative would be on ch==EOF to return the string" --> better to return `str_len ? *str = 0, str : EOF;` – chux - Reinstate Monica Mar 12 '21 at 03:07
  • @HAL9000 "you have to check if ferror(file) is true to distinguish between IO-error and true end-of-file " --> better to check `feof()` as it is [less ambiguous](https://stackoverflow.com/a/45264457/2410359). – chux - Reinstate Monica Mar 12 '21 at 03:13
  • @HAL9000 I am very close to the correct output but for some reason my code outputs a random character every once in a while. For example it outputs an exclamation point at the beginning of one of the lines of text. I think it may be a memory leak or something but I am not sure. I updated the code again if you could try and help. Thanks! – Tyler Beckler Mar 12 '21 at 04:30
  • @chux-ReinstateMonica, "better to check `feof()` as it is less ambiguos". You already know that `fgetc` returned `EOF`, so testing for`feof()` is reduntant. Testing for `ferror()` would add some additional info. I do agree that calling `ferror` before you know the return value of `feof` makes little sense. – HAL9000 Mar 12 '21 at 11:45
  • Testing for `feof()` is not redundant as `EOF` may be uncommonly due something other than _end-of-file_. `fgetc()` returns `EOF` when either 1) rare input error _just_ occurred 2) end-of-file just occurred 3) end-of-file occurred earlier (and a 4th more obscure reason). `feof()` return true on #2 or #3. `ferror()` returns true on #1 _and_ if an input error had occurred prior. Thus it is possible, though rare, for `fgetc()` to return `EOF` due to end-of-file and `ferror()` is true (`feof()` would also return true then). – chux - Reinstate Monica Mar 12 '21 at 11:56
  • @chux-ReinstateMonica, "better to return str_len ? *str = 0, str : EOF;" How do you propose returning `EOF` when return type is `char *`? How do you distinguish between empty line and read-erro/eof? – HAL9000 Mar 12 '21 at 11:56
  • @HAL9000 Yes, my comment, more correctly was along the lines of `str_len ? *str = 0, str : NULL;` in the `if (ch == EOF)` case. I'd expect an `"\n"` line to return an allocated buffer of `""`. – chux - Reinstate Monica Mar 12 '21 at 11:59
  • @chux-ReinstateMonica, "Testing for feof() is not redundant as EOF may...." `feof` may or may not be reduntant after fgetc returned `EOF`, I still don't see how its adds any useful info. Calling `ferror` after `EOF` at least adds the following info: `ferror()==true` -> There may be more data we cannot read and there is an error code in `errno`, `ferrro()==false` -> we can assume we have successfully read all the data there is in the stream. – HAL9000 Mar 12 '21 at 12:32
  • " Calling ferror after EOF at least adds the following info: ferror()==true -> There may be more data we cannot read and there is an error code in errno" --> The nature of _input error_ is broad. An input error does not mean the next `fgetc()` must nor must not return `EOF`. C lib does not specify `errno` having anything to do with `fgetc()`. – chux - Reinstate Monica Mar 12 '21 at 12:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229820/discussion-between-hal9000-and-chux-reinstate-monica). – HAL9000 Mar 12 '21 at 12:59