2

I've had a look at some other similar questions and examples but I'm stumped. My goal is to open a very large text file (novel sized), allocate memory to an array, and then store the text into that array so I'm able to do further processing in the future.

This is my current code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINELEN 74

int main(void) {

FILE *file;
char filename[] = "large.txt";
int count = 0, i = 0, len;

/* Open the file */
  file = fopen(filename, "r");
  if (file == NULL) {
      printf("Cannot open file");
      return -1;
  }
    
/* Get size of file for memory allocation */
    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    fseek(file, 0, SEEK_SET);
    
/* Allocate memory to the array */
  char *text_array = (char*)malloc(size*sizeof(char));
    
/* Store the information into the array */
    while(fgets(&text_array[count], LINELEN, file) != NULL) {
      count++;
      }

  len = sizeof(text_array) / sizeof(text_array[0]);

  while(i<len) {
    /* printf("%s", text_array); */
    i++;
  }
  printf("%s", text_array);

/* return array */
    return EXIT_SUCCESS;
}

I was expecting to have a large body of text printed from text_array at the bottom. Instead I get a garbled mess of random characters much smaller than the body of text I was hoping for. What am I doing wrong? I suspect it has something to do with my memory allocation but don't know what.

Any help is much appreciated.

  • You're trying to use `text_array` as if it's a 2-dimensional array. – Barmar Aug 11 '20 at 07:07
  • 1
    Why are you using `fgets()`? Just read the entire file with one call to `fread()`. – Barmar Aug 11 '20 at 07:08
  • If you are wanting to access each line in the stored file, your other option is to dynamically allocate a pointer for each line and then allocate memory to hold the line read by `fgets()` (don't forget to remove the `'\n'` and add `+1`for the *nul-terminating* character), copy the line to your allocated block and assign the starting address for the block to your pointer. When you need more pointers, `realloc` and keep going. E.g. `char **lines;` and then allocate pointers, `size_t p_avail = 2, p_used = 0; line = malloc (p_avail * sizeof *lines);` Realloc when `p_used == p_avail`. – David C. Rankin Aug 11 '20 at 07:18
  • That approach is shown here [How to take text file as command line argument in C](https://stackoverflow.com/a/57488262/3422102) However, if you just want to store the entire file in a single array, then `fread` as @Barmar shows is the best approach. (there is a trick also, allocate `size+1` bytes and set `text_array[size] = 0;` and you can treat the entire thing as a string) Before long -- you'll just be a Penguin... – David C. Rankin Aug 11 '20 at 07:21
  • What does it mean large? 1MB , 10MB, 1GB, 100GB? – 0___________ Aug 11 '20 at 07:49
  • 1
    Hey everyone, thanks for all your help! Sorry it's taken me so long to reply. To answer a few questions this is for an assignment and I'm required to use fgets unfortunately. David C. Rankin I'm going to give your solution a go too it seem to achieve what I'm after with fgets. Finally by 'large' it's a 6.4MB text file so not huge but a considerable amount of text to handle. – PetrifiedPenguin Aug 11 '20 at 23:17

2 Answers2

3

There's no need to call fgets() in a loop. You know how big the file is, just read the entire thing into text_array with one call:

fread(text_array, 1, size, file);

However, if you want to treat text_array as a string, you need to add a null terminator. So you should add 1 when calling malloc().

Another problem is len = sizeof(text_array) / sizeof(text_array[0]). text_array is a pointer, not an array, so you can't use sizeof to get the amount of space it uses. But you don't need to do that, since you already have the space in the size variable.

There's no need to print text_array in a loop.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINELEN 74

int main(void) {

    FILE *file;
    char filename[] = "large.txt";
    int count = 0, i = 0, len;

/* Open the file */
    file = fopen(filename, "r");
    if (file == NULL) {
        printf("Cannot open file");
        return -1;
    }
    
/* Get size of file for memory allocation */
    fseek(file, 0, SEEK_END);
    size_t size = ftell(file);
    fseek(file, 0, SEEK_SET);
    
/* Allocate memory to the array */
    char *text_array = (char*)malloc(size*sizeof(char) + 1);
    
/* Store the information into the array */
    fread(text_array, 1, size, file);
    text_array[size] = '\0';
    printf("%s, text_array);

/* return array */
    return EXIT_SUCCESS;
}
Barmar
  • 741,623
  • 53
  • 500
  • 612
1

This part

while(fgets(&text_array[count], LINELEN, file) != NULL) {
  count++;
}

is problematic.

If the loop is un-rolled it's "kind of like":

fgets(&text_array[0], LINELEN, file)
fgets(&text_array[1], LINELEN, file)
fgets(&text_array[2], LINELEN, file)

So you only advance the fgetsdestination buffer by a single char between each fgets call. If we assume the fgets reads more than a single character, the second fgets overwrites data from the first fgets. The third fgets overwrites data from the second and so on.

You need to advance the buffer with as many characters as fgets actually read or use another way of reading, e.g. fread.

Support Ukraine
  • 42,271
  • 4
  • 38
  • 63
  • So because each line overwrites most of the last that's why I'm getting a series of random incoherent characters instead of a body of text? You say that 'If we assume the fgets reads more than a single character". If that's so, and I'm overwriting characters, how do I use fgets to take the whole body of text? I've tried changing it to: while(fgets(&text_array[count], 1, file) != NULL) thinking it would then read one char at a time and input it into the array but instead I get a segmentation fault but the memory I malloc should be plenty shouldn't it? Thanks for your answer. – PetrifiedPenguin Aug 11 '20 at 23:11