1

I have function which gets arguments: char* filedata (stores whole file) and FILE *fp (the opened file).

void read_file(char *filedata, FILE *fp){
  char buffer[1000];

  while(fgets(buffer, sizeof(buffer), fp))
  {
    char *new_str;
    if((new_str = malloc(strlen(filedata) + strlen(buffer)+1)) != NULL)
    {
      new_str[0] = '\0'; // ensures the memory is an empty string
      strcat(new_str, filedata);
      strcat(new_str, buffer);
    }
    else
    {
      printf("malloc failed!\n");
    }
    strcpy(filedata, new_str);
  }

  fclose(fp);
}

But this isn't too fast... is there faster way to read the whole file?

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
PiBu
  • 23
  • 4
  • You want to read all lines not just a simple read. please edit title if true. – Majid Hajibaba Jan 22 '22 at 12:13
  • 1
    "Fast" is not well defined. Fast to open a file? Fast to access the data in the file? What is fast enough? What are the requirements? – Cheatah Jan 22 '22 at 12:14
  • Looks very leaky to me:( – Martin James Jan 22 '22 at 12:24
  • 2
    The answer to your question: Yes. That said, beyond the amazing memory leaks, I don't suppose you've ever heard of [Shlemiel, The Painter](https://www.joelonsoftware.com/2001/12/11/back-to-basics/). And fyi, whoever wrote the spec for your function, they took a page from everything wrong with `gets` and manifested it in a file-read operation. E.g., a pointer for target data without any reference to how much space is actually there. That's a recipe for doom. It also makes all the rest of this pointless. Just keep dumping data into that target, advancing `filedata` as you go. Then... *don't*. – WhozCraig Jan 22 '22 at 12:32

3 Answers3

2

Below's my function illustrating how I usually do it. Not sure how fast it is compared to all other possible C implementations. But I'd imagine they're all pretty similar unless poorly programmed in one way or another, which might lead to slower, less efficient execution.

/* ==========================================================================
 * Function:    readfile ( FILE *ptr, int *nbytes )
 * Purpose:     read open file ptr into internal buffer
 * --------------------------------------------------------------------------
 * Arguments:   ptr (I)         FILE * to already open (via fopen)
 *                              file, whose contents are to be read
 *              nbytes (O)      int * returning #bytes in returned buffer
 * --------------------------------------------------------------------------
 * Returns:     ( unsigned char * )  buffer with ptr's contents
 * --------------------------------------------------------------------------
 * Notes:     o caller should free() returned output buffer ptr when finished
 * ======================================================================= */
/* --- entry point --- */
unsigned char *readfile ( FILE *ptr, int *nbytes ) {
  /* --- 
   * allocations and declarations
   * ------------------------------- */
  unsigned char *outbuff = NULL;        /* malloc'ed and realloc'ed below */
  int  allocsz=0, reallocsz=500000,     /*total #bytes allocated, #realloc */
       blksz=9900, nread=0,             /* #bytes to read, #actually read */
       buffsz = 0;                      /* total #bytes in buffer */
  /* ---
   * collect all bytes from ptr
   * ----------------------------- */
   if ( ptr != NULL ) {                 /* return NULL error if no input */
    while ( 1 ) {                       /* read all input from file */
      if ( buffsz+blksz + 99 >= allocsz ) { /* first realloc more memory */
        allocsz += reallocsz;           /*add reallocsz to current allocation*/
        if ( (outbuff=realloc(outbuff,allocsz)) == NULL ) /* reallocate */
          goto end_of_job; }            /* quit with NULL ptr if failed */
      nread = fread(outbuff+buffsz,1,blksz,ptr); /* read next block */
      if ( nread < 1 ) break;           /* all done, nothing left to read */
      buffsz += nread;                  /* add #bytes from current block */
      } /* --- end-of-while(1) --- */
    fclose(ptr);                        /* close fopen()'ed file ptr */
    } /* --- end-of-if(ptr!=NULL) --- */
  end_of_job:
    if ( nbytes != NULL ) *nbytes = buffsz; /* #bytes in outbuff */
    return ( outbuff );                 /* back to caller with output or NULL*/
  } /* --- end-of-function readfile() --- */
eigengrau
  • 153
  • 7
  • Why do you not allocate the entire file size to begin with? – einpoklum Jan 22 '22 at 13:53
  • 1
    @einpoklum The function's often reading output from popen("command","r") rather than a file, and my actual function contains a third arg _int isclose_ which leaves ptr open if 0, fclose's if 1, and pclose's if 2. So I could only know the entire size for an fopen'ed ptr, not a popen'ed one. – eigengrau Jan 22 '22 at 14:05
1

With some caveats, you can read the entire file into an appropriately-sized buffer in one fell swoop using the fread() function.

The following code outlines how to open the file, determine its size, allocate a buffer of that size, then read the file's data (all of it) into that buffer. But note the caveats about the fseek and ftell functions (discussed afterwards):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char* filename = "MyFile.txt"; // Or whatever
    FILE* fp = fopen(filename, "rb"); // Open in binary mode
    int seek = fseek(fp, 0, SEEK_END); // CAVEAT: Files in BINARY mode may not support SEEK_END ...
    if (seek != 0) {
        printf("Cannot fseek on binary file!\n");
        fclose(fp);
        return 1;
    }
    size_t filesize = (size_t)ftell(fp); // ... but this is not reliable if opened in TEXT mode!
    char* filedata = calloc(filesize + 1, 1); // Add 1 for the terminating "nul" character
    rewind(fp);
    fread(filedata, 1, filesize, fp); // Read whole file
    
    // Clean up ...
    fclose(fp);
    free(filedata);

    return 0;
}

Caveats:

Note that files opened in BINARY mode (as in the "rb" mode argument I gave in the fopen() call) are not required to support the SEEK_END origin in calls to fseek(); if this is the case on your platform, then this answer offers some alternatives to determine the file's size. From cppreference:

… Binary streams are not required to support SEEK_END, in particular if additional null bytes are output.

However, on the other hand, opening the file in TEXT mode (using "rt") will make the call to ftell effectively meaningless, in terms of the required size for your input buffer and the value specified to fread; from cppreference:

If the stream is open in text mode, the value returned by this function is unspecified and is only meaningful as the input to fseek().

Also note that, as pointed out in the comments, the fseek() and ftell() functions will fail if the size of the file is larger than the maximum value that can be stored in a long int variable; to handle such cases, you can use the (platform-dependent) 64-bit equivalents, as I described in an answer I posted some time ago.

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • Also, `int seek = fseek(fp, 0, SEEK_END);` will have problems for files too large for the size to be contained in an `int`. – Andrew Henle Jan 24 '22 at 12:19
  • 1
    @AndrewHenle Yes, indeed. But the 64-bit versions of `fseek` and `ftell` are not (yet?) standardized across platforms/compilers. (Actually, it would be the `ftell` that causes the problem - the `fseek` function call wouldn't need/use the over-sized integer.) – Adrian Mole Jan 24 '22 at 12:26
  • @AndrewHenle Also note that `fseek` and `ftell` use **long int** offsets/sizes. But I added a note linking to the platform-specific ways of handling BIG files. – Adrian Mole Jan 24 '22 at 12:55
  • True, but `long` is only 4 bytes on Windows systems, even 64-bit ones. – Andrew Henle Jan 24 '22 at 12:59
1

Annotating your function (not mentioning the leaks, etc) and counting the operations on the character buffers:


void read_file(char *filedata, FILE *fp){
  char buffer[1000];

  while(fgets(buffer, sizeof(buffer), fp))      // <<-- NEW_SIZE
  {
    char *new_str;
    if((new_str = malloc(strlen(filedata)       // <<-- OLD_SIZE
         + strlen(buffer)                       // <<-- NEW_SIZE
        +1)) != NULL)
    {
      new_str[0] = '\0'; // ensures the memory is an empty string
      strcat(new_str, filedata);                // <<-- OLD_SIZE
      strcat(new_str, buffer);                  // <<-- OLD_SIZE + NEW_SIZE
    }
    else
    {
      printf("malloc failed!\n");
    }
    strcpy(filedata, new_str);                  // <<-- OLD_SIZE + NEW_SIZE
  }

  fclose(fp);
}

fgets(). strlen(), strcat() and strcpy() all need to loop over a character buffer. Only the fgets() is actually needed, the rest of the copying can be avoided.

Adding the number of passes over the buffers:

sum of operations per loop: 4 * OLD_SIZE + 4 * NEW_SIZE and: keep in mind that OLD_SIZE is actually SUM(NEW_SIZE), recursively, so your function has QUADRATIC behavior wrt the number of times the loop iterates.(basically the number of lines read)

So you end up with:

Number of times a character is inspected
    = 4 * N_LINE * LINE_SIZE
    + 8 * (NLINE * (NLINE-1) ) * LINE_SIZE
    ;

, which implies that for a 100 line file you need about 40K passes over the string(s).

[this is the "Schlemiel, the painter" story]

wildplasser
  • 43,142
  • 8
  • 66
  • 109