Consider this code to read a text based file. This sort of fread()
usage was briefly touched upon in the excellent book C Programming: A Modern Approach
by K.N. King.
There are other methods of reading text based files, but here I am concerned with fread()
only.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
// Declare file stream pointer.
FILE *fp = fopen("Note.txt", "r");
// fopen() call successful.
if(fp != NULL)
{
// Navigate through to end of the file.
fseek(fp, 0, SEEK_END);
// Calculate the total bytes navigated.
long filesize = ftell(fp);
// Navigate to the beginning of the file so
// it can be read.
rewind(fp);
// Declare array of char with appropriate size.
char content[filesize + 1];
// Set last char of array to contain NULL char.
content[filesize] = '\0';
// Read the file content.
fread(content, filesize, 1, fp);
// Close file stream pointer.
fclose(fp);
// Print file content.
printf("%s\n", content);
}
// fopen() call unsuccessful.
else
{
printf("File could not be read.\n");
}
return 0;
}
There are some problems I have with this method. My opinion is that this is not a safe method of performing fread()
since there might be an overflow if we try to read an extremely large string. Is this opinion valid?
To circumvent this issue, we may use a buffer size and keep on reading into a char array of that size. If filesize is less than buffer size, then we simply perform fread()
once as described in the above code. Otherwise, We divide the total file size by the buffer size and get a result, whose int portion we will use as the total number of times to iterate a loop where we will invoke fread()
each time, appending the read buffer array into a larger string. Now, for the final fread()
, which we will perform after the loop, we will have to read exactly (filesize % buffersize) bytes of data into an array of that size and finally append this array into the larger string (Which we would have malloc
-ed with filesize + 1 beforehand). I find that if we perform fread()
for the last chunk of data using buffersize as its second parameter, then extra garbage data of size (buffersize - chunksize) will be read in and the data might become corrupted. Are my assumptions here correct? Please explain if/ how I have overlooked something.
Also, there is the issue that non-ASCII characters might not have size of 1 byte. In that case I would assume the proper amount is being read, but each byte is being read as a single char, so the text is distorted somehow? How is fread()
handling reading of multi-byte chars?