2

So i have this code

FILE* file = fopen("file.txt", "r");
if(file == NULL)
{
    printf("Failed to open file.\n");
    return NULL;
}

fseek(file, 0L, SEEK_END);
long bufferSize = ftell(file);
fseek(file, 0L, SEEK_SET);

char* buffer = (char*) malloc(bufferSize);
if(buffer == NULL)
{
    printf("Failed to allocate memory for buffer.\n");
    return NULL;
}

fread(buffer, sizeof(char), bufferSize, file);
fclose(file);

This seems to work perfectly fine when printing to console with printf("%s", buffer) but i am wondering if this should be causing a buffer overflow or if its wrong since there seemingly isnt a null terminator character at the end. Lets assume that the file.txt has exactly 4 characters in it. When the bufferSize is calculated it will be a long with the value of 4. So when i am calling malloc(bufferSize) I am creating a buffer with a size of 4 bytes which does not account for a null terminator character. Everywhere i have seen examples of people reading an entire text file they use code like this but shouldnt this be creating a char* with the characters from the file without an ending null terminator character? should i be allocating this buffer using malloc(bufferSize + 1) and adding a null terminator character?

Teight
  • 37
  • 3
  • 2
    As you've correctly wondered, just using `printf` is not a good idea because you don't know whether a `\0` was read at all or not. So I would allocate `bufferSize+1` bytes and manually set `buffer[bufferSize] = 0`, then you can safely use `printf`. – Pablo Apr 03 '20 at 01:07
  • OT: regarding: `printf("Failed to open file.\n");` error messages should be output to `stderr`, not `stdout`. When the error indication is from a C library function should also output the text reason the system thinks the error occurred. Suggest: `perror( "Failed to open file." );` – user3629249 Apr 03 '20 at 02:02

2 Answers2

3

This seems to work perfectly fine when printing to console with printf("%s", buffer)

Seem to working perfectly fine is a perfect manifestation of undefined behavior.

should i be allocating this buffer using malloc(bufferSize + 1) and adding a null terminator character?

If you wish to use %s printf format specifier with the pointer to a consecutive bytes of printable characters, these bytes need to be terminated with a zero byte. Or the other way, %s printf format specifier needs a zero terminated sequence of bytes. Otherwise, undefined behavior happens.

So:

  • Your input file contains a zero byte, so that %s stops outputting there.
  • You need to supply a zero terminating byte by yourself, to make sure that %s knows where to stop.
  • Or you can iterate over the bytes yourself for (...) { printf("%c", buffer[i]); } or (assuming bufferSize is lower then INT_MAX, so probably is) just tell printf when to stop by specifying the precision of the format specifier, like: printf("%.*s", (int)bufferSize, buffer);

or undefined behavior will happen.

KamilCuk
  • 120,984
  • 8
  • 59
  • 111
  • What about allocating 1 extra byte for that null terminator but using `calloc()` to initialize it all to 0? – NTDLS Apr 03 '20 at 02:35
  • 1
    Its not entirely the same. It's somewhat less efficient as it is initializing bytes that are likely about to be written anyway. Then there's the fact that it reduces the line count by 1 since you don't require `n[sz-1] = '\0';` That last part is my fav because it makes code more readable and as a practice makes code safer. IMHO – NTDLS Apr 03 '20 at 04:26
-1

Depending upon the size of buffer you allocate and the size of the allocation unit your OS provides, there are often extra bytes at the end of the allocation. Which means that depending how you later use the memory, an exact buffer allocation may lead to fail, or there may be spare byte(s) at the end of the allocation, which your fread() would not overwrite. The result? You may test your program with files that have serendipitous sizes, but programs may fail intermittently once shipped.

Quick fix? Always allocate a bit more space at the end of your buffer - depending upon how your program interprets the bytes (char, short, int, long, long long, struct).

Note that the size of the allocation unit is less likely to save you if the string is nested in a struct, where struct elements are snuggled close together. But odd sized strings would still have spare space, depending upon compiler flags.

Note that your specific usage is finding the end of the file, and slurping the entire file into memory. Likely your OS provides memory in 16, 32, or 64 byte chunks. Which means that you have 1/16, 1/32, or 1/64 chance of accidentally strolling off the end of your allocated buffer.

Suggestions: (0) Always allocate extra padding, to cushion running into walls. (1) Consider using fstat() rather than ftell()? (2) Consider memory mapping the file, rather than using malloc/free and fread.

ChuckCottrill
  • 4,360
  • 2
  • 24
  • 42