0

I am using this code to read a file:

char* fs_read_line(FILE* file)
{
   if (file == NULL) {
       return "CFILEIO: Error while reading the file: Invalid File";
   }

   long threshold = ftell(file);
   fseek(file, 0, SEEK_END);
   uint8_t* buffer = calloc(ftell(file)-threshold, sizeof(uint8_t));

   if(buffer == NULL)
      return;

   int8_t _;
   fseek(file, threshold, SEEK_SET);

   uint32_t ct = 0;
   while ((_ = (char)(fgetc(file))) != '\n' 
        && _ != '\0' &&  _ != '\r' && _ != EOF) {
       buffer[ct++] = _;
   }

   buffer = realloc(buffer, sizeof *buffer * (ct + 1)); 
   buffer[ct] = '\0';
   return buffer;
}

If the file is too big, I get (heap) overflow errors, probably because I initally allocate the file with the total amount of characters it contains.

an other way I tried to do this is by realloc the buffer after every iteration, but that's kinda not the approach I want.

Is there any way to dynamicly change the size of the array depending on the the current iteration without always uisng realloc ? or is there an way to determine how long the current line is by using ftell and fseek?

Lupor
  • 59
  • 8
  • 1
    The code you show won't compile in C++. Please don't add unrelated tags. – Some programmer dude Jan 15 '17 at 11:43
  • `sizeof(uint8_t*)` gives you the size of a pointer, not the size of a single byte. Which is almost totally certain just 1 on a system where `uint8_t` exists. So you allocate 4 or 8 times the size of the file, and then you don't check if the allocation succeeds. – Bo Persson Jan 15 '17 at 11:46
  • Rather than `memset`, `buffer[ct++] = _` is a lot clearer. – Martin Bonner supports Monica Jan 15 '17 at 11:46
  • There are also a few problems with your code, including the possibility that it returns a pointer t a string literal which you can not call `free` on, and you not having any kind of error checking for the `calloc` or `realloc` calls. The `realloc` call should not be needed anyway, which is good since you use it wrong. And why use `memset` to copy a single byte? Why not use e.g. `buffer[ct++] = _`? – Some programmer dude Jan 15 '17 at 11:47
  • @BoPersson sizeof(uint8_t) must be exactly one. It can't be *more* than one, because `uint8_t` must contain *exactly* 8 bits, and char must contain *at least* 8 bits. It can't be *less* than 1. Therefore it is equal to 1. – Martin Bonner supports Monica Jan 15 '17 at 11:49
  • @Someprogrammerdude yeah, how can I get rid of `realloc` ? and why am I using it wrong? it's the correct syntax, isn't it? – Lupor Jan 15 '17 at 11:53
  • @martin - There was [a discussion last week](http://stackoverflow.com/questions/41552514/is-overloading-on-all-of-the-fundamental-integer-types-is-sufficient-to-capture) about whether types like `uint8_t` could be extended integer types instead of one of the standard types. So I'm just a bit cautious, but agree that the size could not be 0.5 or anything. :-) – Bo Persson Jan 15 '17 at 11:55
  • 1
    Oh and remember that [`fgetc`](http://en.cppreference.com/w/c/io/fgetc) returns an `int` and not a `char`. – Some programmer dude Jan 15 '17 at 12:18
  • Do not change your post to reflect an answer - bad SO etiquette. Rollback to the previous version. – chux - Reinstate Monica Jan 15 '17 at 13:05
  • BTW: `(_ = (char)(fgetc(file)))` is a bad idea. Instead `int _; ... (_ = fgetc(file))` – chux - Reinstate Monica Jan 15 '17 at 13:07
  • 1) the `_` is a very poor choice for a variable name, amongst other things variable names beginning with `_` are reserved for the environment so should not be used in user programs. 2) The `_` is declared as a `char`, but `fgetc()` returns an `int` AND `EOF` is an `int` 3) the `sizeof()` is handled at compile/preproccess time, not runtime, so will not return a valid value. Suggest: saving the appropriate call to `ftell()` and using that value instead. 4) why bother to double the length of the allocated memory via `realloc()? Just allocate the needed amount at the call to `calloc()`. – user3629249 Jan 17 '17 at 04:04
  • You have both `return "string";` and `return;` in the function — the second is wrong. How is the poor calling code going to determine that `return "string";` and `return buffer;` are different — one indicating a problem and the other indicating success? You should probably be returning NULL for both the `return "string";` and `return;` cases. – Jonathan Leffler Mar 07 '17 at 21:34

3 Answers3

0

If your file can't fit in memory it can't fit in memory. You are allocating the memory buffer in advance, but you're making two mistakes that can cause you to allocate more than you need.

  1. You're starting at some arbitrary position in the file, but allocate memory as if you're starting at the beginning of the file. Allocate ftell(file) - threshold bytes.
  2. You are are allocation way too much memory. The sizeof(uint8_t *) should be sizeof(uint8_t) instead. You're allocating 4 or 8 times more memory than you should.

Other than that, what's the point of reallocating the buffer after you're done writing to it? The memory overflow has already happened. You should allocate before writing (inside the while loop). I don't see the point of reallocating at all, though, since you're allocating more than enough memory to begin with.

zmbq
  • 38,013
  • 14
  • 101
  • 171
  • for some reason if I delete the the `sizeof(uint8_t*)` at `realloc` I'll get junk bytes ^^ but I deleted the `sizeof(uint8_t*)` at the inital allocation and added `ftell(file) - threshold`, thank you! – Lupor Jan 15 '17 at 11:58
  • Why are you reallocating at all? – zmbq Jan 15 '17 at 12:00
  • to shrink down the size of the `buffer` ptr to the size of the actual number of bytes read, because I'm initally allocating the `buffer` with a amount of bytes that is bigger then the actual line^^ – Lupor Jan 15 '17 at 12:04
0

Code does not return a pointer to a string.

There is no null character in the returned buffer, so the calling code lacks the ability to know the length of the allocated memory. This certainly causes the calling code to error.

When re-allocating, add 1.

// buffer = realloc(buffer, ct * sizeof(uint8_t*));
//                                          v--- no star
buffer = realloc(buffer, ct * sizeof(uint8_t ) + 1);
buffer[ct] = '\0';

// or better
size_t ct = 0;
...
buffer = realloc(buffer, sizeof *buffer * (ct + 1));
buffer[ct] = '\0';

Is there any way to dynamically change the size of the array allocated memory depending on the the current iteration without always using realloc?

Array sizes cannot change. To dynamically change the size of the allocated memory requires realloc(). Note: the amount of needed memory could be determined before a memory allocation call.

or is there an way to determine how long the current line is by using ftell and fseek?

Like this code, you have found an upper bound to the current line's length. ftell and fseek do not locate the end of line.

Code could "seek" to the end of line with fscanf(file, "%*[^\n]"); or 1 beyond with a following fgetc(file).

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • by the way, do I have an memory leak by shrinking down the allocated memory without freeing it first? – Lupor Jan 15 '17 at 12:45
  • @Lupor No. But you may have leaks in the calling code - does it free the pointer returned by this function?. Note: once memory is free'd, code cannot use the pointer value. Code cannot re-allocate based on that pointer. – chux - Reinstate Monica Jan 15 '17 at 12:49
  • Yes, I always free the pointer returned by the function. thank you! but one last question: why I have to use `sizeof(uint8_t*)` instead of `sizeof(uint8_t)` when calling `realloc` ? why do I get junk-bytes when doin g `sizeof(uint8_t)` ? – Lupor Jan 15 '17 at 12:53
  • @Lupor Code should not need to use `uint8_t*`. Unclear how you determined you got "junk-bytes" whatever they are. IAC, suggest allocating the size of the referenced variable, not the type and append a null character. `buffer = realloc(buffer, sizeof *buffer * (ct + 1)); buffer[ct] = '\0';` – chux - Reinstate Monica Jan 15 '17 at 12:59
0

the following code:

  1. cleanly compiles
  2. performs the desired operation
  3. properly handles error conditions
  4. properly declares variable types
  5. properly returns a char* rather than a uint8_t*
  6. leaves open the question: why return 2x the needed buffer length
  7. the error message displayed when the passed in parameter is NULL is not correct. Suggest changing to indicate passed in file pointer was NULL
  8. the OPs posted code fails to check the returned value from each call to fseek() and fails to check the return value from each call to ftell() which it should be doing to assure the operation(s) was successful. I did not add that error checking in my answer so as to not clutter the code, however, it should be performed.

and now, the code:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>


char* fs_read_line(FILE* file);


char* fs_read_line(FILE* file)
{
   if ( !file )
   {
       return "CFILEIO: Error while reading the file: Invalid File";
   }

   // implied else, valid parameter

   long threshold = ftell(file);
   fseek(file, 0, SEEK_END);

   char* buffer = calloc( (size_t)(ftell(file) - threshold) *2 +1, sizeof(char));
   if(buffer == NULL)
      return NULL;

   // implied else, calloc successful

   int ch;
   fseek(file, threshold, SEEK_SET);

   size_t ct;
   while ( (ch = fgetc(file)) != '\n'
        &&  ch != '\0'
        &&  ch != '\r'
        &&  ch != EOF)
   {
       buffer[ct++] = (char)ch;
   }

   return buffer;
} // end function: fs_read_line
user3629249
  • 16,402
  • 1
  • 16
  • 17