While searching through this board for information about reading a full file into memory using C, I came across a use of fread() that I haven't seen before. I'm trying to understand it.

My questions are:

Is there a name/term for what is being done here?

What is happening when the size_t used is being added to the char *data and how is this considered a valid void *ptr by fread?

I'm going to put the code from the author's post in here and I'll link to the post as well. Unfortunately, the post is old, locked, and I don't have enough points here to leave a comment asking for clarification on it.

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

/* Size of each input chunk to be
   read and allocate for. */
#ifndef  READALL_CHUNK
#define  READALL_CHUNK  262144
#endif

#define  READALL_OK          0  /* Success */
#define  READALL_INVALID    -1  /* Invalid parameters */
#define  READALL_ERROR      -2  /* Stream error */
#define  READALL_TOOMUCH    -3  /* Too much input */
#define  READALL_NOMEM      -4  /* Out of memory */

/* This function returns one of the READALL_ constants above.
   If the return value is zero == READALL_OK, then:
     (*dataptr) points to a dynamically allocated buffer, with
     (*sizeptr) chars read from the file.
     The buffer is allocated for one extra char, which is NUL,
     and automatically appended after the data.
   Initial values of (*dataptr) and (*sizeptr) are ignored.
*/
int readall(FILE *in, char **dataptr, size_t *sizeptr)
{
    char  *data = NULL, *temp;
    size_t size = 0;
    size_t used = 0;
    size_t n;

    /* None of the parameters can be NULL. */
    if (in == NULL || dataptr == NULL || sizeptr == NULL)
        return READALL_INVALID;

    /* A read error already occurred? */
    if (ferror(in))
        return READALL_ERROR;

    while (1) {

        if (used + READALL_CHUNK + 1 > size) {
            size = used + READALL_CHUNK + 1;

            /* Overflow check. Some ANSI C compilers
               may optimize this away, though. */
            if (size <= used) {
                free(data);
                return READALL_TOOMUCH;
            }

            temp = realloc(data, size);
            if (temp == NULL) {
                free(data);
                return READALL_NOMEM;
            }
            data = temp;
        }

        n = fread(data + used, 1, READALL_CHUNK, in);
        if (n == 0)
            break;

        used += n;
    }

    if (ferror(in)) {
        free(data);
        return READALL_ERROR;
    }

    temp = realloc(data, used + 1);
    if (temp == NULL) {
        free(data);
        return READALL_NOMEM;
    }
    data = temp;
    data[used] = '\0';

    *dataptr = data;
    *sizeptr = used;

    return READALL_OK;
}

Link: C Programming: How to read the whole file contents into a buffer

brokaryote
  • This is a very standard sort of technique. I'm not sure it has any single name. The basic idea is, call `malloc` to allocate some memory, start using it, if you run out of space, call `realloc` to make it bigger. Optionally, when you're done, if you've overallocated, call `realloc` one last time, to shrink the region a bit smaller, down to exactly what you need. – Steve Summit Apr 22 '22 at 20:24
  • What exactly is your question? What this function does? (Attempting to read a whole file into memory.) What `data + used` is? (Pointer arithmetic.) What the resulting type of that operation is? (`char *`) How that could be interpreted as a valid `void *`? (Any pointer can be converted to `void *` and back.) Where that points? (At the end of the data already read.) – DevSolar Apr 22 '22 at 20:24
  • Search for "pointer arithmetic" – William Pursell Apr 22 '22 at 20:27
  • I'm sorry if my original question was unclear but I know what the code is doing in general. @DevSolar mentioned that it is pointer arithmetic and now I have a better understanding of it. The poster is incrementing the pointer data by the amount used and reading the next chunk of the file into that location. Is that correct? – brokaryote Apr 22 '22 at 20:30
  • Instead of using `realloc()` for chunks of memory, use `fstat()` to find the size of the file & do a `malloc()` once, then load the file into memory. This way you can limit/check file-size beforehand without any memory allocation. – जलजनक Apr 22 '22 at 20:39
  • @SparKot What if you are reading from a pipe, or a network socket? – Steve Summit Apr 22 '22 at 20:42
  • @brokaryote Yes, that is correct. – user3386109 Apr 22 '22 at 20:46
  • @SparKot The code shown is standard C. `fstat` is POSIX. That might make a difference to some. – DevSolar Apr 22 '22 at 21:26

1 Answer

What is happening when the size_t used is being added to the char *data and how is this considered a valid void *ptr by fread?

In practice(*), a pointer is just a number that references an address in (virtual) memory. What's being done here is simple pointer arithmetic: you can add an integer to a pointer, which increases its value, so if your pointer pointed to address 1000 and you add 20, it now points to address 1020. Since used is always the number of bytes read so far, data + used points that many bytes into the data buffer, just past the data already read.

But there's one more thing: this works byte-for-byte only because the pointed-to type, char, has a size of 1 byte (sizeof(char) is 1 by definition). Pointer arithmetic does not advance the pointer by that many bytes but by that many elements of the pointed-to type, so you always end up at the start of an array element and never somewhere in the middle of one. For example, if int *x points to address 1000 and int is 4 bytes wide(*), then after x += 20 it points to address 1080, which is where x[20] is located.
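To make the scaling concrete, here is a small sketch that measures how far a pointer moves, in bytes, when n is added to it. The helper names char_step_bytes and int_step_bytes are made up for this illustration, and n is assumed to be at most 64 so the result stays within (or one past) the local array.

```c
#include <stddef.h>

/* Hypothetical helper: byte distance between p and p + n for a char pointer.
   Assumes n <= 64 so the pointer stays within (or one past) the array. */
size_t char_step_bytes(size_t n)
{
    char buf[64];
    char *p = buf;

    /* Subtracting char pointers yields a count of chars, i.e. bytes. */
    return (size_t)((p + n) - p);
}

/* Hypothetical helper: byte distance between x and x + n for an int pointer.
   Assumes n <= 64 for the same reason. */
size_t int_step_bytes(size_t n)
{
    int arr[64];
    int *x = arr;

    /* Cast to char * first so the difference is counted in bytes,
       not in int elements. */
    return (size_t)((char *)(x + n) - (char *)x);
}
```

On a platform where int is 4 bytes, int_step_bytes(20) gives 80 while char_step_bytes(20) gives 20, which matches the 1000 to 1080 example above.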

and how is this considered a valid void *ptr by fread?

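Considering "pointers are just numbers", fread doesn't care how you arrived at that pointer value. The expression data + used has type char *, and any object pointer converts implicitly to void *, so the call type-checks without a cast. As long as the pointer refers to valid memory with room for the bytes requested, fread will happily write there.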
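As a stripped-down sketch of the same idea, the function below fills a caller-supplied, fixed-size buffer by repeatedly passing buf + used to fread. The name read_in_chunks and its parameters are invented for this example; unlike readall in the question, it does not grow the buffer and simply stops when the buffer is full.

```c
#include <stdio.h>

/* Hypothetical helper: read from `in` into `buf` (capacity `cap` bytes),
   requesting at most `chunk` bytes per fread call.
   Returns the number of bytes stored in buf. */
size_t read_in_chunks(FILE *in, char *buf, size_t cap, size_t chunk)
{
    size_t used = 0;

    if (in == NULL || buf == NULL || chunk == 0)
        return 0;

    while (used < cap) {
        /* Never ask for more than the space that is left. */
        size_t want = (cap - used < chunk) ? cap - used : chunk;

        /* buf + used is a char * pointing just past the data read so far;
           it converts implicitly to the void * that fread expects. */
        size_t n = fread(buf + used, 1, want, in);
        if (n == 0)
            break;          /* end of file or read error */

        used += n;
    }

    return used;
}
```

Each pass writes the next chunk directly after the previous one, so after the loop the first used bytes of buf hold the file contents in order. A caller that needs to tell end of file apart from an error can check ferror(in) afterwards, as readall does.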

(*) Assuming a modern architecture accessible by mere mortals.

Simon