50

I know that it's a common convention to pass the length of dynamically allocated arrays to functions that manipulate them:

#include <stdio.h>
#include <stdlib.h>

void initializeAndFree(int* anArray, size_t length);

int main(){
    size_t arrayLength = 0;
    scanf("%zu", &arrayLength);   // %zu is the conversion for size_t
    int* myArray = (int*)malloc(sizeof(int)*arrayLength);
    if (myArray == NULL) return 1;

    initializeAndFree(myArray, arrayLength);
}

void initializeAndFree(int* anArray, size_t length){
    size_t i = 0;
    for (i = 0; i < length; i++) {
        anArray[i] = 0;
    }
    free(anArray);
}

but if there's no way for me to get the length of the allocated memory from a pointer, how does free() "automagically" know what to deallocate when all I'm giving it is the very same pointer? Why can't I get in on the magic, as a C programmer?

Where does free() get its free (har-har) knowledge from?

Chris Cooper
  • 2
    Also note that `int length` is wrong. Array lengths, offsets, type sizes, and other such things are of type `size_t`, which is defined in the `stddef.h`, `stdio.h`, `stdlib.h`, and `string.h` headers. The main difference between `size_t` and `int` is that `int` is signed and `size_t` is unsigned, but on some (e.g. 64-bit) platforms they may also be different sizes. You should always use `size_t`. – Chris Lutz Apr 16 '10 at 06:21
  • @Chris Lutz: Thanks. I'll make that change. I've seen the "_t" suffix around a lot. What does it signify? "Type"? As in "size_type?" What other examples are there? – Chris Cooper Apr 16 '10 at 10:52
  • 1
    Yes, it stands for type. There are a lot of other examples, including `int32_t`, `regex_t`, `time_t`, `wchar_t`, etc. – Matthew Flaschen Apr 16 '10 at 11:52
  • 3
    A type with `_t` suffix is a type that is not a fundamental type of the language. Which means that `size_t` is unsigned int, unsigned long, or something like that. – u0b34a0f6ae May 15 '10 at 17:19

9 Answers

34

Besides Klatchko's correct point that the standard does not provide for it, real malloc/free implementations often allocate more space than you ask for. E.g. if you ask for 12 bytes it may provide 16 (see A Memory Allocator, which notes that 16 is a common size). So it doesn't need to know you asked for 12 bytes, just that it gave you a 16-byte chunk.

Matthew Flaschen
  • But what about C++? In C++ the runtime knows the actual size when allocating with `new type[n]` as it calls n constructors for `delete []`? – Viktor Sehr Apr 16 '10 at 11:25
  • 1
    @Viktor It doesn't need to store the size for primitive types or types with an empty destructor. – Yacoby Apr 16 '10 at 11:29
  • @Yacoby: true, still it doesn't answer my question – Viktor Sehr Apr 16 '10 at 12:06
  • Viktor, so your question is why C++ doesn't provide a way to get the number of elements in an array when all the elements have non-empty destructors? Probably because it would be somewhat confusing (you would have to know implementation details of a class to know if the function is safe), and it isn't a necessary feature. But this deserves its own question (which may already exist). – Matthew Flaschen Apr 16 '10 at 12:28
  • @ViktorSehr: If an implementation allocates more storage than requested and can determine by whatever means that calling a destructor upon that storage will have no visible side-effects, it is not required to keep track of the actual allocation. Having a function to request the actual size of an allocation but not having it work in cases where a compiler determined that it didn't need to keep that information would be very confusing. – supercat Oct 14 '16 at 18:20
  • @ViktorSehr A design principle of C++ is zero overhead. If you don't use a feature, you don't pay for it. If I make an array of ints, and pass that array of ints to a shared library, and neither my code nor the shared library calls this hypothetical arraySize() function, the size of the array should not be stored. Yet the compiler has no way of knowing whether my shared library calls arraySize(). Also, std::vector solves the real problem. – James Hollis Jun 20 '18 at 19:26
19

You can't get it because the C committee did not require that in the standard.

If you are willing to write some non-portable code, you may have luck with:

*((size_t *)ptr - 1)

or maybe:

*((size_t *)ptr - 2)

But whether that works will depend on exactly where the implementation of malloc you are using stores that data.

R Samuel Klatchko
10

After reading Klatchko's answer, I tried it myself, and ((size_t *)ptr)[-1] does indeed hold the size of the allocated block (usually more than the amount requested, probably as a guard against segmentation faults).

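#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char *a = malloc(1);
  // Non-portable: reads the allocator's private header (undefined behaviour).
  printf("%zu\n", ((size_t *)a)[-1]);   // printed 17 on my system
  free(a);
  return 0;
}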

Trying with different sizes, GCC allocates the memory as follows:

The smallest value reported is 17 bytes.
The reported value is at least 5 bytes more than the requested size; once a request no longer fits, the value grows by another 8 bytes.

  • If size is [0,12], memory allocated is 17.
  • If size is [13], memory allocated is 25.
  • If size is [20], memory allocated is 25.
  • If size is [21], memory allocated is 33.
N 1.1
  • Note that a different allocator might store the size somewhere else. Maybe a whole group of allocations share a 64 KB arena with the size stored in the first block of the arena. – Zan Lynx Mar 20 '12 at 21:48
  • This is a completely misleading answer. GCC doesn't allocate anything. Your C library does that (or even the OS). Reverse engineering an implementation like this is *programming by experimentation*, which is bound to fail left, right and center. – Jens May 07 '14 at 13:27
9

While it is possible to read the metadata that the memory allocator places before the allocated block, this only works if the pointer really does point to the start of a dynamically allocated block. That would seriously limit the utility of any function relying on it, since every argument passed would have to be a pointer to such a block rather than, say, a simple auto or static array.

The point is there is no portable way from inspection of the pointer to know what type of memory it points to. So while it is an interesting idea, it is not a particularly safe proposition.

A method that is safe and portable would be to reserve the first word of the allocation to hold the length. GCC (and perhaps some other compilers) supports a non-portable way of implementing this, using a structure with a zero-length array, which simplifies the code somewhat compared to a portable solution:

typedef struct
{
    size_t length ;
    char alloc[0] ;   // Compiler specific extension!!!
} tSizedAlloc ;

// Allocating a sized block
tSizedAlloc* blk = malloc( sizeof(tSizedAlloc) + length ) ;
blk->length = length ;

// Accessing the size and data information of the block
size_t blk_length = blk->length ;
char*  data = blk->alloc ;
Clifford
4

I know this thread is a little old, but I still have something to add. There is a function (or a macro, I haven't checked the library yet), malloc_usable_size(), that returns the size of a block of memory allocated from the heap. The man page states that it is only for debugging, since it reports not the number you asked for but the number that was actually allocated, which is a little bigger. Note that it is a GNU extension.

On the other hand, it may not even be needed, because I believe that to free a memory chunk you don't have to know its size. You just remove the handle/descriptor/structure that is in charge of the chunk.

merinoff
3

A non-standard way is to use _msize(). Using this function will make your code unportable. Also, the documentation is not very clear on whether it returns the number passed into malloc() or the real block size (which might be greater).

sharptooth
2

It's up to the malloc implementor how to store this data. Most often, the length is stored directly in front of the allocated memory (that is, if you want to allocate 7 bytes, 7+x bytes are allocated in reality, where the x additional bytes are used to store the metadata). Sometimes the metadata is stored both before and after the allocated memory to check for heap corruption. But the implementor can just as well choose to use a separate data structure to store the metadata.

swegi
  • 1
    I believe the length must be stored at the front. You need to know the size of the buffer to know where to find any trailing metadata. If the size is in the trailing metadata, you have a chicken/egg problem in getting to that data. – R Samuel Klatchko Apr 16 '10 at 06:05
  • Not 'must'. Conceivably an allocator could have a hashtable stored separately which maps pointers to sizes for free. Alternative: have a large number of arenas with various bucket sizes; the allocation size can then be determined based on whether the pointer falls within that arena's region – Demur Rumed Oct 10 '16 at 20:47
1

You can allocate more memory to store size:

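#include <stdio.h>
#include <stdlib.h>

// Allocates n elements of the given size, plus a size_t header holding n.
void *my_malloc(size_t n,size_t size )
{
    void *p = malloc( (n * size) + sizeof(size_t) );
    if( p == NULL ) return NULL;
    *( (size_t*)p) = n;
    return (char*)p + sizeof(size_t);
}

void my_free(void *p)
{
    if( p == NULL ) return;
    free( (char*)p - sizeof(size_t) );
}

void my_realloc(void *oldp,size_t new_size)
{
    // ...
}

int main(void)
{
    char *p = my_malloc( 20, 1 );
    if( p == NULL ) return 1;
    printf("%zu\n", ((size_t*)p)[-1] );   // the stored element count
    my_free( p );
    return 0;
}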
Nyan
0

To answer the question about delete[], early versions of C++ actually required that you call delete[n] and tell the runtime the size, so it didn't have to store it. Sadly, this behaviour was removed as "too confusing".

(See D&E for details.)

me22