Getting size of dynamic C-style array vs. use of delete[]. Contradiction?

Question

I read everywhere that in C++ it is not possible to get the size of a dynamic array just from the pointer pointing to that chunk of memory.

How is it possible that there is no way of getting the size of a dynamic array just from the pointer, and at the same time it is possible to free all the memory allocated by using delete [] just on the pointer, without the need of specifying the array size?

delete [] must know the size of the array, right? Therefore this information must exist somewhere. Shouldn't it?

What is wrong in my reasoning?

The underlying memory allocator knows the size of the blocks it allocates, but there's no standard function that returns that information. There are sometimes non-standard ways specific to a given malloc implementation, but they're inherently non-portable. — Shawn, Feb 20 '19 at 10:20
Why is this ever interesting? Use `std::vector` and never worry about `delete[]`. — n. m. could be an AI, Feb 20 '19 at 10:21
This is because of the memory allocator, which implements the malloc/free functions. Operators new and delete are usually just wrappers on top of malloc and free (vice versa is not allowed by the standard). The memory allocator itself knows the size of the block. — Victor Gubin, Feb 20 '19 at 10:25
The reason for this behavior is that C and C++ are close to assembly language. Using C or C++ you can write machine-specific code such as operating system kernels or device drivers, and such code may use memory blocks that are not on the heap or the stack. For example, the IBM PC text mode video buffer is located at segment 0xB800 (linear address 0xB8000), so you can use it like `static uint16_t* VB = reinterpret_cast<uint16_t*>(0xB8000)`, and then read and write the video buffer the same way you work with a heap array. In other languages this can be harder or impossible. — Victor Gubin, Feb 20 '19 at 10:35
@n.m. I don't get answers that tell you "don't bother". If one asks a question, there are reasons behind it. And the reasons don't necessarily have to be "practical reasons". It might be curiosity, it might be that I want to implement a compiler in machine code, or whatever. Even if we suppose one should always be backed by practical reasons in order to ask questions, I'm sure it is always possible to come up with at least one or two cases in which new knowledge could be used. There might be cases in which I cannot or don't want to use the STL. — Michele Piccolini, Feb 20 '19 at 10:42
"If one asks a question, there are reasons behind it." These reasons are not necessarily valid. If you want to learn how to use C++ efficiently then not bothering with delete and pointers is the right thing. They are too low level and there are a gazillion of these low-level things of zero importance. Trying to learn them all is a waste of time. If you are in a situation where you cannot use `std::vector` (which should be *extremely* rare) you might want to look inside a typical implementation to learn how it works in order to replicate some of it. — n. m. could be an AI, Feb 20 '19 at 11:10
Anyway there's a good answer now, I shall be grateful if you tell me if it's helpful. — n. m. could be an AI, Feb 20 '19 at 11:10
@n.m. I don't think SO is limited to people "learning how to use C++ efficiently". Otherwise the language-lawyer tag would be pretty useless. _Someone_ has to deal with all these "low level things", so why is asking about them on SO unreasonable? — Max Langhof, Feb 20 '19 at 11:50
@MaxLanghof SO is not limited this way, but IME this sort of question is more frequently asked by people who are in their "learning to use the language" phase, rather than in the "implementing the language and/or language lawyering" phase. I could be mistaken of course. — n. m. could be an AI, Feb 20 '19 at 12:10
@MichelePiccolini: n.m. is really just pointing out that C++ more or less views `new` and `delete` mechanics as necessary evils, that were only required in the early days of the language. In modern C++, their use is discouraged, in favor of smart pointers and containers. `new` and `delete` should only be required for low level projects, such as implementing new framework components, which should imply the authors are already C++ (or at least OOP) experts. — jxh, Feb 20 '19 at 21:19
@n.m. It isn't that rare. In my career I have come across a number of embedded chips with a C++03 or C++11 unhosted implementation with no std::vector. — Vality, Feb 21 '19 at 17:46
@Vality If an implementation has functioning `new` and `delete` it should provide standard containers. Being "embedded" is a rather poor excuse for not doing that. — n. m. could be an AI, Feb 21 '19 at 19:19
@n.m. Regardless of what should be, I can confirm there are a substantial number of implementations that do not. I did not develop these platforms, merely used them. — Vality, Feb 21 '19 at 19:40

Handy999 · Accepted Answer · 2019-02-21T07:08:46.997

TL;DR The operator delete[] destructs the objects and deallocates the memory. The information N ("number of elements") is required for destructing. The information S ("size of allocated memory") is required for deallocating. S is always stored and can be queried by compiler extensions. N is only stored if destructing objects requires calling destructors. If N is stored, where it is stored is implementation-dependent.

The operator delete[] has to do two things:

a) destroy the objects (calling destructors, if necessary) and

b) deallocate the memory.

Let's first discuss (de)allocation, which many implementations (like GCC's) delegate to the C functions malloc and free. The function malloc takes the number of bytes to be allocated as a parameter and returns a pointer. The function free takes only a pointer; the number of bytes is not needed. This means that the memory allocation functions have to keep track of how many bytes have been allocated. There may be a function to query how many bytes have been allocated (on Linux this can be done with malloc_usable_size, on Windows with _msize). This is not what you want, because it does not tell you the size of an array but the amount of memory allocated. Since malloc does not necessarily give you exactly as much memory as you asked for, you cannot compute the array size from the result of malloc_usable_size:

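#include <cstdlib>
#include <iostream>
#include <malloc.h>  // malloc_usable_size (glibc extension)

int main()
{
    void * p = std::malloc(42);  // request 42 bytes
    // Reports the usable size of the block, which may exceed the request.
    std::cout << malloc_usable_size(p) << std::endl;
    std::free(p);
}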

This example gives you 56, not 42: http://cpp.sh/2wdm4

Note that applying malloc_usable_size (or _msize) to the result of new is undefined behavior.

So, let's now discuss construction and destruction of objects. Here, you have two forms of delete: delete (for single objects) and delete[] (for arrays). In very old versions of C++, you had to pass the size of the array to the delete[] operator. As you mentioned, this is no longer the case: the compiler tracks this information. GCC adds a small field before the beginning of the array, where the number of elements is stored so that it knows how often the destructor has to be called. You might query that:

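#include <cstddef>
#include <iostream>

struct foo {
    char a;
    ~foo() {}  // user-provided destructor: delete[] must call it 42 times
};

int main()
{
    foo * ptr = new foo[42];
    // Undefined behavior: peeks at the element count GCC stores before the array.
    std::cout << *(((std::size_t*)ptr)-1) << std::endl;
    delete[] ptr;
}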

This code gives you 42: http://cpp.sh/7mbqq

Just for the record: this is undefined behavior, but it works with the current version of GCC.

So, you might ask yourself why there is no function to query this information. The answer is that GCC doesn't always store this information. There might be cases where destruction of the objects is a no-operation (and the compiler is able to figure that out). Consider the following example:

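#include <cstddef>
#include <iostream>

struct foo {
    char a;
    //~foo() {}  // no destructor now: foo is trivially destructible
};

int main()
{
    foo * ptr = new foo[42];
    // Undefined behavior: nothing is stored before the array in this case.
    std::cout << *(((std::size_t*)ptr)-1) << std::endl;
    delete[] ptr;
}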

Here, the answer is not 42 any more: http://cpp.sh/2rzfb

The answer is just garbage - the code was undefined behavior again.

Why? Because the compiler does not need to call a destructor, so it does not need to store the information. And, yes, in this case the compiler does not add code that keeps track of how many objects have been created. Only the number of allocated bytes (which might be 56, see above) is known.

Very interesting answer! I do wonder now what `malloc_usable_size` would give you for that second pointer though? And if it's the correct answer, where this information would have been stored? — Max Langhof, Feb 20 '19 at 13:22
You can try this out, but it is undefined behavior. For gcc, where `new` is based on `malloc`, it could work. However, if you consider an array of a non-trivially-destructible type (first example), `ptr` certainly does not point to the beginning of the memory that has been allocated by malloc (because of the hidden field at the beginning). Thus, I expect that `malloc_usable_size` does not work. — Handy999, Feb 20 '19 at 13:28
Wonderful answer! So, the gist is: * The information N ("number of elements") may not be stored (it is only in case of dynamic arrays of things with a destructor). For this reason we need to keep track of the length manually, if we want it. * What is instead always stored is S ("size of allocated memory"). (S can be queried but results are implementation-dependent). * `delete[]` only needs S in order to deallocate memory (treats everything like a blob). It needs N only to know how many times to call destructors. * If N is stored, where it is stored is implementation-dependent. Correct? — Michele Piccolini, Feb 20 '19 at 17:22
@Handy999 could you maybe add my summary as a tl;dr; at the top of your answer, so that other people can get an immediate answer when reading this, and have the possibility of getting more details by reading further. And maybe cite _msize for Windows, by citing Lightness Races in Orbit. This way, we have all the information in one place, and I can accept the answer! — Michele Piccolini, Feb 20 '19 at 22:18

Lightness Races in Orbit · Answer 2 · 2019-02-20T10:28:42.727

26

It does - the allocator, or some implementation detail behind it, knows exactly what the size of the block is.

But that information is not provided to you or to the "code layer" of your program.

Could the language have been designed to do this? Sure! It's probably a case of "don't pay for what you don't use": it's your responsibility to remember this information. After all, you know how much memory you asked for! Oftentimes people will not want the cost of a number being passed up the call stack when, most of the time, they won't need it.

There are some platform-specific "extensions" that may get you what you want, like malloc_usable_size on Linux and _msize on Windows, though these assume that your allocator used malloc and didn't do any other magic that may extend the size of the allocated block at the lowest level. I'd say you're still better off tracking this yourself if you really need it… or using a vector.

edited Feb 20 '19 at 10:28

answered Feb 20 '19 at 10:19


I don't get your reasoning for "don't pay for what you don't use". If _I_ have to keep track of it, then I have to pay extra at some point (such as passing a size up and down the call stack), whereas the implementation already has to know the allocation size given just a pointer. A language function for returning this information (which must have been stored somewhere anyway) for a given pointer (with UB if it doesn't point to the beginning of such an allocation) should be free, no? – Max Langhof Feb 20 '19 at 10:23
@MaxLanghof _"If I have to keep track of it, then I have to pay extra at some point"_ The key word is **if**. – Lightness Races in Orbit Feb 20 '19 at 10:26
@MaxLanghof _"A language function for returning this information"_ Well, okay, maybe. That would be opt-in and thus arguably not an expense. There could also be implementation/design constraints of which I'm not aware. Either way the answer is "yes, the computer knows this, but it doesn't tell you". – Lightness Races in Orbit Feb 20 '19 at 10:26
And that's exactly what I would be curious about - what are these implementation/design constraints (or implementation alternatives where it is not free)? Granted, in idiomatic modern C++ this is technically irrelevant because we don't dabble in raw arrays anyway, but I don't see how you can have an abstraction that beats (or easily matches) `delete[]` in terms of space cost - the STL doesn't have any for runtime-sized arrays. --- Well, your edits do address some of this. – Max Langhof Feb 20 '19 at 10:29
@MaxLanghof Admittedly I would be vaguely interested in finding out whether there's some crucial reason that this wasn't made part of the stdlib's allocation interface(s), that go beyond "we cba and don't think you should need this" – Lightness Races in Orbit Feb 20 '19 at 10:32
One use case I see is the `delete []` operator. Usually one has the overhead of having to remember to use `delete []` instead of `delete` on pointers pointing to arrays. Accidentally just using `delete` would cause bugs. Then wouldn't it make more sense to have just one operator `delete` that was overloaded to free whole arrays in case it is used on such pointers? (Or freeing the first element of an array could be a desirable feature?) Or, weird example: a factory returning a pointer to a random-sized array. It would be useful to know the size, without bloating the interface to pass the size. – Michele Piccolini Feb 20 '19 at 10:56
@MichelePiccolini The first question in your comment is unrelated to the topic at hand - correctly `delete[]`ing a pointer to an array if you already _know_ it is a pointer to an array and being able to infer whether a pointer points to an array or a single object are two different, unrelated things. In other words, if there was an operator to infer the size that `delete[]` uses it would most certainly have UB if used on non-array allocations, thus it wouldn't allow you to decide whether the pointer is to an array allocation or not. – Max Langhof Feb 20 '19 at 11:45
@MichelePiccolini You shouldn't really be using `delete` or `delete[]` _at all_, though; that's for the insides of containers and smart pointers, which do have all this information nicely tucked away. – Lightness Races in Orbit Feb 20 '19 at 11:57
It might not know *exactly*, it might only know the size of the block it chose, which might tell it the size rounded up to the next multiple of 16 bytes, for example. Depending on the allocator of course. xD, @Handy999 already posted this as an answer before I commented. – Peter Cordes Feb 20 '19 at 17:38
*"it's your responsibility to remember this information"* - I'd like to concur. This is not (only) about *remembering* things. Method signatures like `doSomething(int *array)` are basically *always* useless, and need another `, int length)` parameter. Having to track this is clumsy and error-prone, to say the least. (Sure, now one could argue, "This parameter should be a vector reference with some const's thrown in, and length should be a `size_t`" (or whatever), but ... that wouldn't lead anywhere except to a flamewar, because at some point, I'd *have* to mention Java ;-) – Marco13 Feb 20 '19 at 19:34
"It does - the allocator, or some implementation detail behind it, knows exactly what the size of the block is." It's true that the allocator remembers the *block size* but that's rarely the same as the actual asked-for size. I can't imagine any production-level allocator that doesn't bin allocations into specific groups for performance benefits (and the pesky alignment requirements further complicate things). – Voo Feb 20 '19 at 20:20
This isn't correct: `delete []` *doesn't* know the size of the array. It knows the size of the memory allocation block that the array was placed in, which simply gives an upper bound to the array size. Depending on the allocation strategy involved, this could be off by quite a bit, if the allocation was rounded to something like the nearest memory page (4 KiB on most systems). – Mark Feb 21 '19 at 00:10
@Marco13: There are all the nice features in the standard library (like containers or in C++20 even ranges), which do exactly the right thing. You do not have to go to Java to get this. C++ has all you need but also much legacy which you can just decide not to use. – Handy999 Feb 21 '19 at 08:49
Okay, we can nitpick over exactly what information is kept behind the scenes even though the whole point of the answer is that _you can't get to that information_. – Lightness Races in Orbit Feb 21 '19 at 11:17

score 6 · Answer 3 · edited Feb 20 '19 at 19:01

I think the reason for this is a confluence of three factors.

1. C++ has a "you only pay for what you use" culture.
2. C++ started its life as a pre-processor for C and hence had to be built on top of what C offered.
3. C++ is one of the most widely ported languages around. Features that make life difficult for existing ports are unlikely to get added.

C allows the programmer to free memory blocks without specifying the size of the memory block to free, but does not provide the programmer with any standard way to access the size of the allocation. Furthermore the actual amount of memory allocated may well be larger than the amount the programmer asked for.

Following the principle of "you only pay for what you use", C++ implementations implement new[] differently for different types. Typically they only store the size if it is necessary to do so, usually because the type has a non-trivial destructor.

So while yes, enough information is stored to free the memory block, it would be very difficult to define a sane and portable API for accessing that information. Depending on the data type and platform, the actual requested size may be available (for types where the C++ implementation has to store it), only the actual allocated size may be available (for types where the C++ implementation does not have to store it on platforms where the underlying memory manager has an extension to get the allocated size), or the size may not be available at all (for types where the C++ implementation does not have to store it on platforms that don't provide access to the information from the underlying memory manager).

score 2 · Answer 4 · answered Feb 20 '19 at 10:25

This answer applies to Microsoft Visual Studio only.

There is a function called _msize, which will return the malloced / calloced / realloced size of a pointer.

It can be found in the malloc.h header, and the parameters are:

size_t _msize(
   void *memblock
);

I am not sure if there is an equivalent in gcc. There probably should be.

answered Feb 20 '19 at 10:25

Owl

Ahh there is malloc_usable_size() for linux. – Owl Feb 20 '19 at 10:42
Do you know if those are reliable or what could be some cases that might arise and might make the usage of these non reliable? – Michele Piccolini Feb 20 '19 at 11:01
Well one of the things I noticed when I tested it, I called a malloc on an array of chars 25 long, then called malloc_usable_size on the array. It reported 40 bytes used. This suggests to me that it's actually reporting the container size allocated that is large enough to contain the array. – Owl Feb 20 '19 at 14:37

score -1 · Answer 5 · answered Feb 21 '19 at 01:24

If delete[] doesn't have to know the size of the array at the time it is called, your entire argument falls apart. And delete[] doesn't have to know the size of the array at the time it is called. It only needs to know the size to make the block available for use by others, and absolutely nothing requires it to make the block available for use by others at the time delete[] is called.

For example, delete[] may dice a large block into some number of smaller blocks. Each of those blocks but one need only have a pointer to the control block that knows the size. If any block but the control block is passed to delete[] first, then delete[] has no idea how big the block that was just freed is and won't know until later.

That it is not absolutely required that delete[] know the size of a block at every arbitrary point during its lifetime is sufficient to invalidate your argument.
