
I have a pointer. I know its address (I got it as an argument to a function), and I know that it points to memory previously allocated by a malloc() call.

Is there any way to know the size of this allocated memory block?

I would prefer a cross-platform, standard solution, but I think none exists. So anything is okay, even hardcore low-level manipulation of malloc's data structures, if there is nothing better. I use glibc on the x86_64 architecture, and there is no plan to run the result elsewhere. I am not looking for a generic answer; it can be specific to glibc/x86_64.

I think this information must be available; otherwise realloc() could not work.


The proposed duplicate asks for a generic, standard-compliant solution, which is impossible. I am looking for a glibc/x86_64 solution, which is possible because glibc is open source and its realloc() needs this information to work. This question explicitly allows answers that dig into the low-level malloc internals in non-standard ways.

Peter Cordes
peterh
  • `x = malloc(HEREISTHESIZE);` – pmg Feb 11 '21 at 18:34
  • `malloc` does not have a standard implementation, so this info *might* be available in non-portable ways on some known implementations. – Eugene Sh. Feb 11 '21 at 18:34
  • @pmg As the title says, I only know the memory address (x), and I do not know `HEREISTHESIZE`. – peterh Feb 11 '21 at 18:35
  • @EugeneSh. I think it should be, otherwise `realloc()` could not work. – peterh Feb 11 '21 at 18:36
  • `realloc` knows exactly how `malloc` is implemented and is using the same internal bookkeeping. Moreover it does not have to. It just needs to `malloc` with new size, copy data there and `free` the old pointer. – Eugene Sh. Feb 11 '21 at 18:36
  • @EugeneSh. Yes. And, glibc being open source, also I can get some access to this internal bookkeeping. *"anything is okay, even hardcore low-level malloc data structure manipulation, if there is no better."*... – peterh Feb 11 '21 at 18:38
  • @EugeneSh. But `free()` needs to know the size. – Barmar Feb 11 '21 at 18:38
  • Sure you can. But if tomorrow glibc is changing the implementation, you can throw your code to trash. – Eugene Sh. Feb 11 '21 at 18:38
  • One possible scheme is that bookkeeping is allocated *before* the address in the pointer. The library would know how to get to the bookkeeping details by subtracting from the pointer value. There is more than one method to allocate memory. – Thomas Matthews Feb 11 '21 at 18:40
  • @peterh-ReinstateMonica, if you know only the address and not at least a lower bound on the size, then there are only a few things you can safely do with the address, chiefly: store it in a variable or pass it to another function. Particular cases of the latter of special interest include `free`ing the pointer and `realloc`ing it. You cannot dereference it, and there is no standard function to determine the size of the allocated block. – John Bollinger Feb 11 '21 at 18:40
  • Why do you need this so badly that you're willing to use implementation-specific methods? Everyone else handles it by passing the size as an additional parameter or using a `struct` to hold the size and pointer. – Barmar Feb 11 '21 at 18:40
  • @Barmar Not necessarily. There are some "poor man" implementations without bookkeeping at all. Such as https://www.freertos.org/a00111.html#heap_1 – Eugene Sh. Feb 11 '21 at 18:41
  • Also, the `malloc` and the `new` functions are allowed to allocate more space than requested. One reason would be for alignment purposes. Another could be that the memory allocation allocates the entire remaining memory and only trims as needed. So many more possibilities. – Thomas Matthews Feb 11 '21 at 18:42
  • There is no guarantee that the content of a pointer points to *heap* memory or a dynamically allocated object. Addresses of variables and constants can be passed to functions. – Thomas Matthews Feb 11 '21 at 18:44
  • There is no portable way; the standard simply doesn't mandate one. For glibc, look up `malloc_usable_size()` but be aware that it is mainly intended for debugging. Note that it can be greater than the size originally passed to `malloc`; most implementations don't record that size anywhere. – Nate Eldredge Feb 11 '21 at 18:45
  • @Barmar Because I am developing a glibc malloc hook. This hooked version will use the glibc malloc for small allocations, and my own implementation for large ones. Now the problem appears if I want to develop the hooked version of my `realloc`. Yes, I know, there are other ways to avoid this, this question looks now for the option if I really dig into the malloc internals. Thus, this question is not a dupe. – peterh Feb 11 '21 at 18:45
  • @EugeneSh. As the question explains, this is for the glibc malloc, which is not a poor mans implementation, and allows answers based on implementation-specific data structures. – peterh Feb 11 '21 at 18:46
  • @ThomasMatthews As the question explains, *"I know that it points to a memory address previously allocated by the malloc() call."*, thus undefined behavior is acceptable if it is not a previously malloc()-ed pointer. – peterh Feb 11 '21 at 18:46
  • A classic example is a pointer to an array. Most pointers point to the first element. There is no guarantee that there are consecutive slots or the quantity; that information cannot be gleaned from the pointer. – Thomas Matthews Feb 11 '21 at 18:47
  • @NateEldredge No problem if there is no portable way, *"I use glibc with x86_64 architecture and there is no plan to run the result elsewhere"*. I will check malloc_usable_size(). – peterh Feb 11 '21 at 18:49
  • @peterh-ReinstateMonica: I'm playing the devil's advocate here. A function that is receiving a pointer can be passed a pointer to a variable, constant or invalid location. The function, for robustness, cannot guarantee that the pointer came from a `realloc`, `malloc` or `new`. This assumption is the basis for some hard to debug issues. – Thomas Matthews Feb 11 '21 at 18:50
  • @ThomasMatthews Well, I can understand you are thinking on the possible most generic way, but I already explained in the question, also only for you in a comment, that *"I know that it points to a memory address previously allocated by the malloc() call."*. Now I explain it third time. May I ask you to not make a fourth time needed again? – peterh Feb 11 '21 at 18:50
  • @peterh-ReinstateMonica, if you want your question to be taken as a non-dupe on account of seeking details of a specific implementation, then it would help to revise the question so that it -- or at this point maybe to ask a new one that -- clearly specifies exactly what you're looking for. Tagging [glibc] and adding protestations to the question text don't change the fact that both the title and the text of the question seem to be asking for a more generic answer. – John Bollinger Feb 11 '21 at 18:50
  • @peterh-ReinstateMonica `size_t malloc_usable_size (void *ptr);` seems to fulfill your needs. Anything else needed that it does not provide? – chux - Reinstate Monica Feb 11 '21 at 18:51
  • Your hook can simply add `sizeof(size_t)` to the allocation request, store the size in the first part of the allocation, and increment the pointer when returning to the caller. Then your `realloc()` and `free()` can subtract from the pointer to get the location of the size. – Barmar Feb 11 '21 at 18:52
  • Although that might not return a properly aligned result to the caller, so I take it back. – Barmar Feb 11 '21 at 18:53
  • @JohnBollinger Ok, I tried to edit the question as you suggested. – peterh Feb 11 '21 at 18:54
  • @chux-ReinstateMonica Yes, I think it qualifies as a possible answer. – peterh Feb 11 '21 at 18:55
  • @Barmar Thanks, also this is a possible option! – peterh Feb 11 '21 at 18:55
  • @peterh-ReinstateMonica: As I stated before, all you can guarantee, by the standard, is that an address is returned if successful or nullptr if not. The organization of memory and the exact algorithm for memory allocation is left for the implementation. For example, the `malloc` function can return an address inside a block set aside for fixed size blocks (this is usually most efficient for small sized allocations like 8-bit up to 32-bits). You could then have a bit array indicating which blocks are allocated. – Thomas Matthews Feb 11 '21 at 18:56
  • @Barmar Ok, but malloc() does not require an aligned return, as far as I know. – peterh Feb 11 '21 at 18:56
  • @Barmar Yes, alignment issues do add to the simple solution. One can prefix a `union` of `size_t` and `max_align_t` – chux - Reinstate Monica Feb 11 '21 at 18:56
  • I still don't see why your hook needs this, though. You have to have a list somewhere of which allocations were handled by your implementation, and for those you must have recorded the sizes. When your `realloc` hook is called, if the pointer is in your list, you know its size. If it's not in your list, then all you can do with it anyway is to either pass it along to glibc's `realloc` if the new size is still small, or if the new size is large, allocate space with your implementation, copy, and pass the original pointer to glibc's `free`. You don't need to know the old size in either case. – Nate Eldredge Feb 11 '21 at 18:56
  • @peterh-ReinstateMonica All `*alloc()` return pointers meeting `max_align_t`. "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement" – chux - Reinstate Monica Feb 11 '21 at 18:58
  • @NateEldredge Lists are not enough - in a chained list, you can search only linearly. In an arraylist, you can not add-remove at will. Some more complicated structure, for example, a balanced tree is needed - which mostly needs a malloc and I have a circular dependency problem. Furthermore, none of these are thread-safe. Creating the balanced tree entries before all the malloc-ations could work, but it would still not be thread-safe. – peterh Feb 11 '21 at 22:46
  • @NateEldredge I need a primitive solution - this is a zero-cost project, it is more a home experiment for me, than a job task. The problem is that our VPS provider gave us fast and huge virtualized storage, and nearly zero RAM. I can not enable swap in the VPSes (it is a paravirtualized solution and nearly nothing works). So I implement a "poor man's malloc": malloc() calls create files in /tmp, and mmap() them... – peterh Feb 11 '21 at 22:50
  • @NateEldredge Surprisingly, no one did it until now, so I do. – peterh Feb 11 '21 at 22:52
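
The size-prefix idea raised in the comments above (store the size in front of the block, padded with a `union` of `size_t` and `max_align_t` so the returned pointer stays aligned) could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `block_header`, `sized_malloc`, `sized_size`, and `sized_free` are hypothetical, plain `malloc` stands in for whatever backing allocator the hook would use, and C11 is assumed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, max_align_t */
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc, free */

/* Hypothetical header stored in front of every block. The union pads the
   header to the strictest fundamental alignment, so the address right
   after it is still suitably aligned for any ordinary object type. */
typedef union {
    size_t      size;    /* size the caller asked for */
    max_align_t align;   /* never read; only forces size and alignment */
} block_header;

static_assert(sizeof(block_header) % _Alignof(max_align_t) == 0,
              "header must preserve the alignment of the user pointer");

/* Allocate `size` bytes and remember `size` in the hidden header. */
void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;                      /* the addition would overflow */
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                         /* user memory starts after the header */
}

/* Return the size originally requested for `p`, or 0 for a null pointer.
   `p` must have come from sized_malloc; anything else is undefined. */
size_t sized_size(void *p)
{
    return p != NULL ? ((block_header *)p - 1)->size : 0;
}

/* Release a block obtained from sized_malloc (a null pointer is ignored). */
void sized_free(void *p)
{
    if (p != NULL)
        free((block_header *)p - 1);
}
```

Unlike `malloc_usable_size()`, this variant reports the exact requested size, at the cost of one header per allocation (32 bytes where `max_align_t` is 32 bytes wide, as is typical on x86_64 Linux).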

1 Answer


malloc_usable_size returns the number of usable bytes in the block of allocated memory pointed to by the pointer it is passed. This is not necessarily the original requested size; it is the provided size, which may be larger, at the convenience of the allocation software.

The GNU C Library apparently does not document this directly:

So I suppose you may take that last page as having the imprimatur of the GNU C Library. It says size_t malloc_usable_size(void *ptr) “returns the number of usable bytes in the block pointed to by ptr, a pointer to a block of memory allocated by malloc(3) or a related function,” and indicates the function is declared in <malloc.h>. Also, if ptr is null, zero is returned.
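
As a rough illustration of how this might serve the `realloc` hook described in the comments, the sketch below moves a block obtained from glibc's `malloc` into a new allocation, using `malloc_usable_size` as an upper bound on how many bytes to copy. The helper name `move_block` is hypothetical, and plain `malloc` merely stands in for the asker's own allocator; treat it as a sketch for glibc only, not a finished implementation.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size (non-standard, glibc) */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

/* Hypothetical helper: copy the contents of `old` (a pointer returned by
   glibc malloc, or NULL) into a fresh block of `new_size` bytes, then free
   `old`. On failure it returns NULL and leaves `old` untouched, mirroring
   the contract of realloc. */
void *move_block(void *old, size_t new_size)
{
    assert(new_size > 0);   /* a resize to 0 is a free; handle it elsewhere */

    /* Usable bytes in the old block: at least the size originally
       requested, possibly more. Returns 0 when old is NULL. */
    size_t old_usable = malloc_usable_size(old);

    void *fresh = malloc(new_size);   /* stand-in for a custom allocator */
    if (fresh == NULL)
        return NULL;

    /* Never copy more than either block can hold. */
    size_t n = old_usable < new_size ? old_usable : new_size;
    if (n > 0)
        memcpy(fresh, old, n);

    free(old);
    return fresh;
}
```

Because the usable size can exceed the size originally requested, the tail of the copied range may hold unspecified bytes; that is harmless for a `realloc`-style move, but it means the original request size cannot be recovered this way.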

Eric Postpischil
  • Don't use this if you're a library. The executable is allowed to swap out `malloc()` by defining it and probably won't have a declaration for `malloc_usable_size`. – Joshua Feb 11 '21 at 18:56
  • Even if `malloc` has not been replaced, compilers are happy to treat any access past the actual size that was passed to `malloc` as undefined when making transformations, and most `malloc_usable_size` implementations return a somewhat larger value, making the function completely unsafe to use. The only exception I'm aware of is [my implementation in musl libc's mallocng](https://git.musl-libc.org/cgit/musl/tree/src/malloc/mallocng/meta.h?id=v1.2.2#n159) which tracks the exact size as part of trapping small overflows. – R.. GitHub STOP HELPING ICE Feb 11 '21 at 19:47
  • @R..GitHubSTOPHELPINGICE: What do you mean by "compilers are happy to treat..."? While there's usually a lot of special knowledge and communication between a C compiler and the standard C library that go beyond what the C language strictly mandates, C compilers have to mostly behave as though the pointer returned by malloc is to an unbound array of memory. The libc (or the libmalloc if it's separate) can make use of that area (magic values to notice write past end or making sure the end of the alloced area is against a page boundary with an unmapped page to force an invalid memory access). – nategoose Feb 11 '21 at 21:05
  • @nategoose: What do you think requires a compiler to treat the space returned by `malloc` as unbounded? The specification for `malloc` in C 2018 7.22.3 does not say that, and C implementations are entitled to rely on that in the absence of voluntarily adopting other specifications. – Eric Postpischil Feb 11 '21 at 21:09
  • @EricPostpischil: If a C program is able to be built using an implementation of malloc other than the one the compiler is married to, then that implementation of malloc would have to be in charge of what the area around the allocated space is used for. Additionally the alternative implementation of malloc can be written in C and compiled using the compiler that is married to its own malloc implementation. If the compiler itself put hard limits on this then writing a replacement malloc would be severely hindered. – nategoose Feb 11 '21 at 21:21
  • @nategoose: And that requires, as I said, that the implementation adopt specifications beyond the C standard. – Eric Postpischil Feb 11 '21 at 21:22
  • @nategoose: For example, compile `char *p = malloc(1); memset(p, 42, malloc_usable_size(p));` with `-D_FORTIFY_SOURCE` and the `memset` will trap. This happens because the compiler's `__builtin_object_size` "knows" that the pointed-to object has size 1. – R.. GitHub STOP HELPING ICE Feb 12 '21 at 02:09