I've been working on the code of CLISP a bit. Something similar is done there to squeeze some immediate values into a pointer (instead of allocating an object from the garbage collected heap).
In the case of CLISP this works because we know the range of addresses that could be returned by our allocator, though. Then there's a bit that could never be set by an valid address, which if set indicates that the "pointer" is not an actual pointer but data (the single - or more - character in your case).
Btw: Using a char *
is not a good plan, you're halfway to shooting yourself in the foot due to undefined behavior or just accidentally passing such a "pointer" to for example strlen
. I guess using a union
is a much better approach (Though I don't currently have time to check the actual rules from the standard about whether using an union in this way is even allowed). Even more safe is packing an uintptr_t
inside a struct.
[..] but if someone here have some knowledge on the deep-end of malloc it might save me a lot of time.
That's not going to buy you anything. Switch operating system, platform or just the used standard library and everything could be different from what you learnt about the malloc implementation that you're currently looking at.
One approach is to use your own allocator - like done with CLISP - which is taking memory from some pool (for example acquired with mmap
on Linux/BSD) which resides at some predefined address.
But you can also use malloc
(or a better suitable function, like C11 aligned_alloc
or posix_memalign
) to allocate aligned memory for your strings. Let's say you align every string at an even address. That way when you see an address which has the least significant bit set, then you can be sure that it's not actually an address but rather immediate data, i.e. the string itself.
Using malloc
only to allocate 2 byte aligned memory works so: You allocate 2 additional bytes. If the address returned by malloc
is already correctly aligned, then you return the next correctly aligned address (char_ptr + 2
) and mark the cell directly before that address with some value which indicates that the original address was aligned (char_ptr[1] = '1'
). If, on the other hand, the returned address is not correctly aligned, then you return the directly following byte (which is correctly aligned; char_ptr + 1
) and mark the cell directly before that address (thus, the first; char_ptr[0] = '0'
).
When freeing, look at the cell directly before the address that was passed in, it contains the mark to tell you which address you need to free
.
In code:
#define IS_ALIGNED_2BYTE(value) (((uintptr_t)(value) & 0x01) == 0)
/// Allocate a memory region of the specified size at an even (2 byte
/// aligned) address.
///
/// \param[in] size Required size of the memory region.
/// \return Pointer to the memory, or NULL on failure.
inline static void * allocate_2byte_aligned(size_t size) {
#ifdef HAVE_ALIGNED_ALLOC
return aligned_alloc(2, size);
#elif defined(HAVE_POSIX_MEMALIGN)
void * ptr;
if (posix_memalign(&ptr, sizeof(void *), size) == 0) {
assert(IS_ALIGNED_2BYTE(ptr)); // Paranoia due to uncertainty
// about alignment parameter to
// posix_memalign.
return ptr;
} else {
return NULL;
}
#else
char * const memory = malloc(size + 2);
if (! memory) {
return NULL;
}
if (IS_ALIGNED_2BYTE(memory)) {
// memory is correctly aligned, but to distinguish from originally
// not aligned addresses when freeing we need to have at least one
// byte. Thus we return the next correctly aligned address and
// leave a note in the byte directly preceeding that address.
memory[1] = '1';
return &(memory[2]);
} else {
// memory is not correctly aligned. Leave a note in the first byte
// about this for freeing later and return the next (and correctly
// aligned) address.
memory[0] = '0';
return &(memory[1]);
}
#endif
}
/// Free memory previously allocated with allocate_2byte_aligned.
///
/// \param[in] ptr Pointer to the 2 byte aligned memory region.
inline static void free_2byte_aligned(void * ptr) {
assert(IS_ALIGNED_2BYTE(ptr));
#if defined(HAVE_ALIGNED_ALLOC) || defined(HAVE_POSIX_MEMALIGN)
free(ptr);
#else
char const * const memory = ptr;
void const * original_address;
if (memory[-1] == '0') {
// malloc returned an address that was not aligned when allocating
// this memory block. Thus we left one byte unused and returned
// the address of memory[1]. Now we need to undo this addition.
original_address = &(memory[-1]);
} else {
// malloc returned an address that was aligned. We left two bytes
// unused and need to undo that now.
assert(memory[-1] == '1');
original_address = &(memory[-2]);
}
free((void *) original_address);
#endif
}
Creating and destroying "pointer-or-immediate-data" structures is then straightforward:
typedef struct its_structure {
uintptr_t data; ///< Either a pointer to the C string, or the actual
///< string, together with a bit to indicate which
///< of those it is.
} its;
its its_alloc(size_t size) {
if (size < sizeof(uintptr_t)) {
its const immediate_string = {.data = 0x01};
return immediate_string;
} else {
void * const memory = allocate_2byte_aligned(size);
assert(IS_ALIGNED_2BYTE(memory));
its const allocated_string = {
.data = memory ? (uintptr_t) memory : (0x01 | 0x02) /* Invalid string */};
return allocated_string;
}
}
void its_free(its string) {
if (IS_ALIGNED_2BYTE(string.data)) {
free_2byte_aligned((void *) string.data);
} // else immediate, thus no action neccessary
}
Above code is actually from a small library I wrote to test/write this answer. If you want then use / enhance it.