My code needs to work with a large array of structures containing multiple strings.
In practice, the whole array will contain about 25k structures with a size of about 256 bytes each, so the whole array needs about 6 MiB of heap space.
// example
struct element {
char foo[32];
char bar[16];
...
}; // sizeof(struct element) = 256
I was concerned about the performance of calloc due to it zeroing all the memory; also, I don't need every byte to be initialized. So I did element_arr = malloc(num_elements * sizeof(struct element)).
I allocate the array at runtime as I don't know num_elements at compile time.
For my code to work, I actually only need the first byte of each member (foo, bar, etc.) to be zero; the rest can stay uninitialized.
Say I have 8 string members per struct: then I only need about 3% of the zeroed bytes (8 of 256), and the other 97% of the cleared bytes are wasted, as they will be overwritten by real data eventually.
I see a few options:
- zero everything at once, e.g. with calloc, which (I hope) makes use of vector instructions to write large blocks of aligned zeroes.
- memset each 256-byte struct element before filling it with real data.
- assign 0 to the first byte of each member of struct element before using it (*element->foo = 0; ...). With optimizations at -O3, this translates to a chain of mov instructions. It is cumbersome to write language-wise (but can be taken care of).
mov byte ptr [rdi + 152], 0
mov byte ptr [rdi + 208], 0
mov byte ptr [rdi + 200], 0
mov byte ptr [rdi + 128], 0
...
The output looks similar for arm64.
- make a very conservative assumption about the size of element_arr (e.g. 64 MB) and place it in a zero-initialized section of memory. (The OS then needs to zero the memory for me.)
char element_arr[64 * 1000 * 1000] = {0};
(checking num_elements < 250000 to be sure)
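To make the comparison concrete, here is a minimal sketch of the first three options. The real struct is elided above, so the eight member names and sizes below (other than foo and bar) are assumptions picked only so that they add up to 256 bytes; the helper names alloc_zeroed, clear_element, mark_empty and alloc_marked are hypothetical as well.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: only foo and bar come from the question; the other
   six members are placeholders chosen so the sizes sum to 256 bytes. */
struct element {
    char foo[32];
    char bar[16];
    char m2[16];
    char m3[32];
    char m4[32];
    char m5[32];
    char m6[32];
    char m7[64];
};

static_assert(sizeof(struct element) == 256, "assumed 256-byte layout");

/* Option 1: let calloc return memory that is zeroed everywhere. */
struct element *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(struct element));
}

/* Option 2: clear one whole element right before it is filled. */
void clear_element(struct element *e)
{
    memset(e, 0, sizeof *e);
}

/* Option 3: write a terminator into the first byte of each member only. */
void mark_empty(struct element *e)
{
    e->foo[0] = '\0';
    e->bar[0] = '\0';
    e->m2[0] = '\0';
    e->m3[0] = '\0';
    e->m4[0] = '\0';
    e->m5[0] = '\0';
    e->m6[0] = '\0';
    e->m7[0] = '\0';
}

/* Option 3 applied to a malloc'd array; all other bytes stay uninitialized. */
struct element *alloc_marked(size_t n)
{
    /* malloc does not check the multiplication for overflow, calloc does. */
    if (n > SIZE_MAX / sizeof(struct element))
        return NULL;

    struct element *arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        mark_empty(&arr[i]);
    return arr;
}
```

With this layout, mark_empty touches 8 bytes per element while clear_element and calloc touch all 256, which is where the 3% figure comes from.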
Does it make a difference which option I choose? What would you suggest?
Edit: @John Bayko The individual structures are filled incrementally, but all strings need to start with '\0'; otherwise the algorithm can't distinguish between a real string (already filled) and uninitialized garbage.
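Roughly, the check that relies on the leading '\0' looks like the sketch below. The function names is_filled and fill_once are made up for illustration and are not my actual code; the point is only that a member counts as empty exactly when its first byte is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A member counts as filled once its first byte is non-zero. */
bool is_filled(const char *member)
{
    return member[0] != '\0';
}

/* Copy src into a member that is still empty.
   Returns false if the member is already filled, if src does not fit
   (including its terminator), or if src is the empty string, which
   could not be told apart from "not filled yet". */
bool fill_once(char *member, size_t cap, const char *src)
{
    assert(member != NULL && src != NULL && cap > 0);

    size_t len = strlen(src);
    if (is_filled(member) || len == 0 || len >= cap)
        return false;

    memcpy(member, src, len + 1); /* copies the terminator as well */
    return true;
}
```

If the first byte were left as uninitialized garbage, is_filled could return true for a member that was never written, which is why at least that one byte per string has to be cleared.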
After reading the other answers, I'm convinced that it probably won't be a problem anytime soon. It's good to know that the simplest solution (calloc) is a good one in the majority of use cases.
I profiled my code on my dev machine and indeed, the time spent on allocation is negligible.
Thanks for your replies.