It depends on what you malloc(). If you malloc() large chunks of data, it should not make a real difference. But if you malloc() elements smaller than 64 bytes, you will probably not use the cache efficiently.
malloc() allocates elements in memory in program order. If several malloc() calls are close together, the elements will sit at successive memory addresses, and since they were created at the same time, it is likely that they will be used together. This is the so-called spatial locality principle. Of course nothing is guaranteed, especially with dynamically allocated data, but spatial locality is observed in most programs. The practical implication of this principle is that it allows better use of caches. A cache miss is expensive (you have to fetch 64 bytes from memory), but if you use elements that are close in memory, you pay it only once.
So, if separately allocated data share the same cache line, fetching one of these elements brings in the other nearby elements for free. But if each element occupies a complete cache line, as with your modified allocator, this is no longer true. Every access to a new element will be a cache miss, the number of elements your cache can hold will be reduced, and you will have the impression that the cache size has shrunk. The net result will be an increase in your computation time.