I just know basic ideas on aligned memory allocation. But I didn't cared much about align issue because I am not an assembly programmer, also didn't have experience with MMX/SIMD. And I think this is the one of the the premature optimizations.
These days people saying more and more about cache hit, cache coherent, optimization for size, etc. Some source code even allocate memory explicitly aligned on CPU cache lines.
Frankly, I don't know how much is the cache line size of my i7 CPU. I know there will be no harm with large size align. But will it really pay off, without SIMD ?
Let's say there 100000 items of 100 bytes data in a program. And access to these data is the most intensive work of the program.
If we change the data structure and make all the 100 bytes size data aligned by 16 byte, is it possible to gain noticeable performance gain ? 10%? 5%?