3

On a 32-bit machine, one memory read cycle fetches 4 bytes of data.
So reading the 128-byte buffer below should take 32 read cycles.

char buffer[128];

Now suppose I align this buffer as shown below. How would that make reading it faster?

alignas(128) char buffer[128];

I am assuming a memory read cycle still fetches only 4 bytes.

gaurav bharadwaj
Gaurav

2 Answers

5

The size of the registers used for memory access is only one part of the story; the other part is the size of the cache line.

If a cache line is 64 bytes and your char[128] has only its natural alignment (which for char is 1, so it can start anywhere within a cache line), the CPU generally needs to touch three different cache lines. With alignas(64) or alignas(128), only two cache lines need to be touched.
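
The three-versus-two count can be checked with a little arithmetic. The following is a minimal sketch, not a statement about any particular CPU: the helper name cache_lines_touched is hypothetical, and the 64-byte line size is an assumption taken from the example above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of cache lines covered by `size` bytes
// starting at address `addr`. The default line size of 64 bytes is an
// assumption; real hardware may use a different value.
constexpr std::size_t cache_lines_touched(std::uintptr_t addr,
                                          std::size_t size,
                                          std::size_t line_size = 64) {
    if (size == 0) return 0;
    const std::uintptr_t first = addr / line_size;              // line holding the first byte
    const std::uintptr_t last = (addr + size - 1) / line_size;  // line holding the last byte
    return static_cast<std::size_t>(last - first + 1);
}

// A 128-byte buffer that starts on a line boundary covers exactly two lines.
static_assert(cache_lines_touched(0, 128) == 2, "aligned buffer: two lines");
static_assert(cache_lines_touched(64, 128) == 2, "aligned buffer: two lines");
// Starting anywhere else inside a line spills into a third one.
static_assert(cache_lines_touched(1, 128) == 3, "misaligned buffer: three lines");
```

For a real object, the address can be passed as reinterpret_cast<std::uintptr_t>(buffer) together with sizeof buffer.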

If you are working with a memory-mapped file, or under swapping conditions, the next level of alignment kicks in: the size of a memory page. This would call for 4096-byte or 8192-byte alignment.

However, I seriously doubt that alignas() has any significant positive effect if the specified alignment is larger than the natural alignment that the compiler uses anyway. Over-alignment adds padding and so increases memory consumption, which may be enough to cause more cache lines or memory pages to be touched in the first place. It is only the small misalignments that need to be avoided, because they may trigger huge slowdowns on some CPUs and may be downright illegal or impossible on others.

Thus, truth is only in measurement: If you need all the speedup you can get, try it, measure the runtime difference, and see whether it works out.
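
One way to take such a measurement is sketched below. It is only a rough starting point: the function names sum_bytes and time_reads_ns are hypothetical, and an optimizer may hoist or vectorize the loop, so it is worth inspecting the generated code and repeating the runs before trusting any number.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Sum every byte of the buffer so that each byte really has to be read.
inline std::uint64_t sum_bytes(const char* data, std::size_t size) {
    std::uint64_t sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += static_cast<unsigned char>(data[i]);
    return sum;
}

// Time `repeats` passes over the buffer and return the elapsed nanoseconds.
// The per-pass sums are added to `sink` so that the result is actually used.
inline long long time_reads_ns(const char* data, std::size_t size,
                               int repeats, std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        sink += sum_bytes(data, size);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling time_reads_ns once on an alignas(128) char buffer[128] and once on a deliberately offset pointer (for example, one byte into a 129-byte array) gives two timings to compare on the machine at hand.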

cmaster - reinstate monica
1

In 32 bit machine, One memory read cycle gets 4 bytes of data.

It's not that simple. The term "32 bit machine" alone is already too broad and can mean many things. 32-bit registers (GP registers? ALU registers? Address registers?)? A 32-bit address bus? A 32-bit data bus? A 32-bit instruction word size?

And a "memory read" by whom? The CPU? The cache? A DMA chip?

If you have a HW platform where memory is read 4 bytes at a time (aligned to 4) in a single cycle and without any cache, then alignas(128) will make no difference compared to alignas(4): either way the buffer starts on a 4-byte boundary and takes 32 reads.

Ped7g