
How to get the memory granularity of a CPU in C?

Suppose I want to allocate an array where all the elements are properly memory aligned. I can pad each element to a certain size N to achieve this. How do I know the value of N?

Note: I am trying to create a memory pool where each slot is memory aligned. Any suggestion will be appreciated.

alinsoar
  • The compiler and memory manager (malloc etc) will do that for you. – Paul Ogilvie Jun 07 '20 at 13:15
  • The interesting question conflicts with one of the foundations of C: portability. Code is supposed to be agnostic of the processor architecture, and all port-specific adjustments are done in the compiler's port-dependent code, as @PaulOgilvie also said. Still, it can be a real requirement for machine-specific code in some cases, such as when extreme efficiency is needed or in board development. Unfortunately, the C language doesn't have any standard support for this. You must use old-fashioned conditional compilation with `#if/#else/#endif`. – Frankie_C Jun 07 '20 at 13:36
  • Your question is unclear. There is *word* alignment (for single objects), there is *page* alignment (for MMU/VM), and there is *cache* alignment (multiple levels). All of these can be present or absent, depending on the architecture. – wildplasser Jun 07 '20 at 14:13

3 Answers


In Theory

How to get the memory granularity of a CPU in C?

First, you read the instruction set architecture manual. It may specify that certain instructions require certain alignments, or even that the addressing forms in certain instructions cannot represent non-aligned addresses. It may specify other properties regarding alignment.

Second, you read the processor manual. It may specify performance characteristics (such as that unaligned loads or stores are supported but may be slower or use more resources than aligned loads or stores) and may specify various options allowed by the instruction set architecture.

Third, you read the operating system documentation. Some architectures allow the operating system to select features related to alignment, such as whether unaligned loads and stores are made to fail or are supported albeit with slower performance than aligned loads or stores. The operating system documentation should have this information.

In Practice

For many programming situations, what you need to know is not the “memory granularity” of a CPU but the alignment requirements of the C implementation you are using (or of whatever language you are using). And, for the most part, you do not need to know the alignment requirements directly but just need to follow the language rules about managing objects: use objects with declared types, do not use casts to convert pointers between incompatible types except where specific rules allow it, use suitably aligned memory as provided by malloc rather than adjusting your own pointers to bytes, and so on. Following these rules will give good alignment for the objects in your program.

In C, when you define an array, the element size will automatically be the size that C implementation needs for its alignment. For example, long double x[100]; may use 16 bytes for each array element even though the hardware uses only ten bytes for a long double. Or, for any struct foo that you define, the compiler will automatically include padding as needed in the structure to give the desired alignment, and any array struct foo x[100]; will already include that padding. sizeof(struct foo) will be the same as sizeof x[0], because each structure object has that padding built in, even just for a single structure object, not just for elements in arrays.
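A small sketch of that point, assuming a hypothetical struct foo with an int and a char member. The compiler pads the structure so that its size is a multiple of its alignment (which C11's _Alignof operator reports), so every element of an array of it starts on a suitably aligned address without any extra work from you. Converting a pointer to uintptr_t to inspect it is implementation-defined, but it behaves as expected on mainstream platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example type. On a typical implementation with a 4-byte int,
   the compiler adds 3 bytes of padding after b, so sizeof(struct foo) is 8. */
struct foo {
    int  a;
    char b;
};

/* Trailing padding makes the size a multiple of the alignment,
   which is what keeps every array element aligned. */
static_assert(sizeof(struct foo) % _Alignof(struct foo) == 0,
              "struct size is always a multiple of its alignment");

/* Returns 1 if each of the n elements of arr starts at an address that is
   a multiple of _Alignof(struct foo), 0 otherwise. */
int foo_array_is_aligned(const struct foo *arr, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if ((uintptr_t)&arr[i] % _Alignof(struct foo) != 0)
            return 0;
    }
    return 1;
}
```

For example, given struct foo x[100];, foo_array_is_aligned(x, 100) returns 1, because consecutive elements are sizeof(struct foo) bytes apart.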

When you do need to know the alignment that a C implementation requires for a type, you can use C’s _Alignof operator. The expression _Alignof(type) provides the alignment required for type.
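Applied to the pool in the question, a minimal sketch could compute the padded slot size N like this. It assumes the pool must be able to hold an object of any type, so it rounds the slot size up to _Alignof(max_align_t) (C11, <stddef.h>). round_up and pool_slot_size are hypothetical helper names, and the slots are only aligned if the pool's base address is itself aligned, which memory obtained from malloc is.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of align.
   align must be a power of two, which every valid C alignment is. */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: the padded size N of one pool slot that must hold
   up to max_object_size bytes of an object of any type. */
static size_t pool_slot_size(size_t max_object_size)
{
    return round_up(max_object_size, _Alignof(max_align_t));
}
```

If the pool only ever holds one known type T, no rounding is needed: sizeof(T) is already a multiple of _Alignof(T).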

Other

… properly memory aligned.

Proper alignment is a matter of degrees:

  • What the processor supports may determine whether your program works or does not work. An improper alignment is one that causes your program to trap.
  • What is efficient with respect to individual loads and stores may affect how fast your program runs. An improper alignment is one that causes your program to execute more slowly.
  • In certain performance-critical situations, alignment with respect to cache and memory mapping features can also affect performance.
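Each of these levels can be queried from C, as sketched below under stated assumptions. _Alignof(max_align_t) is standard C11. sysconf(_SC_PAGESIZE) is POSIX, so this will not compile on Windows as written. _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension that is not available everywhere, so the code falls back to an assumed 64 bytes when it is missing or reports 0.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations such as sysconf */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Alignment the C implementation guarantees for any object with
   fundamental alignment (what malloc returns). */
size_t object_alignment(void)
{
    return _Alignof(max_align_t);
}

/* L1 data-cache line size. _SC_LEVEL1_DCACHE_LINESIZE is a glibc extension;
   where it is not defined or reports 0, fall back to an assumed 64 bytes. */
size_t cache_line_size(void)
{
    long n = 0;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    return n > 0 ? (size_t)n : 64;
}

/* Virtual-memory page size reported by the operating system (POSIX). */
size_t page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);
    assert(n > 0);
    return (size_t)n;
}
```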
Eric Postpischil
  • The elements of a structure are aligned automatically, but not the array elements. Consider a struct foo of size 12 and an array of foo, A[10]. A[0] is aligned but A[1] is not: the address of A[1] is the address of A[0] plus 12. – Prabhakar Tayenjam Jun 07 '20 at 15:28
  • @PrabhakarTayenjam: Structures include padding at the end to provide the alignment required for arrays of structures. If the size of a structure is 12 bytes in a C implementation, that is because the structure and the members inside it require at most four-byte alignment in the C implementation, and every element in an array of that structure will satisfy that requirement when they are spaced 12 bytes apart. If some member of the structure requires eight-byte or 16-byte alignment, then the size of the structure will not be 12 bytes; the compiler will add padding to make it 16 bytes. – Eric Postpischil Jun 07 '20 at 16:05

Short answer

Use 64 bytes.

Long answer

Data are loaded from and stored to memory in units called cache lines. If your program loads only part of the data in a cache line, the whole line is still loaded into the CPU caches. Perhaps more importantly, the cache-coherence protocol that moves data between cores in a multi-core CPU operates on full cache lines; aligning (and padding) your data to cache lines avoids false sharing, the situation where a cache line bounces between cores because it contains data manipulated by different threads.

Cache-line sizes used to depend on the architecture, ranging from 16 up to 512 bytes. Today, most mainstream processors (Intel, AMD, and most ARM and MIPS cores) use a 64-byte cache line, although some designs, such as Apple's M-series and IBM POWER processors, use 128 bytes.
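A sketch of cache-line alignment for per-thread data in C11, assuming the 64-byte line size above (CACHE_LINE, struct padded_counter and alloc_counters are hypothetical names). _Alignas on the member gives the structure that alignment and therefore pads its size to a full line, and aligned_alloc returns storage that starts on a line boundary. Alignments larger than _Alignof(max_align_t) are "extended" alignments that an implementation is not required to support, although GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size, per the answer above; not guaranteed on every CPU. */
#define CACHE_LINE 64

/* One counter per thread. Aligning the member to CACHE_LINE gives the struct
   that alignment and pads its size to a full line, so two counters never
   share a cache line (no false sharing). */
struct padded_counter {
    _Alignas(CACHE_LINE) unsigned long value;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter occupies exactly one cache line");

/* Allocate n counters starting on a cache-line boundary; returns NULL on
   failure or for n == 0. C11 aligned_alloc wants the size to be a multiple
   of the alignment, which n * sizeof(struct padded_counter) always is. */
struct padded_counter *alloc_counters(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(struct padded_counter))
        return NULL;
    return aligned_alloc(CACHE_LINE, n * sizeof(struct padded_counter));
}
```

Each thread would then update only its own counters[i].value, so writes from different threads land on different cache lines. Release the array with free.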

jch
  • Another slightly unrelated question: if some data doesn't fit in one cache line and requires two, the CPU has to read twice. So it doesn't matter whether the data is aligned or not; alignment will not affect performance. Is this the case? – Prabhakar Tayenjam Jun 07 '20 at 15:22
  • Not quite, you pay a small penalty for accessing unaligned data once it is in the cache. – jch Jun 08 '20 at 01:11
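Whether an access straddles two lines at all is simple arithmetic on the address: take the offset within the line and add the access size. A sketch, again assuming 64-byte lines (spans_two_lines is a hypothetical helper): an 8-byte value at offset 56 fits in one line, while the same value at offset 60 needs two. A naturally aligned value no larger than a line never straddles, because the line size is a multiple of its size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size. */
#define CACHE_LINE 64

/* True if the size bytes starting at addr touch two cache lines.
   Only meaningful for 0 < size <= CACHE_LINE. */
static bool spans_two_lines(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= CACHE_LINE);
    return addr % CACHE_LINE + size > CACHE_LINE;
}
```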

This depends heavily on the CPU microarchitecture that you are using.

In many cases, the memory address of an operand should be a multiple of the operand size; otherwise execution will be slow (or may even raise an exception).

But there are also CPUs which do not care about a specific alignment of the operands in memory at all.
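In C, a portable way to cope with both kinds of CPU is to never dereference a misaligned pointer: test natural alignment if you need to, and read possibly misaligned data with memcpy, which lets the compiler emit whatever instruction sequence is safe on the target (often a single load on CPUs that tolerate misalignment). is_naturally_aligned and load_u32 are hypothetical helper names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is a multiple of the operand size (natural alignment). */
static bool is_naturally_aligned(const void *p, size_t size)
{
    assert(size > 0);
    return (uintptr_t)p % size == 0;
}

/* Read a uint32_t from p, which need not be 4-byte aligned.
   Dereferencing (const uint32_t *)p could trap or be undefined behavior;
   memcpy is always valid. The result uses the host's byte order. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```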

Usually, the C compiler will take care of those details for you. You should, however, make sure that the compiler assumes the correct target (micro-)architecture, for example by passing the appropriate -march option to gcc.

Ctx