How to programmatically get the CPU cache line size in C++?

Question

I'd like my program to read the cache line size of the CPU it's running on in C++.

I know that this can't be done portably, so I will need a solution for Linux and another for Windows (Solutions for other systems could be useful to others, so post them if you know them).

For Linux I could read the content of /proc/cpuinfo and parse the line beginning with cache_alignment. Maybe there is a better way involving a call to an API.
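A minimal sketch of the /proc/cpuinfo approach described above. It assumes the x86 Linux format, where the line reads like "cache_alignment : 64"; other architectures may not print this field at all. In that case the hypothetical helper below simply returns 0.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Returns the value of the first "cache_alignment" line in /proc/cpuinfo,
// or 0 if the file cannot be read or the field is missing.
std::size_t CacheAlignmentFromProcCpuinfo()
{
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line))
    {
        // Only look at lines that start with the field name.
        if (line.rfind("cache_alignment", 0) != 0)
            continue;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        // Parse the number after the colon, e.g. " 64".
        std::istringstream value(line.substr(colon + 1));
        std::size_t bytes = 0;
        if (value >> bytes)
            return bytes;
    }
    return 0;  // not found (non-Linux system or architecture without the field)
}
```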

For Windows I simply have no idea.

score 21 · Accepted Answer · edited May 16 '09 at 16:39

On Win32, GetLogicalProcessorInformation fills an array of SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures. Entries whose Relationship is RelationCache contain a CACHE_DESCRIPTOR, and its LineSize field has the information you need.
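A rough sketch of that call, assuming you want the level-1 data (or unified) cache rather than whichever cache entry happens to come first. The function name is hypothetical, and the #if guard makes it return 0 on non-Windows builds so the snippet still compiles elsewhere:

```cpp
#include <cstddef>
#include <vector>
#if defined(_WIN32)
#include <windows.h>
#endif

// Returns the line size of the first level-1 data or unified cache reported
// by GetLogicalProcessorInformation, or 0 if none is found (or not Windows).
std::size_t L1LineSizeWin32()
{
#if defined(_WIN32)
    DWORD bytes = 0;
    // The first call with no buffer only reports the required size in bytes.
    if (GetLogicalProcessorInformation(nullptr, &bytes) != FALSE ||
        GetLastError() != ERROR_INSUFFICIENT_BUFFER)
        return 0;

    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (info.empty() || GetLogicalProcessorInformation(info.data(), &bytes) == FALSE)
        return 0;

    // bytes is a byte count, so convert it to a number of entries.
    const std::size_t count = bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
    for (std::size_t i = 0; i < count; ++i)
    {
        const auto& entry = info[i];
        if (entry.Relationship == RelationCache && entry.Cache.Level == 1 &&
            (entry.Cache.Type == CacheData || entry.Cache.Type == CacheUnified))
            return entry.Cache.LineSize;
    }
#endif
    return 0;
}
```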

answered Sep 29 '08 at 19:38 by Nick · edited May 16 '09 at 16:39 by Roger Lipscombe

Yikes - decoding the array of SYSTEM_LOGICAL_PROCESSOR_INFORMATION structures looks like it would be a pain. – Michael Burr Sep 29 '08 at 19:48
Welcome to the world of systems programming. ;) – Mr. Shickadance May 16 '09 at 16:42
It's not too bad, Michael. Anyways, getting to grips with it forces you to learn how the CPU topology is arranged, and you may well need to know. – Mar 20 '11 at 23:09
Woot? No code snippet I can simply copy and paste?!! *cries* – BitTickler Feb 19 '15 at 00:08

score 7 · Answer 2 · answered Sep 29 '08 at 19:49

On Linux, try the proccpuinfo library, an architecture-independent C API for reading /proc/cpuinfo.

answered Sep 29 '08 at 19:49 by PiedPiper

score 5 · Answer 3 · answered Sep 29 '08 at 19:38

It looks like at least SCO Unix (http://uw714doc.sco.com/en/man/html.3C/sysconf.3C.html) provides _SC_CACHE_LINE for sysconf(). Perhaps other platforms have something similar?
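Other platforms do have something similar. On Linux with glibc, sysconf() accepts the non-standard _SC_LEVEL1_DCACHE_LINESIZE, and macOS exposes hw.cachelinesize through sysctlbyname(). The sketch below tries whichever is available and falls back to the SCO-style _SC_CACHE_LINE from this answer. The helper name is made up, the SCO branch is untested, and a result of 0 means "unknown" (glibc may return 0 or -1 when the value is not reported for the CPU).

```cpp
#include <cstddef>
#if defined(__APPLE__)
#include <cstdint>
#include <sys/sysctl.h>   // sysctlbyname
#elif defined(__unix__)
#include <unistd.h>       // sysconf
#endif

// Asks the OS for the L1 data cache line size; returns 0 if unavailable.
std::size_t CacheLineSizeFromOS()
{
#if defined(__APPLE__)
    std::int64_t bytes = 0;  // hw.cachelinesize is a 64-bit value
    std::size_t len = sizeof(bytes);
    if (sysctlbyname("hw.cachelinesize", &bytes, &len, nullptr, 0) == 0 && bytes > 0)
        return static_cast<std::size_t>(bytes);
#elif defined(_SC_LEVEL1_DCACHE_LINESIZE)
    // glibc extension.
    const long bytes = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (bytes > 0)
        return static_cast<std::size_t>(bytes);
#elif defined(_SC_CACHE_LINE)
    // SCO-style name mentioned in this answer (assumption: untested).
    const long bytes = sysconf(_SC_CACHE_LINE);
    if (bytes > 0)
        return static_cast<std::size_t>(bytes);
#endif
    return 0;
}
```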

answered Sep 29 '08 at 19:38 by Evan Teran · edited Feb 08 '23 at 19:08

score 5 · Answer 4 · answered Sep 29 '08 at 19:46

For x86, use the CPUID instruction. A quick Google search turns up some libraries for Win32 and C++. I have also used CPUID via inline assembler.
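A sketch of the CPUID route for GCC or Clang on x86, using the __get_cpuid helper from <cpuid.h>. CPUID leaf 1 reports the CLFLUSH line size in EBX bits 15:8, in units of 8 bytes, and EDX bit 19 (CLFSH) says whether that field is valid. On modern x86 CPUs this matches the cache line size, but treat it as a hint rather than a guarantee; MSVC users would call __cpuid from <intrin.h> instead. HAVE_GNU_CPUID and the function name are hypothetical.

```cpp
#include <cstddef>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>       // GCC/Clang wrapper for the CPUID instruction
#define HAVE_GNU_CPUID 1 // hypothetical helper macro
#endif

// Returns the CLFLUSH line size from CPUID leaf 1, or 0 if unavailable.
std::size_t ClflushLineSizeFromCpuid()
{
#if defined(HAVE_GNU_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
        return 0;                        // leaf 1 not supported
    if ((edx & (1u << 19)) == 0)
        return 0;                        // CLFSH clear: field is not valid
    return ((ebx >> 8) & 0xFFu) * 8;     // bits 15:8, in 8-byte units
#else
    return 0;                            // not x86, or not GCC/Clang
#endif
}
```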

Some more info:

answered Sep 29 '08 at 19:46 by robottobor · edited Sep 29 '08 at 21:31

could you comment on how you'd use CPUID to get this? – Nathan Fellman May 16 '09 at 17:10

score 4 · Answer 5 · answered Sep 13 '16 at 09:25

On Windows

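#include <Windows.h>
#include <cstdio>
#include <iostream>

using std::cout; using std::endl;

int main()
{
    SYSTEM_INFO systemInfo;
    GetSystemInfo(&systemInfo);
    // dwPageSize is the virtual-memory page size, not the cache line size.
    cout << "Page Size Is: " << systemInfo.dwPageSize << endl;
    getchar();  // keep the console window open
}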

On Linux

http://linux.die.net/man/2/getpagesize
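For the page size on Linux (again, not the cache line size), the POSIX sysconf(_SC_PAGESIZE) call is the standardized alternative to the older getpagesize(). A minimal sketch with a hypothetical helper name:

```cpp
#include <cstddef>
#include <unistd.h>  // sysconf, _SC_PAGESIZE

// Returns the virtual-memory page size in bytes, or 0 on error.
std::size_t PageSize()
{
    const long bytes = sysconf(_SC_PAGESIZE);
    return bytes > 0 ? static_cast<std::size_t>(bytes) : 0;
}
```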

answered Sep 13 '16 at 09:25 by Researcher

After coming back to this, I don't believe I answered your question, which was about the cache line size rather than the memory page size, correct? https://en.wikipedia.org/wiki/Page_(computer_memory) I was googling for a page size snippet (working on a project involving memory access) and came here; the dangers of skimming. Please untick my answer, but it's probably worth leaving here for future reference. – Researcher Sep 15 '16 at 17:33
Indeed, the question was mistitled with "cache page size". I fixed it. – Peter Cordes Feb 08 '23 at 19:11

Answer 6 · by metablaster · edited Feb 16 '23 at 11:08

Here is sample code for those who wonder how to use the function from the accepted answer:

#include <new>
#include <iostream>
#include <Windows.h>


void ShowCacheSize()
{
    using CPUInfo = SYSTEM_LOGICAL_PROCESSOR_INFORMATION;
    DWORD len = 0;
    CPUInfo* buffer = nullptr;

    // Determine required length of a buffer
    if ((GetLogicalProcessorInformation(buffer, &len) == FALSE) && (GetLastError() == ERROR_INSUFFICIENT_BUFFER))
    {
        // Allocate buffer of required size
        // len is in bytes, so convert it to a number of entries
        buffer = new (std::nothrow) CPUInfo[len / sizeof(CPUInfo)]{ };

        if (buffer == nullptr)
        {
            std::cout << "Buffer allocation of " << len << " bytes failed" << std::endl;
        }
        else if (GetLogicalProcessorInformation(buffer, &len) != FALSE)
        {
            const DWORD count = len / sizeof(CPUInfo);
            for (DWORD i = 0; i < count; ++i)
            {
                // This will be true for multiple returned caches, we need just one
                if (buffer[i].Relationship == RelationCache)
                {
                    std::cout << "Cache line size is: " << buffer[i].Cache.LineSize << " bytes" << std::endl;
                    break;
                }
            }
        }
        else
        {
            std::cout << "ERROR: " << GetLastError() << std::endl;
        }

        delete[] buffer;
    }
}

If `len` is in bytes, shouldn't it be divided by `sizeof(CPUInfo)` before running through the buffer entries? — GuillemVS, Jan 15 '22 at 21:35
@GuillemVS thank you for spotting this, you're correct I've updated my sample code. — metablaster, Feb 16 '23 at 11:09

score 0 · Answer 7 · answered Sep 29 '08 at 19:45

I think you need NtQuerySystemInformation from ntdll.dll.

answered Sep 29 '08 at 19:45 by rami

score 0 · Answer 8 · answered Feb 08 '23 at 19:38

If supported by your implementation, C++17 std::hardware_destructive_interference_size gives you an upper bound (and std::hardware_constructive_interference_size a lower bound), taking into account effects such as hardware prefetch of pairs of lines.

But those are compile-time constants, so can't be correct on all microarchitectures for ISAs which allow different line sizes. (e.g. older x86 CPUs like Pentium III had 32-byte lines, but all later x86 CPUs have used 64-byte lines, including all x86-64. It's theoretically possible that some future microarchitecture will use 128-byte lines, but multi-threaded binaries tuned for 64-byte lines are widespread so that's perhaps unlikely for x86.)

For this reason, some current implementations choose not to implement that C++ feature at all. GCC does implement it, Clang doesn't (Godbolt). It becomes part of the ABI when code uses it in struct layouts, so it's not something compilers can change in the future to match newer CPUs for the same target.

GCC defines both constructive and destructive as 64 on x86-64, neglecting the destructive interference that adjacent-line prefetch can cause, e.g. on Intel Sandybridge-family. That interference is not nearly as disastrous as false sharing within a single cache line in a high-contention case, so you might choose to use only 64-byte alignment to separate objects that different threads will be accessing independently.
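As an illustration of that advice, here is a sketch that pads two independently written atomics apart. It uses the destructive size when the library provides it (checked via the __cpp_lib_hardware_interference_size feature-test macro) and otherwise an assumed 64-byte fallback matching the x86-64 value discussed above. kNoFalseSharing and PerThreadCounters are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <new>  // std::hardware_*_interference_size and its feature-test macro

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kNoFalseSharing = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kNoFalseSharing = 64;  // assumed fallback (x86-64 line size)
#endif

// Two counters written by different threads. Each member starts on its own
// kNoFalseSharing-aligned boundary, so they never share a cache line.
struct PerThreadCounters
{
    alignas(kNoFalseSharing) std::atomic<long> producer{0};
    alignas(kNoFalseSharing) std::atomic<long> consumer{0};
};
```

Note that this bakes the chosen value into the struct layout, which is exactly the ABI concern mentioned above.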

Should the cache padding size of x86-64 be 128 bytes? - a performance experiment on Skylake showing 500 ± 300 machine clears in an aligned pair of lines, vs. 10M in a single line, vs. near zero in more distant lines. Machine clears were easier to measure than actual cache misses caused by losing access to the line.
Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size
