
Is there a C++11 standards-compliant (or, if not compliant, at least generally acceptable) way to determine whether an address is aligned to a cache line boundary?

E.g. something like this:

T* p = SOMETHING;
bool aligned = reinterpret_cast< std::uintptr_t > (p) % CACHE_LINE_SIZE == 0;
atb
  • **5.2.10/4** *A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined. [ Note: It is intended to be unsurprising to those who know the addressing structure of the underlying machine. —end note ]* I would imagine that, for those architectures where the notion of "cache line size" is meaningful, `size_t(p)` would behave as you expect it to; your code should work as written. – Igor Tandetnik Jun 04 '14 at 22:37
  • But, just to be on the right size, one should use [`std::uintptr_t`](http://en.cppreference.com/w/cpp/types/integer)... :D – Massa Jun 04 '14 at 22:41
  • edited question based on Massa's comment... – atb Jun 04 '14 at 23:31
  • possible duplicate of [Understanding how the CPU decides what gets loaded into cache memory](http://stackoverflow.com/questions/18568768/understanding-how-the-cpu-decides-what-gets-loaded-into-cache-memory) -- despite the different-looking title one of the issues it addresses is determining native cache-line size – jthill Jun 04 '14 at 23:49
  • @jthill: The issue is about the semantics of pointer to integer conversion in C++ (and also C) where the standards give much leeway; technically the result of a pointer to integer cast may be a value very different from the address the CPU sees. – datenwolf Jun 04 '14 at 23:53
  • @IgorTandetnik: With Massa's correction, your comment should be the answer. – Phil Miller Jun 05 '14 at 00:23
  • @jthill: I'm not really asking how to determine CACHE_LINE_SIZE, I'm assuming I already know that. – atb Jun 05 '14 at 01:04
  • Okay, thank you. Then I'd say the passage @IgorTandetnik quoted, *It is intended to be unsurprising to those who know the addressing structure of the underlying machine.* is intended to qualify testing for low-bits-zero as generally acceptable on machines with a flat address space and pointers that fit in a `uintmax_t`. – jthill Jun 05 '14 at 01:09
  • Thank you all. It is interesting that with all of C++11's alignment control functionality there doesn't seem to be much alignment introspection functionality...unless I'm missing something. – atb Jun 05 '14 at 01:31
  • @jthill: also, I don't see how the question you marked as a duplicate answers my question at all...can you explain? – atb Jun 05 '14 at 01:53
  • It was the second question he quoted, "How can the programmer check what size a cache line is for any given architecture?", and it doesn't answer your question, only what I thought was your question. Good point about the alignment introspection. I don't know, but I'll hazard a guess that worrying about cache architecture is so rarely valuable and such a temptation to premature optimization that leaving it out of the standard might have been considered an actual improvement. – jthill Jun 05 '14 at 03:40
  • @jthill, [tag:multithreading] experts would not agree with you. The cache coherence protocol is the essence of [tag:shared-memory], and cache line size is its fundamental parameter. Ignoring things like false sharing can put a 1000x penalty on your code. – Anton Jun 06 '14 at 16:24
  • @Anton Theoretical worst-case effects on one or two carefully-selected primitives can be dramatic, but some of those experts might also point out that (a) the actual, measured effect of relevant coherence issues on the _vast_ majority of programs, even multithreaded programs, is completely insignificant, (b) a large majority of that tiny minority that might actually have reason to care doesn't actually have to care, because those primitives are used only internally by Standard or 3rd-party libraries implementing the services those programs actually use. – jthill Jun 06 '14 at 18:03
  • @jthill, my question wasn't "Should I be worrying about cache line alignment in my code?" I am trying to scale a specific parallel algorithm to 64 cores, it is very important in this case. – atb Jun 06 '14 at 18:14
  • I'm apparently missing something obvious here, help me? What I'm seeing is, you know the required alignment, people have pointed out that testing the low bits is generally acceptable on all common architectures, including I gather yours, which leaves so far as I can see nothing but fairly irrelevant side issues. At any rate, there _is_ a `std::align` function, but [it's missing in gcc and apparently bugged in msvc](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57350) so while it passes the "standards compliant" test, it fails the "generally acceptable" one :-) – jthill Jun 06 '14 at 18:38
  • @jthill, I'm not trying to allocate data, I'm trying to test an address I already have. – atb Jun 06 '14 at 18:46

2 Answers


There's a `std::align` function, but it's apparently not much in demand: it's missing in gcc 4.9 and badly bugged in MSVC.

The proposed implementation of it (it's short enough to just read) is

inline void *align( std::size_t alignment, std::size_t size,
                    void *&ptr, std::size_t &space ) {
    std::uintptr_t pn = reinterpret_cast< std::uintptr_t >( ptr );
    std::uintptr_t aligned = ( pn + alignment - 1 ) & - alignment;
    std::size_t padding = aligned - pn;
    if ( space < size + padding ) return nullptr;
    space -= padding;
    return ptr = reinterpret_cast< void * >( aligned );
}

... which is a bit overkill here: to simply test whether a pointer is already aligned, it boils down to exactly your method (with bit masking rather than `%`, but no matter). Its implementation is, as @IgorTandetnik points out, "unsurprising to those who know the addressing structure of the underlying machine".
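If your standard library does ship a working `std::align`, the same test can be written without casting the pointer to an integer yourself. The following is only a sketch: the helper name `is_aligned_via_std_align` is hypothetical (it is not part of the standard or of the proposed implementation above), and it assumes the alignment is a power of two, which `std::align` requires anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>   // std::align

// Hypothetical helper, not from the answer above.
// Returns true if p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned_via_std_align(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    // std::align takes a non-const void*&; nothing is written through it.
    void* q = const_cast<void*>(p);

    // Claim just enough room for the worst-case padding (alignment - 1)
    // plus the single byte we ask for, so the call cannot fail.
    std::size_t space = alignment;

    // std::align moves q up to the next aligned address.
    // If q did not move, p was already aligned.
    return std::align(alignment, 1, q, space) == p;
}
```

For a cache line test you would call it as `is_aligned_via_std_align(p, CACHE_LINE_SIZE)`. The memory is never dereferenced, so it does not matter that `space` is not the real size of the buffer behind `p`.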

jthill

If you have a C++11-compliant compiler, then its documentation tells you.

As already mentioned by Igor in the comments, the rules for `reinterpret_cast` include:

A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.

That term doesn't just mean "non-portable", it adds specific requirements, found in 1.3.10:

implementation-defined behavior

behavior, for a well-formed program construct and correct data, that depends on the implementation and that each implementation documents

If your compiler does not document whether a pointer converted to integer is actually a memory address, then it is not a C++ compiler.
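Assuming your compiler does document the conversion as yielding the plain memory address (as mainstream flat-address-space implementations do), the check from the question can be wrapped up roughly like this. It is a sketch under that assumption: the names `is_aligned`, `is_cache_line_aligned` and `kCacheLineSize` are made up for illustration, the value 64 is only a common choice and must be replaced with whatever your target actually uses, and `std::uintptr_t` is itself an optional type in the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed value for illustration; substitute the line size of your target.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical helper: true if p is a multiple of `alignment`
// (a power of two), assuming the pointer-to-integer mapping
// documented by the implementation is the actual address.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);
    // For a power-of-two alignment, the low bits must all be zero.
    return (address & (alignment - 1)) == 0;
}

inline bool is_cache_line_aligned(const void* p) {
    return is_aligned(p, kCacheLineSize);
}
```

The mask and the `%` from the question give the same result for power-of-two sizes; `%` additionally works for sizes that are not powers of two.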

Ben Voigt