I have found only one question here about std::hardware_destructive_interference_size and std::hardware_constructive_interference_size, and it doesn't answer the following: why are there two distinct values? Both should be the same as the cache-line size. So what cache architecture could require two distinct values?
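For context, here is a minimal sketch of how the two constants are usually put to work. The struct names are hypothetical, and the 64-byte fallback is only an assumption for toolchains that do not provide the constants:

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Use the standard constants when the library provides them.
// The 64-byte fallback is an assumption, not a guaranteed value.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t destructive = std::hardware_destructive_interference_size;
constexpr std::size_t constructive = std::hardware_constructive_interference_size;
#else
constexpr std::size_t destructive = 64;
constexpr std::size_t constructive = 64;
#endif

// Counters written by different threads: keep them at least
// `destructive` bytes apart so they do not falsely share.
struct PerThreadCounters {
    alignas(destructive) std::atomic<long> a{0};
    alignas(destructive) std::atomic<long> b{0};
};

// Fields that are read together: check that they fit within
// `constructive` bytes so one access can bring in both.
struct HotPair {
    int key;
    int value;
};
static_assert(sizeof(HotPair) <= constructive, "HotPair should fit in one unit");
```

The first constant is a lower bound on spacing and the second an upper bound on size, which is why the question of whether they can differ comes up at all.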

-
"*Both should be the same as the cacheline-size.*" Should they? Can you explain why they should be the same? In a way that *doesn't* use specific implementation details? – Nicol Bolas Dec 18 '19 at 16:03
-
The maximum size of a data structure that fits in a single cache line should be the same as the minimum distance between two data structures needed to prevent false sharing. – Bonita Montero Dec 18 '19 at 16:08
-
That's the article I mentioned. But it doesn't tell why these sizes are different. – Bonita Montero Dec 18 '19 at 17:12
-
As I've understood it, the sizes _don't need_ to be different. The separate definitions just cover the case that they _could_ be different, i.e. to cover exotic H/W as well. (I must admit I don't try too hard to think about H/W. I'm too busy getting my S/W running and hopefully free of U.B. and, maybe, even performant.) ;-) – Scheff's Cat Dec 18 '19 at 17:19
-
It depends on the maximum achievable alignment. If it is less than the L1 cache line size then the compiler can't ensure that a variable is stored at the start of a cache line. – Hans Passant Dec 18 '19 at 17:34
-
_"Both should be the same as the cacheline-size."_ Does not need to be. Some architectures apply prefetching where two consecutive cache lines are involved. See those comments by Peter Cordes: https://stackoverflow.com/questions/39680206/understanding-stdhardware-destructive-interference-size-and-stdhardware-cons/39887282#comment127425357_39887282. – Daniel Langr May 22 '23 at 08:56
1 Answer
At least two types of cache designs can have different minimum alignment for avoiding false sharing and maximum alignment for true sharing: sectored cache blocks and variably aligned cache blocks.
A sectored cache that fetches the entire block (IBM-speak; "sector" in Intel-speak; the unit of tag coverage) on a miss would have the block size for std::hardware_constructive_interference_size. Since the smaller sectors (IBM-speak; "lines" in Intel-speak; the unit of validity) are what remote (or different-level cache) writes invalidate, std::hardware_destructive_interference_size would be the size of this smaller chunk. This is a design that has been implemented.
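A toy model may make the two granularities concrete. The sizes below (128B tag unit, 32B validity unit) are hypothetical and not taken from any specific processor; the sketch only encodes the reasoning above, in which a miss fetches the whole tag unit and a remote write invalidates only one validity unit:

```cpp
#include <cstddef>

// Hypothetical geometry, chosen only for illustration.
constexpr std::size_t kTagUnit = 128;   // bytes covered by one tag, fetched as a whole on a miss
constexpr std::size_t kValidUnit = 32;  // bytes invalidated by a single remote write

// True if touching address `a` also brings in address `b`.
constexpr bool fetched_together(std::size_t a, std::size_t b) {
    return a / kTagUnit == b / kTagUnit;
}

// True if a remote write to address `a` also invalidates address `b`.
constexpr bool invalidated_together(std::size_t a, std::size_t b) {
    return a / kValidUnit == b / kValidUnit;
}

// Bytes 0 and 96 arrive together but do not disturb each other.
static_assert(fetched_together(0, 96), "same tag unit");
static_assert(!invalidated_together(0, 96), "different validity units");
```

Under these assumptions the constructive size would be 128 and the destructive size 32.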
(It is not clear if a system that typically prefetches the adjacent block would have std::hardware_constructive_interference_size as twice the cache block/line size while having the cache block/line size for std::hardware_destructive_interference_size.)
Variably aligned cache blocks* (a design targeting larger cache blocks with slightly less capacity wasted by internal fragmentation) align storage at a smaller value than the cache block size. E.g., a 64B cache block could be aligned at an even or odd 32B alignment; std::hardware_constructive_interference_size would be 32B (since an odd-32B aligned cache block would not fetch the complementary half of a 64B aligned chunk) but std::hardware_destructive_interference_size would be 128B (since an odd-32B aligned cache block would interfere with two 64B-aligned addresses). Variably aligned cache blocks also break the concept of alignment being sufficient for managing this aspect of cache performance.
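The geometry of that example can be checked with a little arithmetic. This is only a sketch using the 64B/32B numbers from the paragraph above; the function names are made up for illustration:

```cpp
#include <cstddef>

// Parameters from the example in the text.
constexpr std::size_t kBlockSize = 64;  // bytes held by one cache block
constexpr std::size_t kAlignStep = 32;  // a block may start at any multiple of this

// Index of the 64B-aligned chunk that contains byte address `addr`.
constexpr std::size_t chunk_of(std::size_t addr) { return addr / kBlockSize; }

// Number of 64B-aligned chunks overlapped by a block starting at `start`.
constexpr std::size_t chunks_touched(std::size_t start) {
    return chunk_of(start + kBlockSize - 1) - chunk_of(start) + 1;
}

// True if a block starting at `start` holds byte address `addr`.
constexpr bool block_holds(std::size_t start, std::size_t addr) {
    return addr >= start && addr < start + kBlockSize;
}

// A block holding `addr` may start at addr's own 32B chunk or one step earlier.
// Returns true only if both candidates also hold `other`, i.e. if `other`
// is guaranteed to be fetched along with `addr`. Assumes addr >= kAlignStep.
constexpr bool always_fetched_together(std::size_t addr, std::size_t other) {
    return block_holds((addr / kAlignStep) * kAlignStep, other) &&
           block_holds((addr / kAlignStep) * kAlignStep - kAlignStep, other);
}

static_assert(chunks_touched(64) == 1, "even-aligned block stays in one 64B chunk");
static_assert(chunks_touched(32) == 2, "odd-aligned block overlaps two 64B chunks (a 128B span)");
static_assert(always_fetched_together(64, 80), "same 32B chunk: always co-fetched");
static_assert(!always_fetched_together(64, 100), "other half of the 64B chunk: not guaranteed");
```

Only data within the same 32B chunk is guaranteed to arrive together, while an odd-aligned block reaches into two 64B-aligned chunks.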
Another possibility that would break these definitions would be a strided cache (a limited form of data trace cache). A cache that supported blocks with 2-word stride (i.e., one block storing words 0, 2, 4, etc. but not words 1, 3, 5, etc.) would significantly mess with the assumption behind std::hardware_constructive_interference_size and std::hardware_destructive_interference_size. While such cache blocks would typically be allocated for strided vector caching, the design violates the expectation of orthogonality and could cause performance problems when non-strided accesses are introduced later.
* The proposal for variable alignment mapped an alignment to a way and used overlaid skewed associativity to avoid capacity waste when any alignment was more common than another.
-
That's wrong. With sectored caches the unit of coherence between the cores is a sector. The maximum size for true sharing should be a sector and the minimum distance between objects to avoid false sharing should be a sector also. So both should be equally sized as well. – Bonita Montero Dec 21 '19 at 19:30
-
@BonitaMontero True sharing is the fetch width, which is the cache block (IBM)/cache sector (Intel) when the entire block/sector is fetched (which is typically the case for such caches, at least from main memory — reads satisfied from another cache might leave other sectors/lines invalid). Coherence (invalidation) is at the granularity of sector/line, so false sharing is at that granularity. std::hardware_constructive_interference_size is more concerned with how much memory will be brought in on an access, particularly from main memory. – Dec 21 '19 at 19:34
-
No, true sharing means that there aren't any avoidable collisions with other caching-units on other cores. It is not necessarily bound to the fetch-width. With sectored caches different cores can share different sectors of the same cacheline. So this isn't related to the cacheline as a whole. – Bonita Montero Dec 21 '19 at 21:00