12

When I do a writemasked AVX-512 store, like so:

vmovdqu8 [rsi] {k1}, zmm0

Will the instruction fault if some portion of the memory accessed at [rsi, rsi + 63] is not mapped but the writemask is zero for all those locations (i.e., the data is not actually modified due to the mask).

Another way of asking it is if these AVX-512 masked stores have a similar fault suppression ability to vmaskmov introduced in AVX.

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
  • 2
    Yes, looking up the vol.2 manual reference now. 2.7 EXCEPTION CLASSIFICATIONS OF EVEX-ENCODED INSTRUCTIONS. It doesn't specifically distinguish stores from loads, but I think they'd say if stores *didn't* suppress faults the way `vmaskmovps` does. They do use the term "memory fault suppression". (And also FPU fault suppression). They do specifically list NT stores as *not* suppressing faults (I guess even with all the mask bits clear). – Peter Cordes Feb 02 '19 at 22:57
  • I'm 95% sure that masked out elements will not fault. I've seen the Intel compiler peel loops this way and I've done it myself many times and never encountered any problems. But I've admittedly never actually tested this myself with mmap and such. OTOH, I did read from somewhere (I forgot where) that an invalid access for masked out elements will still come with the performance penalties of a cache miss + TLB miss. – Mysticial Feb 04 '19 at 21:04
  • Slides like this are what led me to just believe instead of actually testing it: https://images.anandtech.com/doci/11550/basin_falls_june_6-page-011.jpg – Mysticial Feb 04 '19 at 21:07
  • @Mysticial - yeah I came across a similar slide in my search (maybe it was the same fact though). Kind of weird we can't find anything totally definitive in the manual although the stuff Peter found comes pretty close. – BeeOnRope Feb 05 '19 at 03:13
  • There is some mention of an assist being required for some operations such as stores to negative locations, and looks like it has something akin to a trap to microcode. – jasonk Apr 11 '23 at 04:31

1 Answers1

16

No fault is raised if masked out elements touch invalid memory.

enter image description here


Here's some Windows test code to prove that masking does indeed suppress memory faults.

#include <immintrin.h>
#include <iostream>
#include <Windows.h>
using namespace std; 


int main(){
    const size_t PAGE = 4096;

    //  Map 2 pages.
    char* ptr = (char*)VirtualAlloc(
        nullptr, 2*PAGE,
        MEM_COMMIT,
        PAGE_READWRITE
    );

    //  Store 64 bytes across page boundary.
    cout << "Store across page boundary." << endl;
    _mm512_storeu_si512(ptr + PAGE - 32, _mm512_set1_epi8(-1));

    //  Unmap top page.
    cout << "Unmap top page." << endl;
    VirtualFree(ptr + PAGE, PAGE, MEM_DECOMMIT);

    //  Write on boundary masking out the part that touches the top (unmapped page).
    //  Does not crash because bad accesses are masked out.
    cout << "Store across page boundary, but mask out bytes that are on unmapped page." << endl;
    _mm512_mask_storeu_epi8(ptr + PAGE - 32, 0x00000000ffffffff, _mm512_set1_epi8(-1));

    //  Store 64 bytes across page boundary.
    //  Crashes because of bad access.
    cout << "Store across page boundary." << endl;
    _mm512_storeu_si512(ptr + PAGE - 32, _mm512_set1_epi8(-1));

    cout << "Release bottom page." << endl;
    VirtualFree(ptr, 0, MEM_RELEASE);

    system("pause");
}

Output:

Store across page boundary.
Unmap top page.
Store across page boundary, but mask out bytes that are on unmapped page.
Store across page boundary.
**Access violation**

This test works as follows:

  1. Map 2 adjacent pages.
  2. Do an AVX512 store across the page boundary to prove that both pages are mapped.
  3. Unmap the upper page.
  4. Do the same AVX512 store, but mask out the bytes that are on the upper page. It does not crash.
  5. Repeat the 1st AVX512 store (without masking). It crashes, thus proving that the upper page has been unmapped and the masking suppressed the crash.
Mysticial
  • 464,885
  • 45
  • 335
  • 332
  • 1
    Another slide from the same presentation (https://gcc.gnu.org/wiki/cauldron2014?action=AttachFile&do=get&target=Cauldron14_AVX-512_Vector_ISA_Kirill_Yukhin_20140711.pdf) mentions that shuffles with a memory source operand *don't* do fault-suppression (because masking is per dst position, not src). But those slides didn't explicitly mention stores. Anyway, thanks for testing this to confirm what we all thought. – Peter Cordes Feb 05 '19 at 08:40
  • Also worth mentioning to look at the assembly before believing this test since there's plenty of room for the compiler to defeat it. While this wasn't the case for both the MSVC and Intel compilers, it is possible for a compiler to optimize out some of these dead stores or do a strength reduction on the masked store (since the upper 32-bytes are all inactive). – Mysticial Feb 05 '19 at 16:44