I am developing a code generator for the AVX2 instructions and attempting to test it on an AMD A10 8700 processor which, according to its specs, should support AVX2.

However, it crashes with an illegal instruction on the vgatherdps instruction in:

vmovdqu     yMM0,  [ r9+  PmainBase +  -256]; LDdqyy;0
lea         r9,    [      PmainBase +  -192];0
vpcmpeqw    ymm8, ymm8, ymm8;0
vgatherdps  YMM0,  [ r9+ yMM0*4 ] ,ymm8;0

The cpuid info says that avx2 is included, so VGATHERDPS should be legal.

Any clues?

  • It did execute some AVX2 instructions then, so it's enabled. What's the machine code of that gather? – harold Sep 19 '16 at 10:58
  • Why do you have `;0` comments at the end of every line? And inconsistent capitalization like `yMM0`? Basically your code is a mess from a formatting / readability standpoint. (not relevant to how it assembles, though). – Peter Cordes Sep 19 '16 at 15:25

2 Answers

According to Intel's Instruction Reference:

If any pair of the index, mask, or destination registers are the same, this instruction results a UD fault.

Your destination and index registers are the same (ymm0), so the instruction is illegal. I'm surprised that the assembler didn't reject it; I'd consider that a bug in the assembler.


The reasoning behind this restriction is that the gather instruction is actually writing to two registers.

  1. It writes the result of the gather to the destination.
  2. It writes a mask of zeros back to the mask register indicating which lanes succeeded.

Under normal execution, the mask register will be completely overwritten with zeros. But if a fault occurs, not all the loads will have been successful. So it's possible for the instruction to only be partially executed. The purpose of overwriting the mask register is to tell the signal handler which lanes succeeded and which lanes failed.

The index register cannot alias either the destination or the mask register because it would be overwritten, making it impossible to resume the instruction upon returning from the signal handler.
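To make the two writes concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not the hardware: the function `vgatherdps_model`, the `LaneFault` exception, the register dictionary, and the memory map are all hypothetical names invented for illustration, lanes are processed strictly in order, and values are treated as raw 32-bit patterns rather than floats. It shows a fault on lane 3 leaving the mask partially cleared, and the same instruction then resuming because the index register is still intact.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


class LaneFault(Exception):
    """Stands in for a page fault raised while loading one lane."""


def vgatherdps_model(regs, memory, dst, idx, msk, base, scale=4):
    """Toy model of VEX.256 vgatherdps (8 dword lanes), updating regs in place.

    regs   maps register names to lists of 8 unsigned 32-bit lane values.
    memory maps byte addresses to 32-bit values; a missing address faults.
    """
    if len({dst, idx, msk}) != 3:
        raise ValueError("#UD: dst, index and mask must be three different registers")

    # Step 1: splat each mask lane's MSB across the whole lane.
    regs[msk] = [MASK32 if lane & SIGN32 else 0 for lane in regs[msk]]

    # Step 2: load every enabled lane, clearing its mask lane on success.
    for lane in range(8):
        if regs[msk][lane] == 0:
            continue
        index = regs[idx][lane]
        if index & SIGN32:          # indices are signed dwords
            index -= 1 << 32
        address = base + index * scale
        if address not in memory:
            raise LaneFault(lane)   # dst and msk are left partially updated
        regs[dst][lane] = memory[address]
        regs[msk][lane] = 0


regs = {
    "ymm0": [0] * 8,                 # destination
    "ymm4": list(range(8)),          # indices 0..7
    "ymm8": [MASK32] * 8,            # mask: all lanes enabled
}
memory = {0x1000 + 4 * i: 100 + i for i in range(8) if i != 3}  # lane 3 unmapped

try:
    vgatherdps_model(regs, memory, "ymm0", "ymm4", "ymm8", base=0x1000)
    faulting_lane = None
except LaneFault as fault:
    faulting_lane = fault.args[0]
mask_after_fault = list(regs["ymm8"])

# The "handler" maps the missing dword, then the same instruction is re-run.
memory[0x1000 + 4 * 3] = 103
vgatherdps_model(regs, memory, "ymm0", "ymm4", "ymm8", base=0x1000)
```

After the fault, `mask_after_fault` has lanes 0 to 2 cleared and lanes 3 to 7 still set, so the re-run only touches the lanes that did not complete. Had `ymm4` been overwritten by either write, the addresses for lanes 3 to 7 would no longer be recoverable.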

Mysticial
  • OP is writing a code generator, so he may not have run his code through an assembler. Moreover, there's a very good reason why the `idx` reg cannot match either the `dst` or `msk` reg. Look at the pseudocode of `vgatherdps` VEX.256: _**First**, the MSB of every lane in the `msk` register is splatted to all bits of the lane_, and then for those lanes that are enabled, the hardware attempts to load the data and clears the mask in that lane if successful. Because `msk` is unconditionally and wholly overwritten, it must not alias either `dst` or `idx` because otherwise you'd have a write conflict. – Iwillnotexist Idonotexist Sep 20 '16 at 04:54
  • And you can't resolve this write conflict either way: Lose `msk` and you can't tell which lane failed, lose `idx` and you can't retry, lose `dst` and what was the point of the instruction then? And even if `msk` were partially updated with `dst` values, you have an ambiguity due to the in-band communication problem; It's conceivable that you successfully read `0xFFFFFFFF`, but if this value is within the mask register, how can you tell whether the read was successful with `0xFFFFFFFF`, or failed? Moreover, `idx` can't alias `msk` because negative indices are legitimate. – Iwillnotexist Idonotexist Sep 20 '16 at 05:02
  • And lastly, `idx` can't alias `msk` because in the family of `vgather*` instructions, `idx` can be qword or dword, while `msk` and `dst` lane size always match, and if `dst` is dword while `idx` is qword then only the lower half of the register is filled while the upper half is zeroed. – Iwillnotexist Idonotexist Sep 20 '16 at 05:11
  • The 'mess' is because it comes from a code generator that encodes debugging information in the letter case and in some of the comments that follow each instruction. The code is for NASM, which did not object to the syntax. ymm8 is the mask and is not the same as the dest for obvious reasons. I had not realised that on AVX2 index and dest could not overlap; that did not seem to be the case on the MIC from which the code generator is being adapted. – Paul Cockshott Sep 20 '16 at 09:48
  • @IwillnotexistIdonotexist Oh. Initially, I thought the mask is only conditionally zeroed. My interpretation was that successful lanes are zeroed, failed lanes are untouched. If the index were to alias with the mask, failed lanes are unaffected. Successful lanes are zeroed. You won't be able to tell if a zeroed lane was successful or zero to begin with, but it would be a no-op in either case. But you're right, I failed to consider the case where the index register is a different size. I'll edit to fix the latter paragraph. – Mysticial Sep 20 '16 at 14:38
  • @PaulCockshott For AVX512 on KNL, the same restriction applies. You can't overlap the destination with the index register. The mask is a separate register so that can't overlap with anything anyway. I'm unsure of the situation with KNC MIC. – Mysticial Sep 20 '16 at 14:55

I fixed the machine description to use disjoint sets of registers for indices and destinations in gather instructions, and now get:

vmovdqu   yMM4,  [   PmainBase +          -256]; LDdqyy;0
lea r8,[   PmainBase +          -192];0
vpcmpeqw ymm8, ymm8, ymm8;0
vgatherdps  YMM0,[r8+ yMM4*4 ] ,ymm8;0

which now works fine.
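Since NASM accepted the aliased form, a code generator may want its own guard before emitting a gather. The following is a minimal sketch of such a check, assuming register operands are passed as plain name strings; the function `gather_operand_conflicts` is a hypothetical helper and not part of any existing tool. It compares names case-insensitively because the generator emits mixed-case names such as `yMM0`.

```python
from itertools import combinations


def gather_operand_conflicts(dst, idx, msk):
    """Return the pairs of operand roles that name the same register.

    An empty list means the three operands are pairwise distinct, which is
    what the AVX2 gather instructions require to avoid a UD fault.
    """
    operands = {
        "destination": dst.lower(),
        "index": idx.lower(),
        "mask": msk.lower(),
    }
    return [
        (role_a, role_b)
        for (role_a, reg_a), (role_b, reg_b) in combinations(operands.items(), 2)
        if reg_a == reg_b
    ]


# The failing instruction from the question: vgatherdps YMM0, [r9+yMM0*4], ymm8
original = gather_operand_conflicts("YMM0", "yMM0", "ymm8")

# The corrected instruction: vgatherdps YMM0, [r8+yMM4*4], ymm8
corrected = gather_operand_conflicts("YMM0", "yMM4", "ymm8")
```

Here `original` reports the destination/index clash while `corrected` is empty, so the generator could refuse to emit the first form or pick another index register.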
