Qemu invalid instruction trap on SSE instruction

Question

Working with NYU's fork of MIT's xv6 operating system, we found that the kernel crashes when built with GCC 11 and 12, because those compilers emit SSE2 instructions by default, even at -O0.

The problem is that I don't know why. The issue is first encountered during an entirely innocent struct copy here.

When compiled with -mno-sse under GCC 12.2 the result is:

  *np->tf = *proc->tf;
801047c3:   65 a1 04 00 00 00       mov    %gs:0x4,%eax
801047c9:   8b 48 18                mov    0x18(%eax),%ecx
801047cc:   8b 45 e0                mov    -0x20(%ebp),%eax
801047cf:   8b 40 18                mov    0x18(%eax),%eax
801047d2:   89 c2                   mov    %eax,%edx
801047d4:   89 cb                   mov    %ecx,%ebx
801047d6:   b8 13 00 00 00          mov    $0x13,%eax
801047db:   89 d7                   mov    %edx,%edi
801047dd:   89 de                   mov    %ebx,%esi
801047df:   89 c1                   mov    %eax,%ecx
801047e1:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)

And this works fine. When compiled without disabling SSE, the result is:

  *np->tf = *proc->tf;
8010479f:   65 a1 04 00 00 00       mov    %gs:0x4,%eax
801047a5:   8b 50 18                mov    0x18(%eax),%edx
801047a8:   8b 45 f0                mov    -0x10(%ebp),%eax
801047ab:   8b 40 18                mov    0x18(%eax),%eax
801047ae:   f3 0f 6f 02             movdqu (%edx),%xmm0
801047b2:   0f 11 00                movups %xmm0,(%eax)
801047b5:   f3 0f 6f 42 10          movdqu 0x10(%edx),%xmm0
801047ba:   0f 11 40 10             movups %xmm0,0x10(%eax)
801047be:   f3 0f 6f 42 20          movdqu 0x20(%edx),%xmm0
801047c3:   0f 11 40 20             movups %xmm0,0x20(%eax)
801047c7:   f3 0f 6f 42 30          movdqu 0x30(%edx),%xmm0
801047cc:   0f 11 40 30             movups %xmm0,0x30(%eax)
801047d0:   f3 0f 6f 42 3c          movdqu 0x3c(%edx),%xmm0
801047d5:   0f 11 40 3c             movups %xmm0,0x3c(%eax)

And this traps on invalid opcode at the first SSE instruction, 801047ae:

unexpected trap 6 from cpu 0 eip 801047ae (cr2=0x0)

So uh, what gives? These are all unaligned access instructions, so alignment shouldn't be an issue. I've tested under both qemu-system-i386 and qemu-system-x86_64, same results. Tested with -machine accel=kvm -cpu max, same results.

Trap 6 is "invalid opcode". As per the manual, it is generated _"If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0."_ So verify you have those set up properly. Would also be dumped by qemu but you have unfortunately not shown the log. — Jester, Sep 01 '22 at 13:35
@Jester I should have made it clear I knew it was trapping on invalid opcode, thank you for pointing it out. They almost certainly aren't set correctly. I'm assuming this is the Intel Architecture manual you're referring to? Could you be more specific on where I should be looking? — nickelpro, Sep 01 '22 at 13:44
I quoted the instruction set reference, _Table 2.4.4 Exceptions Type 4 (>=16 Byte Mem Arg, No Alignment, No Floating-point Exceptions)_. For initialization, consult the system programming guide, _13.1.3 Initialization of the SSE Extensions_. — Jester, Sep 01 '22 at 13:49
And that's the answer, could you create a full answer to this so I can credit you or would you like me to self-answer? — nickelpro, Sep 01 '22 at 13:54
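
To make the condition Jester quotes concrete, here is a minimal C sketch of the control-register bits involved. It only does the bit arithmetic on plain integers; actually reading and writing CR0 and CR4 needs ring-0 `mov` instructions in the kernel's early boot code, which is not shown. The macro and function names are hypothetical (they are not taken from xv6), the bit positions are the ones documented in the Intel manual, and the CR0/CR4 values mentioned afterwards are purely illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control-register bit positions as documented in the Intel manual.
 * The macro names are hypothetical, chosen for this sketch. */
#define CR0_MP         (1u << 1)   /* monitor coprocessor */
#define CR0_EM         (1u << 2)   /* x87 emulation; must be 0 for SSE */
#define CR4_OSFXSR     (1u << 9)   /* OS supports FXSAVE/FXRSTOR; must be 1 for SSE */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles unmasked SIMD FP exceptions */

/* True if an SSE instruction such as movdqu would raise #UD (trap 6),
 * per the quoted rule: CR0.EM = 1 or CR4.OSFXSR = 0. */
static inline bool sse_raises_ud(uint32_t cr0, uint32_t cr4)
{
    return (cr0 & CR0_EM) != 0 || (cr4 & CR4_OSFXSR) == 0;
}

/* CR0 value with emulation cleared and coprocessor monitoring set. */
static inline uint32_t cr0_for_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* CR4 value with the two OS opt-in bits for SSE set. */
static inline uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

With illustrative values `cr0 = 0x80010001` and `cr4 = 0x10` (OSFXSR clear), `sse_raises_ud` returns true; after applying `cr0_for_sse` and `cr4_for_sse` it returns false. Setting these bits only removes the trap: a kernel whose context switch code does not save and restore the XMM registers is still better off being compiled without SSE.
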
For kernel hacking, keep in mind that if you're going to allow user-space code to use SSE, then you need to first make sure that your context switch code supports it, by saving and restoring all the SSE data and control registers. And if your kernel is going to use SSE itself (i.e. if it will be compiled without `-mno-sse` or equivalent) then your kernel entry and exit code needs to do the same. The CR0 "kill bit" exists so that OSes without such support don't inadvertently create data leaks between user space processes. You have to "opt in" to SSE once your kernel support is in place. — Nate Eldredge, Sep 01 '22 at 14:57
Linux for instance is in fact compiled with `-mno-sse`, or actually with the stronger `-mgeneral-regs-only`, precisely so that it can avoid the overhead of save/restore at kernel entry/exit. Kernel code that really does want to use x87/SSE/AVX, like crypto code, requires an explicit save/restore around it. — Nate Eldredge, Sep 01 '22 at 15:02
Thanks Nate! I was aware our context switch code didn't support SSE usage and we would need to compile with `-mno-sse` no matter what, but didn't understand the crash (as with all answers that are clearly explained in the Intel Manual, it's difficult to google for). I did not know about `-mgeneral-regs-only` so thank you for that too. I'll be switching to that flag in our build script. — nickelpro, Sep 01 '22 at 16:17
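
Since the fix ends up living in the build script, one way to keep it from silently regressing is a compile-time guard in the kernel sources. The following is a rough sketch, not something xv6 ships: `KERNEL_BUILT_WITH_SSE`, `KERNEL_FORBID_SSE`, and `kernel_built_with_sse` are hypothetical names. `__SSE__` and `__SSE2__` are the macros GCC predefines when those instruction sets are enabled; that `-mgeneral-regs-only` leaves them undefined in the same way `-mno-sse` does is an assumption worth checking against your compiler.

```c
/*
 * Hypothetical build guard; the macro and function names are made up for
 * this sketch. __SSE__ and __SSE2__ are predefined by GCC when the
 * corresponding instruction sets are enabled for the translation unit.
 */
#if defined(__SSE__) || defined(__SSE2__)
#define KERNEL_BUILT_WITH_SSE 1
#else
#define KERNEL_BUILT_WITH_SSE 0
#endif

/* Define KERNEL_FORBID_SSE (hypothetical) in kernel builds to fail early. */
#ifdef KERNEL_FORBID_SSE
_Static_assert(!KERNEL_BUILT_WITH_SSE,
               "kernel must be built with -mno-sse or -mgeneral-regs-only");
#endif

/* Lets other code report how this file was compiled. */
static inline int kernel_built_with_sse(void)
{
    return KERNEL_BUILT_WITH_SSE;
}
```

Passing `-DKERNEL_FORBID_SSE` in the kernel's compiler flags would then turn a missing `-mno-sse` into a build error instead of a trap 6 at run time.
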
