
(This question was originally about the CVTSI2SD instruction and the fact that I thought it didn't work on the Pentium M CPU, but in fact it's because I'm using a custom OS and I need to manually enable SSE.)

I have a Pentium M CPU and a custom OS which so far used no SSE instructions, but I now need to use them.

Trying to execute any SSE instruction results in interrupt 6, illegal opcode (which on Linux would cause a SIGILL, but this isn't Linux). The Intel architectures software developer's manual (which I refer to from now on as IASDM) calls this #UD - Invalid Opcode (UnDefined Opcode).

Edit: Peter Cordes actually identified the right cause, and pointed me to the solution, which I summarize below:

If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set.

Indeed, the IASDM mentions this:

If an operating system did not provide adequate system level support for SSE, executing an SSE or SSE2 instructions can also generate #UD.

Peter Cordes pointed me to the SSE OSDev wiki, which describes how to enable SSE by writing to both CR0 and CR4 control registers:

clear the CR0.EM bit (bit 2) [ CR0 &= ~(1 << 2) ]
set the CR0.MP bit (bit 1) [ CR0 |= (1 << 1) ]
set the CR4.OSFXSR bit (bit 9) [ CR4 |= (1 << 9) ]
set the CR4.OSXMMEXCPT bit (bit 10) [ CR4 |= (1 << 10) ]
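
The four bit operations above can be sketched in C as follows. This is only a minimal sketch: the helper names `cr0_for_sse` and `cr4_for_sse` are hypothetical, and the functions just compute the new register values. Actually reading and writing CR0/CR4 needs privileged `mov` instructions, shown here in a comment and assuming a 32-bit kernel built with GCC-style inline asm.

```c
#include <stdint.h>

/* Bit positions taken from the list above. */
#define CR0_MP         (1u << 1)   /* monitor coprocessor */
#define CR0_EM         (1u << 2)   /* x87 emulation; must be clear for SSE */
#define CR4_OSFXSR     (1u << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (1u << 10)  /* OS handles unmasked SIMD FP exceptions */

/* Hypothetical helper: new CR0 value with EM cleared and MP set. */
static inline uint32_t cr0_for_sse(uint32_t cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Hypothetical helper: new CR4 value with OSFXSR and OSXMMEXCPT set. */
static inline uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}

/*
 * In ring 0 the values could be applied roughly like this
 * (assumption: 32-bit code, GCC/Clang inline asm):
 *
 *   uint32_t v;
 *   __asm__ volatile("mov %%cr0, %0" : "=r"(v));
 *   v = cr0_for_sse(v);
 *   __asm__ volatile("mov %0, %%cr0" : : "r"(v));
 *
 * and the same pattern with %%cr4 and cr4_for_sse().
 */
```

Keeping the bit arithmetic separate from the privileged register access makes it possible to check the values on an ordinary host before trying them on the target.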

Note that, in order to be able to write to these registers, if you are in protected mode, then you need to be in privilege level 0. The answer to this question explains how to test it: if in protected mode, that is, when bit 0 (PE) in CR0 is set to 1, then you can test bits 0 and 1 from the CS selector, which should be both 0.
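
As a rough illustration of that test, here is a small C sketch. The function names are hypothetical, and the CR0 and CS values are passed in as plain integers, so nothing privileged is executed here. It also ignores virtual-8086 mode, where the low bits of CS do not reflect the privilege level.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)  /* protection enable */

/* True if the given CR0 value has PE set, i.e. protected mode. */
static inline bool in_protected_mode(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}

/* Privilege level encoded in bits 0 and 1 of a CS selector. */
static inline unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 0x3u;
}

/* Hypothetical check mirroring the rule above: in protected mode,
 * writing CR0/CR4 requires privilege level 0. */
static inline bool can_write_control_regs(uint32_t cr0, uint16_t cs)
{
    return !in_protected_mode(cr0) || cpl_from_cs(cs) == 0;
}
```

For example, a selector of 0x08 gives privilege level 0, while 0x1B gives level 3.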

Finally, the custom OS must properly handle XMM registers during context switches, by saving and restoring them when necessary.
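
One possible way to do the save/restore is with the FXSAVE and FXRSTOR instructions, which is what the CR4.OSFXSR bit advertises. This is an assumption on my part, since the question does not say which instructions the OS ends up using. The sketch below uses hypothetical names (`fx_state_t`, `fx_save`, `fx_restore`) and GCC/Clang inline asm; on other compilers or architectures the two functions compile to no-ops.

```c
#include <stdint.h>

/* FXSAVE/FXRSTOR use a 512-byte memory image that must be 16-byte aligned. */
typedef struct {
    _Alignas(16) uint8_t bytes[512];
} fx_state_t;  /* hypothetical name */

/* Save x87/MMX/SSE state of the current task into *s. */
static inline void fx_save(fx_state_t *s)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile("fxsave %0" : "=m"(*s));
#else
    (void)s;  /* no-op where the instruction is unavailable */
#endif
}

/* Restore x87/MMX/SSE state from an image previously filled by fx_save(). */
static inline void fx_restore(const fx_state_t *s)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile("fxrstor %0" : : "m"(*s));
#else
    (void)s;
#endif
}
```

A context switch would then call `fx_save()` on the outgoing task's area and `fx_restore()` on the incoming one, with one `fx_state_t` stored per task.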

anol
  • `CVTSI2SD—Convert Dword Integer to Scalar Double-Precision FP Value ` belongs to the SSE2 instruction set, and this is confirmed in the Intel Software Developer Manuals. – Iwillnotexist Idonotexist Jul 22 '15 at 12:28
  • "Yet, the Pentium M does not recognize CVTSI2SD" source? – harold Jul 22 '15 at 12:29
  • I have a program which uses it and it crashes on a real Pentium M. Also, its Intel user manual (of which I have a paper copy) does not include that instruction. – anol Jul 22 '15 at 12:33
  • What is the cause of the crash - `SIGILL` ("illegal instruction") or something else? – Paul R Jul 22 '15 at 12:34
  • Can you please run the application under GDB, and give us the error and the output of `(gdb) disas /r` at the crash site? – Iwillnotexist Idonotexist Jul 22 '15 at 12:37
  • Are you sure you don't actually have a [**Pentium III-M**](https://en.wikipedia.org/wiki/List_of_Intel_Pentium_III_microprocessors#.22Katmai.22_.28250_nm.29)? – Iwillnotexist Idonotexist Jul 22 '15 at 12:44
  • I got an interrupt 6, which from the Intel user manual means "invalid opcode (undefined opcode)". – anol Jul 22 '15 at 12:45
  • Can you post the value of `eax` after executing `mov eax, 1 / cpuid`? –  Jul 22 '15 at 12:45
  • cpuid returns 0xA7E9FBBF, that is `1010 0111 1110 1001 1111 1011 1011 1111` in binary. – anol Jul 22 '15 at 13:01
  • Would it be possible that SSE instructions could be disabled/forbidden during runtime? I found no references to that, but I get interrupt 6 when I run something newer than MMX instructions. – anol Jul 22 '15 at 13:03
  • @IwillnotexistIdonotexist unfortunately my setup is a bit complex, I compile on one machine and run it via a custom kernel on another, so I cannot easily run GDB (although it should be possible), but I'm trying to slowly obtain information about it. – anol Jul 22 '15 at 13:04
  • @anol: Ahhh, that's probably it. If you're running an ancient OS that doesn't support saving XMM regs on context switches, the SSE-enabling bit in one of the machine control registers won't be set. In that case all instructions that touch xmm regs will fault with undefined instruction. – Peter Cordes Jul 22 '15 at 13:11
  • Wow, is that possible? How can I obtain more information about that? I tried searching for it but every website mentioned people who actually wanted their compiler to avoid emitting SSE code, not hardware deactivation of SSE. So I thought it was not possible. – anol Jul 22 '15 at 13:13
  • I updated my answer with a link. Yeah, it's a thing. It got more discussion in really old docs from when SSE was brand new. Introducing new architectural state that must be saved on context switches was a Big Deal. Presumably there are similar bits for 256b ymm regs, because an OS that only saves/restores the low 128 would be a big problem. – Peter Cordes Jul 22 '15 at 13:23
  • @anol it's reversed actually, you don't disable SSE in hardware, you *enable* it (or not, as happened here) – harold Jul 22 '15 at 13:56
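
For reference, the CPUID value quoted in the comments can be decoded with a few bit tests. This sketch assumes that 0xA7E9FBBF is the EDX feature word of CPUID leaf 1 (the comment does not say which register it came from); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in EDX of CPUID leaf 1. */
#define CPUID1_EDX_FXSR (1u << 24)  /* FXSAVE/FXRSTOR supported */
#define CPUID1_EDX_SSE  (1u << 25)  /* SSE supported */
#define CPUID1_EDX_SSE2 (1u << 26)  /* SSE2 supported */

/* True if every bit of mask is set in the EDX feature word. */
static inline bool edx_has(uint32_t edx, uint32_t mask)
{
    return (edx & mask) == mask;
}
```

Under that assumption, `edx_has(0xA7E9FBBF, CPUID1_EDX_SSE)` and `edx_has(0xA7E9FBBF, CPUID1_EDX_SSE2)` are both true, which fits the conclusion that the CPU supports SSE2 and the fault comes from the OS not enabling it.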

2 Answers


If you're running an ancient or custom OS that doesn't support saving XMM regs on context switches, it won't have set the SSE-enabling bits in the machine control registers. In that case all instructions that touch xmm regs will fault.

Took me a sec to find, but http://wiki.osdev.org/SSE explains how to alter CR0 and CR4 to allow SSE instructions to run on bare metal without #UD.


My first thought on your old version of the question was that you might have compiled your program with -mavx, -march=sandybridge or equivalent, causing the compiler to emit the VEX-encoded version of everything.

CVTSI2SD   xmm1, r/m32            ; SSE2
VCVTSI2SD  xmm1, xmm2, r/m32      ; AVX
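
If you only have a hex dump of the faulting bytes, the two forms can be told apart by their leading bytes. The sketch below is deliberately simplified and the names are hypothetical: it only recognizes the legacy `F2 0F 2A` encoding and the two-byte `C5` VEX form with the F2-equivalent prefix bits, and it ignores REX prefixes, the three-byte `C4` VEX form, and other prefixes.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { ENC_UNKNOWN, ENC_LEGACY_SSE2, ENC_VEX_AVX } cvt_enc_t;

/* Hypothetical classifier for the first bytes of a CVTSI2SD instruction. */
static cvt_enc_t classify_cvtsi2sd(const uint8_t *code, size_t len)
{
    if (len < 3)
        return ENC_UNKNOWN;

    /* Legacy SSE2 encoding: F2 0F 2A /r */
    if (code[0] == 0xF2 && code[1] == 0x0F && code[2] == 0x2A)
        return ENC_LEGACY_SSE2;

    /* Two-byte VEX: C5, then a byte whose low two bits (pp) are 11
     * (standing in for the F2 prefix), then opcode 2A. */
    if (code[0] == 0xC5 && (code[1] & 0x03) == 0x03 && code[2] == 0x2A)
        return ENC_VEX_AVX;

    return ENC_UNKNOWN;
}
```

For example, `cvtsi2sd xmm0, eax` assembles to `F2 0F 2A C0`, while `vcvtsi2sd xmm0, xmm0, eax` assembles to `C5 FB 2A C0`.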

See https://stackoverflow.com/tags/x86/info for links, including to Intel's insn set ref manual.

Most real-world kernels are built with options that stop the compiler from using SSE or x87 instructions on its own, for example gcc -mgeneral-regs-only. Or in older GCC, -mno-sse -mno-mmx and avoid any use of float or double types to avoid x87. This is so kernels only have to save/restore integer registers on interrupts and system calls, only doing the SIMD/FP state on a full context switch to a different user-space task. Before that option existed and was used, Linux kernel code that used double could silently corrupt user-space state!

If you have a freestanding program that isn't trying to context-switch between user-space tasks, go ahead and let the compiler use SSE / AVX.


Related: Which versions of Windows support/require which CPU multimedia extensions? (How to check if SSE or AVX are fully usable?) has some details about how to check for support for AVX and AVX512 (which also introduce new architectural state, so the OS has to set a bit or the HW will fault). It's coming at it from the other angle, but the links should indicate how to activate / disable AVX support.

Peter Cordes
  • I'm using an old GCC and it does not seem to have that architecture. And I looked at the assembly and also tried inserting the instruction directly via `asm()`, so there's little chance of that being the case. – anol Jul 22 '15 at 12:36
  • It's -mavx, oops. Look at the disassembly. If it's `vcvt...`, and your program dies with SIGILL, then it's an AVX problem. Otherwise, you're probably getting a SIGSEGV, not SIGILL. If it's SIGILL, then there's something weird going on, and you should run it under gdb, so it stops at the exact instruction that faulted. – Peter Cordes Jul 22 '15 at 12:41
  • I tried compiling in some assembly code containing specifically `CVTSI2SD`, to ensure it is the culprit, and I got interrupt 6. But it also happened with `CVTSI2SS`, so it's actually an SSE-related issue. Instructions containing references to `%xmmN` registers do not work. – anol Jul 22 '15 at 13:06
  • There's an answer on http://stackoverflow.com/questions/6121792/how-to-check-if-a-cpu-supports-the-sse3-instruction-set about detecting OS support. But IIRC, Intel's suggested way to detect OS support is to try running an SSE instruction and see if it faults with #UD. Yeah, pretty terrible, esp. for a library or something, because that forces the calling program to handle SIGILL! IIRC, the machine control registers that the OS has to set aren't even readable by unprivileged code. – Peter Cordes Jul 22 '15 at 13:25
  • I'm trying to enable it as described in the link, but so far no good. The assembly code seems to write the right values to CR0/CR4, but trying something like `movss` afterwards still fails. – anol Jul 22 '15 at 14:12
  • I don't know any more than that, sorry. Better look at the full Intel manuals, https://www-ssl.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html. You are writing to CR0/CR4 in priv level 0, right? `mov cr*` faults (with #GP(0)) if priv level isn't 0, according to Intel's insn set ref manual. IDK if there are any other requirements, like running in protected mode, for using SSE. Or maybe edit this question into: "How do I enable SSE for my freestanding bootable code?" – Peter Cordes Jul 22 '15 at 14:21
  • @IwillnotexistIdonotexist: He never said his custom kernel was based on Linux. He took my suggestion for a question title, which really doesn't suggest Linux. Anol: you should prob. not call it SIGILL if you aren't actually running a Unix kernel that uses those signals. The way you were talking about interrupts, with different code numbers from Linux signals, was my clue you weren't talking about Linux user-space at all. – Peter Cordes Jul 22 '15 at 17:23
  • @anol: I made an edit like what I was suggesting. Hopefully that will attract some expert help from OS types. You might want to mention what state you have the CPU in when you try to use the `mov cr` instructions. (real mode? 32bit protected mode?) – Peter Cordes Jul 22 '15 at 17:29
  • @PeterCordes That clears things up. My clues that this _was_ Linux were precisely SIGILL and talk of a custom kernel (which I interpreted as Linux with a personalized kernel). – Iwillnotexist Idonotexist Jul 22 '15 at 17:53
  • @PeterCordes Could you please complement your solution with the details I added up? Just for completion's sake. Also, maybe move the CVTSS2SD part down, since it is not relevant for the actual problem I had but could be useful for other people. – anol Jul 24 '15 at 06:19

I suggest that you consult Intel's manual when you have such questions.

It's clearly stated in the manual that CVTSI2SD is an SSE2 instruction.

Michael
  • What does CPUID report for supported instructions ? `less /proc/cpuinfo | grep flags` if you're on Linux. – Paul R Jul 22 '15 at 12:41
  • @anol Within the manual, the section for you to read is _Volume 3, Section 13.1 PROVIDING OPERATING SYSTEM SUPPORT FOR SSE EXTENSIONS_. This section has a very clear description and list of what must be done for SSE support. Michael, you should definitely add a reference to this section in your answer. – Iwillnotexist Idonotexist Jul 22 '15 at 18:06